Like Pulling Teeth? Matching the Right Research Questions with the Right Methods
Conducting rigorous policy research is a painstaking process. I’ve even heard it compared to pulling teeth, and there’s some truth to that. After all, designing and executing a high-quality study can be arduous, but if all goes according to plan, you’ll have a result that helps improve well-being.
I hadn’t thought about any other connections between good dental care and quality policy research until several months ago, when a report from the Associated Press called into question the benefits of flossing. The story went viral because it challenged the conventional wisdom that flossing and brushing are better than brushing alone. And, from my perspective at Mathematica, it made me think about a critical issue that researchers always grapple with: what are the best ways to generate reliable, high-quality evidence?
Although there are many ways to answer that question, there are two critical steps that researchers must get right:
- Define the question you are trying to answer
- Apply the right tools and approach to find the answer
Beyond the Gold Standard
In the case of the Associated Press report, the focus was primarily on the approach. The report cited 25 studies that showed weak or unreliable evidence of flossing’s benefits. As Jamie Holmes, a fellow at New America, wrote in The New York Times, this conclusion was driven by “the absence of support in the form of definitive randomized controlled trials, the so-called gold standard for scientific research.”
Randomized controlled trials (RCTs) are indeed a bedrock of rigorous research. Mathematica’s nearly 50-year body of work features numerous large-scale RCTs that have generated important insights and influenced decision making across the public policy spectrum. But such studies are not the only option for producing high quality, impactful findings.
In the case of flossing, why is there no “gold standard” evidence about its benefits? Holmes notes that long-term RCTs are difficult to implement, and in this case such an approach would face ethical challenges as well. Instead (as Holmes points out), based on “a range of evidence, including clinical experience,” dentists have concluded “that flossing, properly done, works.”
Now, I’m not here to argue for either side of the Great Flossing Debate—perhaps clinicians’ experiences are enough to justify the use of floss, and perhaps more rigorous evaluation is necessary to determine its utility. But I believe that expert opinion and scientific research might not be in conflict on this point—they just address different questions.
Yes, we want to know whether flossing is better than not flossing. But the phrase “properly done” above is critical: it reflects how the question of interest can vary in subtle ways and align with different research methods. When dentists assess the benefits of flossing for oral health, they might frame the question as whether a given patient who flosses according to their guidelines can achieve improved oral outcomes. Researchers, however, might frame the question differently, examining the effects of flossing as people actually practice it, a question that might be addressed most effectively through RCTs or similar rigorous analytical methods applied to data. Hence, one can imagine that an individual patient might improve his or her oral health by flossing properly, while the broad use of flossing, which is not always properly done, could have limited effectiveness.
Addressing Key Policy Questions
The need to ask the right question from the outset is also critical in the policy world. Take education policy, for example. When I led the first national evaluation of the impact of Teach For America (TFA) teachers on student learning, we sought to answer the question of whether TFA had a positive impact on the students it served. We conducted a random assignment study comparing the performance of TFA corps members with that of other teachers in the same elementary schools. We found that students of TFA teachers outscored their schoolmates on math achievement tests and matched their average performance in reading. Hence, we concluded that TFA had a positive effect on the students in the schools that it served at the time.
Many teaching experts saw TFA as harmful because its teachers had not gone through the training that most experts agreed was needed to produce good teachers. As with the dentists and flossing, some critics of the TFA findings were upset that our study did not adopt their framing and address the question of whether the program creates good teachers. That is certainly a good question, and it might sound the same as the question we investigated, but it’s not. To isolate the impact of the program on students, we simply wanted to measure the relative effect of TFA teachers compared with what would have happened in their absence, as represented by the non-TFA teachers.
These examples do not minimize the validity of the perspectives of experts in the field. But more often than not, answers to tough policy questions are affected by myriad contextual factors that can be sorted out only through objective analysis of relevant data. Thus, we policy researchers can’t determine which approach to use, be it Bayesian methods, adaptive randomization, rapid-cycle evaluation, or predictive analytics, until we clearly define the research question we are trying to answer. It’s important that we explore all the tools at our disposal to match the right methods with the right questions, even if it sometimes feels like pulling teeth.