Some days, the internet gods are kind. On April 9th, I wrote,
We want talking about educational efficacy to be like talking about the efficacy of Advil for treating arthritis. But it’s closer to talking about the efficacy of various chemotherapy drugs for treating a particular cancer. And we’re really really bad at talking about that kind of efficacy. I think we have our work cut out for us if we really want to be able to talk intelligently and intelligibly about the effectiveness of any particular educational intervention.
On the very same day, the estimable Larry Cuban blogged,
So it is hardly surprising, then, that many others, including myself, have been skeptical of the popular idea that evidence-based policymaking and evidence-based instruction can drive teaching practice. Those doubts have grown larger when one notes what has occurred in clinical medicine with its frequent U-turns in evidence-based “best practices.” Consider, for example, how new studies have often reversed prior “evidence-based” medical procedures.

* Hormone therapy for post-menopausal women to reduce heart attacks was found to be more harmful than no intervention at all.
* Getting a PSA test to determine whether the prostate gland showed signs of cancer for men over the age of 50 was “best practice” until 2012 when advisory panels of doctors recommended that no one under 55 should be tested and those older might be tested if they had family histories of prostate cancer.

And then there are new studies that recommend women to have annual mammograms, not at age 50 as recommended for decades, but at age 40. Or research syntheses (sometimes called “meta-analyses”) that showed anti-depressant pills worked no better than placebos. These large studies done with randomized clinical trials–the current gold standard for producing evidence-based medical practice–have, over time, produced reversals in practice. Such turnarounds, when popularized in the press (although media attention does not mean that practitioners actually change what they do with patients) often diminished faith in medical research leaving most of us–and I include myself–stuck as to which healthy practices we should continue and which we should drop. Should I, for example, eat butter or margarine to prevent a heart attack? In the 1980s, the answer was: Don’t eat butter, cheese, beef, and similar high-saturated fat products. Yet a recent meta-analysis of those and subsequent studies reached an opposite conclusion. Figuring out what to do is hard because I, as a researcher, teacher, and person who wants to maintain good health, have to sort out what studies say and how those studies were done from what the media report, and then how all of that applies to me. Should I take a PSA test? Should I switch from margarine to butter?
He put it much better than I did. While the gains in overall modern medicine have been amazing, anybody who has had even a moderately complex health issue (like back pain, for example) has had the frustrating experience of having a billion tests, being passed from specialist to specialist, and getting no clear answers.1 More on this point later.

Larry’s next post—actually a guest post by Francis Schrag—is an imaginary argument between an evidence-based education proponent and a skeptic. I won’t quote it here, but it is well worth reading in full. My own position is somewhere between the proponent and the skeptic, though leaning more in the direction of the proponent. I don’t think we can measure everything that’s important about education, and it’s very clear that pretending that we can has caused serious damage to our educational system. But that doesn’t mean I think we should abandon all attempts to formulate a science of education.

For me, it’s all about literacy. I want to give teachers and students skills to interpret the evidence for themselves and then empower them to use their own judgment. To that end, let’s look at the other half of Larry’s April 9 post, the title of which is “What’s The Evidence on School Devices and Software Improving Student Learning?”
Lies, Damned Lies, and…
The heart of the post is a study by John Hattie, a professor at the University of Auckland (NZ). He has done meta-analyses of an enormous number of education studies, looking at effect sizes, which are expressed in standard deviations: an effect of 0.1 is negligible, while an effect of 1.0 means student outcomes moved by a full standard deviation.
He found that the “typical” effect size of an innovation was 0.4. To compare how different classroom approaches shaped student learning, Hattie used that “typical” effect size (0.4) as the threshold a practice had to reach to count as influencing student learning (p. 5). From his meta-analyses, he then found that class size had a .20 effect (slide 15) while direct instruction had a .59 effect (slide 21). Again and again, he found that teacher feedback had an effect size of .72 (slide 32). Moreover, teacher-directed strategies of increasing student verbalization (.67) and teaching meta-cognition strategies (.67) had substantial effects (slide 32). What about student use of computers (p. 7)? Hattie included many “effect sizes” of computer use, from distance education (.09) and multimedia methods (.15) to programmed instruction (.24) and computer-assisted instruction (.37). Except for “hypermedia instruction” (.41), all fell below the “typical” effect size (.40) of innovations improving student learning (slides 14-18). Across all studies of computers, then, Hattie found an overall effect size of .31 (p. 4).
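For readers who want to see what is behind numbers like “.59” and “.72”: an effect size of this kind is simply a standardized mean difference (Cohen’s d), the gap between a treatment group and a comparison group expressed in standard deviation units. Here is a minimal sketch of the calculation; the exam scores are invented for illustration and have nothing to do with Hattie’s actual data.

```python
# A minimal sketch of how a standardized effect size (Cohen's d) is computed.
# The numbers below are made up for illustration; they are not from Hattie's
# meta-analyses or from any real study.

import statistics

def cohens_d(treatment_scores, control_scores):
    """Standardized mean difference: (mean_t - mean_c) / pooled standard deviation."""
    mean_t = statistics.mean(treatment_scores)
    mean_c = statistics.mean(control_scores)
    sd_t = statistics.stdev(treatment_scores)
    sd_c = statistics.stdev(control_scores)
    n_t, n_c = len(treatment_scores), len(control_scores)
    pooled_sd = (((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)) ** 0.5
    return (mean_t - mean_c) / pooled_sd

# Hypothetical exam scores for a class that got some intervention vs. one that didn't.
with_intervention = [78, 85, 74, 90, 82, 88, 79, 84]
without_intervention = [76, 83, 71, 88, 80, 85, 77, 82]

d = cohens_d(with_intervention, without_intervention)
print(f"Effect size d = {d:.2f}")                      # in standard deviation units
print("Above the 0.4 'typical' threshold?", d > 0.4)
```

The point of seeing the arithmetic is that the single number hides everything about *which* students, *which* intervention, and *which* outcome measure produced it, which is exactly the problem with averaging such numbers across wildly different studies.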
The conclusion is that changing a classroom practice can often produce a significant effect size while adding a technology rarely does. But as my father likes to say, if you stick your head in the oven and your feet in the freezer, on average you’ll be comfortable.

Let’s think about introducing clickers to a classroom, for example. What class are you using them in? How often do you use them? When do you use them? What do you use them for? Clickers in and of themselves change nothing. No intervention is going to be educationally effective unless it gets students to perceive, act, and think differently, and there are lots of ways to use clickers in the classroom that have no such effect. My guess is that, most of the time, they are used for formative assessments. Those can be helpful or not, but when used this way they are generally more about informing the teacher than directly helping the student. But there are other uses of clicker technologies. For example, University of Michigan professor Perry Samson recently blogged about using clickers to compare students’ sense of their physical and emotional well-being with their test performance:
I have observed over the last few years that a majority of the students who were withdrawing from my course in mid-semester commented on a crisis in health or emotion in their lives. On a lark this semester I created an image-based question to ask students in LectureTools at the beginning of each class (example, Figure 2) that requested their self assessment of their current physical and emotional state. Clearly there is a wide variation in students’ perceptions of their physical and emotional state. To analyze these data I performed cluster analysis on students’ reported emotional state prior to the first exam and found that temporal trends in this measure of emotional state could be clustered into six categories.
Perhaps not surprisingly Figure 3 shows that student outcomes on the first exam were very much related to the students’ self assessment of their emotional state prior to the exam. This result is hard evidence for the intuitive, that students perform better when they are in a better emotional state.
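As a side note for the technically curious, a clustering of this sort (grouping students by the temporal trend of their self-reports) might look roughly like the sketch below. To be clear, this is not Perry’s code or his data; the ratings are randomly generated, and k-means is just one plausible choice of clustering method.

```python
# Hypothetical illustration only: cluster students by the trend in their
# self-reported emotional state over the class sessions before an exam.
# Neither the data nor the method comes from Perry Samson's actual analysis.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)

# Rows = students, columns = per-session self-ratings of emotional state (1-5 scale).
ratings = rng.integers(1, 6, size=(120, 8)).astype(float)

# Group students whose ratings follow similar trends over time.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(ratings)

for cluster_id in range(6):
    members = ratings[kmeans.labels_ == cluster_id]
    print(f"Cluster {cluster_id}: {len(members)} students, "
          f"mean rating per session {members.mean(axis=0).round(2)}")
```

With real data, the interesting step is the one Perry describes next: comparing each cluster’s exam outcomes rather than the clusters themselves.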
I don’t know what Perry will end up doing with this information in terms of a classroom intervention, nor do I know whether any such intervention will be effective. But it seems like common sense not to lump his experiment in with a million billion professors asking quiz questions on their clickers and aggregate it all into an average of how effective clickers are.

To be fair, that’s not Larry’s point in quoting the Hattie study. He’s arguing against the reductionist argument that technology fixes everything—an argument which seems obviously absurd to everybody except, sadly, the people who seem to have the power to make decisions. But my point is that it is equally absurd to use this study as evidence that technology is generally not helpful. What I think it suggests is that it makes little sense to study the efficacy of educational technologies or products outside the context of the efficacy of the practices that they enable. More importantly, it’s a good example of how we all need to get much more sophisticated about reading the studies so we can judge for ourselves what they do and do not prove.
Of Back Mice and Men
I have had moderate to severe back pain for the past seven years. I have been to see orthopedists, pain specialists, rheumatologists, urologists, chiropractors, physical therapists, acupuncturists, and massage therapists. In many cases, I have seen more than one in a given category. I have had X-rays, CAT scans, and MRIs, and I have had electrical probes inserted into my abdomen and legs. I have had needles of widely varying gauges stuck in me, grown humans walking on my back, and gallons of steroids injected into me. I have had the protective sheaths of my nerves fried with electricity. If you’ve ever had chronic pain, you know that you would probably go to a voodoo priest and drink goat urine if you thought it might help. (Sadly, there are apparently no voodoo priests in my area of Massachusetts—or at least none who have a web page.) Nobody I went to could help me.

Not too long ago, I had cause to visit my primary care physician, who is a good old country doctor. No specialist certificates, no Ivy League medical school degrees. Just a solid GP with some horse sense. In a state of despair, I explained my situation to him. He said, “Can I try something? Does it hurt when I touch you here?” OUCH!!!!

It turns out that I have a condition called “back mice,” also called “episacral lipomas” on the rare occasions when it comes up in the medical literature. I won’t go into the details of what they are, because that’s not important to the story. What’s important is what the doctor said next. “There’s hardly anything on them in the literature,” he said. “The thing is, they don’t show up on any scans. They’re impossible to diagnose unless you actually touch the patient’s back.”

I thought back to all the specialists I had seen over the years. None of those doctors ever once touched my back. Not one. My massage therapist actually found the back mice, but she didn’t know what they were, and neither of us knew that they were significant.

Once my GP discovered that these things exist, he started finding them everywhere. He told me a story of an eighty-year-old woman who had been hospitalized for “non-specific back pain.” They doped her up with opiates, and the poor thing couldn’t stand up without falling over. He gave her a couple of shots in the right place, and a week later she was fine. He has changed my life as well. I am not yet all better—we just started treatment two weeks ago—but I am already dramatically better.

The thing is, my doctor is an empiricist. In fact, he is one of the best diagnosticians I know. (And I have now met many.) He knew about back mice in the first place because he reads the literature avidly. But believing in the value of evidence and research is not the same thing as believing that only that which has been tested, measured, and statistically verified has value. Evidence should be a tool in the service of judgment, not a substitute for it. Isn’t that what we try to teach our students?
1. But I’m not bitter.
Matthew Greenfield (@mattgreenfield) says
Another wonderful, nuanced essay. As an investor, I look for companies whose technology or services guide teachers and learners toward what research has already shown to be efficacious. One of my portfolio companies, Education Elements, has case studies showing some dramatic improvements in educational outcomes for users of their consulting services and software, which help schools and districts transition to new blended learning instructional models. EE’s instructional models rely on evidence like that cited by Hattie for the value of additional one-on-one tutoring. Every one of the best practices cited by Hattie can be facilitated and encouraged by technology, including fostering meta-cognition (which can be encouraged, for example, by including e-portfolios in an instructional model). But this only happens when a tech company treats best practices research as core to its mission.
mikecaulfield says
This is a great essay. It’s kind of trendy now to say “data-informed” instead of “data-driven”, but the underlying point is sound. There’s a delicate dance between science and art. And science can never be effective without the art.
Larry’s point about epidemiological studies is valid as well. It’s important to remember that one of the big reasons epidemiological studies “fail” is they are designed to filter out specific circumstances in search of general truths. So it’s not really failure that keeps them bopping back and forth, it’s usually zeroing in on nuance. Is butter or margarine better? Well, for a person who uses a hydrogenated form of margarine, and substitutes twice the amount of margarine for butter, yeah, margarine might be worse. When you see something like butter/margarine debates, it’s saying look — this is likely a place where details matter. When you see something like smoking studies, it’s saying this is a place where the general rule outweighs local factors.
I’m a fan of Hattie’s work, b/c I think his point and focus are correct. What are the areas that are like smoking? And what are the areas like margarine? Larry may not trust the latest health article, but I doubt he’s going to take up smoking anytime soon, drill asbestos without a mask, eat undercooked food or drink unfiltered city water. Or drive without a seat belt. And strangely enough, if you look at the increase in longevity in this country and the world, you get crazy far knowing just a few things like that. (And that for me is the genius of Hattie).
On the “margarine” issues, the particular context comes more into play. And that’s a place, generally, that veers a bit towards art. One of the great things long-term teachers can bring to a situation is a great sense of “what works for our students” — which may sometimes be different from the general rule. But I don’t think of these things as in conflict — in fact, I’d argue the fuzzy areas may be a great indication to us that we have to test very specific implementations on targeted populations, because fuzzy may be indicating the lack of an association that overrides other variables of implementation. Ask “smoke or don’t smoke” and the answer is clear. Ask “butter or margarine” and you have to say let’s back out and look at your diet as a whole…