My father likes to say, “If you stick your head in the freezer and your feet in the oven, on average you’ll be comfortable.” Behind this pithy saying is an insight that is a little different from the “three kinds of lies” saying about statistics. It suggests that certain types of analysis lend a false coherence to the world. We honestly see patterns where there aren’t any. This is what appears to have happened with the studies that Purdue University has done regarding Course Signals’ effectiveness.
From this desk here, without a stitch of research, I can show that people who have had more car accidents live, on average, longer than people who have had very few car accidents.
Why? Because each year you live, you have a chance of racking up another car accident. In general, the longer you live, the more car accidents you are likely to have had.
If you want to know whether people who have more accidents are more likely to live longer *because of* the car accidents, you have to do something like take 40-year-olds and compare the proportion who make it to 41 in your high- and low-accident groups (the simple check), or use any one of a number of more sophisticated methods to filter out the relationship between age and accident count.
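A toy simulation makes the point concrete. Everything here is invented for illustration: a constant yearly death hazard and a constant yearly accident rate, with no causal link between accidents and lifespan. The naive comparison still shows high-accident people living longer, while the simple check (conditioning on age 40) makes the artifact vanish:

```python
import random

random.seed(1)

# Toy assumptions, purely illustrative: a 3% chance of dying in any given
# year and a 2% chance of a car accident in any given year, with no causal
# link between the two.
DEATH_HAZARD = 0.03
ACCIDENT_RATE = 0.02

def simulate_life():
    """Return (age at death, lifetime accidents, accidents by age 40 or None)."""
    age, accidents, accidents_at_40 = 0, 0, None
    while True:
        if random.random() < ACCIDENT_RATE:
            accidents += 1
        if age == 40:
            accidents_at_40 = accidents
        if random.random() < DEATH_HAZARD:
            return age, accidents, accidents_at_40
        age += 1

people = [simulate_life() for _ in range(50_000)]

# Naive comparison: people with more lifetime accidents "live longer" -- but
# only because living longer gives you more years in which to have accidents.
high = [age for age, acc, _ in people if acc >= 2]
low = [age for age, acc, _ in people if acc < 2]
print(f"mean lifespan, 2+ accidents: {sum(high) / len(high):.1f}")
print(f"mean lifespan, <2 accidents: {sum(low) / len(low):.1f}")

# The simple check: among people who reached 40, does the high-accident group
# reach 41 more often? No -- both groups survive at the same rate, exposing
# the naive comparison as an artifact.
forty = [(acc40, age >= 41) for age, _, acc40 in people if acc40 is not None]
for label, group in [("2+", [r for a, r in forty if a >= 2]),
                     ("<2", [r for a, r in forty if a < 2])]:
    print(f"P(reach 41 | reached 40, {label} accidents): {sum(group) / len(group):.3f}")
```

The naive gap is dramatic; the age-conditioned gap is statistical noise. That is the whole trick of the "head in the freezer" pattern.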
The Purdue example is somewhat more contained, because the event of taking a Course Signals class or set of classes happens once per semester. But what I am asking is whether
- the number of classes a student took is controlled for, and, more importantly,
- whether first-to-second-year retention is calculated as
  - the number of students who started year two / the number of students who started year one (our car accident problem), or
  - the number of students who started year two / the number of students who finished year one (the better measure in this case).
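To make the difference between those two denominators concrete, here is a toy cohort (all numbers invented for illustration):

```python
# Invented cohort numbers, purely for illustration.
started_year_one = 1000
finished_year_one = 800   # 200 students left during year one
started_year_two = 720    # 80 more left between year one and year two

# Measure (a): folds in-year attrition into "retention" -- the car accident
# problem, since anything correlated with merely surviving year one (like
# having taken more classes) will appear to boost this number.
rate_a = started_year_two / started_year_one    # 0.72

# Measure (b): conditions on finishing year one, isolating the
# year-one-to-year-two transition we actually care about.
rate_b = started_year_two / finished_year_one   # 0.90

print(rate_a, rate_b)
```

A treatment that students accumulate by sticking around will look strongly "effective" on measure (a) even if it does nothing at all.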
Pointing to the first of these posts, I suggested that a response from the Purdue researchers would be helpful. (It still would be.) Now Al Essa, recently of Desire2Learn and currently of McGraw Hill, has done a more mathematically rigorous analysis showing that Mike’s intuition appears to be correct. He took the Purdue findings, substituted the phrase “were given a chocolate” for “took a class using Course Signals,” and ran a simulation to see whether he could reproduce the Purdue results with no causal connection between the chocolatey intervention and the retention results:
The following are some results from the simulation. The first row displays retention rates for students who received no chocolates. The second row displays retention rates for students who received at least one chocolate. The last row shows students who received two or more chocolates. Why track students who received two or more chocolates? Because the authors of the study claim that two is the “magic number” where significant retention gains kick in.
The simulation data shows us that the retention gain for students is not a real gain (i.e., causal) but an artifact of the simple fact that students who stay longer in college are more likely to receive more chocolates. So, the answer to the question we started off with is “No.” You can’t improve retention rates by giving students chocolates.
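The shape of a simulation like Essa’s can be sketched in a few lines. To be clear, the parameters below are invented, not taken from the Purdue data or from Essa’s actual code; the only point is that chocolates handed out completely at random still “predict” retention, with the biggest gains in the two-or-more group:

```python
import random

random.seed(0)

# All parameters are invented for illustration; they are not from the Purdue
# data or from Essa's actual simulation.
N_STUDENTS = 10_000
P_CHOCOLATE = 0.5   # chance of getting a chocolate in any semester attended
P_DROPOUT = 0.15    # chance of leaving at the end of any semester

def simulate_student():
    """Return (chocolates received, whether the student started year two)."""
    chocolates = 0
    for _ in range(2):                    # two semesters in year one
        if random.random() < P_CHOCOLATE:
            chocolates += 1               # handed out at random: no causal effect
        if random.random() < P_DROPOUT:
            return chocolates, False      # leaves; no more chocolate chances
    return chocolates, True               # starts year two

groups = {"0 chocolates": [], "1+ chocolates": [], "2+ chocolates": []}
for _ in range(N_STUDENTS):
    chocolates, retained = simulate_student()
    if chocolates == 0:
        groups["0 chocolates"].append(retained)
    if chocolates >= 1:
        groups["1+ chocolates"].append(retained)
    if chocolates >= 2:
        groups["2+ chocolates"].append(retained)

for label, outcomes in groups.items():
    print(f"{label}: retention {sum(outcomes) / len(outcomes):.1%}")
```

Because a student can only collect a second chocolate by surviving into a second semester, the two-chocolate group is mechanically enriched with persisters; the “magic number” of two falls out of the selection process, not the intervention.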
This is a problem that goes well beyond Course Signals itself, for several reasons. First, both Desire2Learn and Blackboard have modeled their own retention early warning systems after Purdue’s work. For that matter, I have praised Course Signals up and down and criticized these companies for not modeling their products more closely on that work, largely based on the results of the effectiveness studies. So we don’t know what we thought we knew about effective early warning systems. The fact that the research results appear to be spurious does not mean that systems like Course Signals have no value, but it does mean that we don’t have the proof that we thought we had of their value.
More generally, we need to work much harder as a community to critically evaluate effectiveness study results. Big decisions are being made based on this research. Products are being designed and bought. Grants are being awarded. Laws are starting to be written. I believe strongly in effectiveness research, but I also believe strongly that effectiveness research is hard. The Purdue results have been around for quite a while now. It is disturbing that they are only now getting critical examination.
Update: Seeing some of the posts in the comments thread, I feel the need to make a clarification in fairness to Purdue. They have two important sets of research findings, only one of which is being called into question here. Their early findings, that Course Signals can increase student grades and chances of completion within a class, are not being challenged here. Those are important results. The findings that are being questioned are their longitudinal analysis showing that students who use Course Signals in one class are more likely to do well in future classes. Even there, it is possible that Course Signals does have some long-term effect. The point is simply that the result the researchers got in their analysis looks suspiciously similar to the result one would get from a fairly straightforward selection bias.