My father likes to say, “If you stick your head in the freezer and your feet in the oven, on average you’ll be comfortable.” Behind this pithy saying is an insight that is a little different from the “three kinds of lies” saying about statistics. It suggests that certain types of analysis lend a false coherence to the world. We see patterns where there honestly aren’t any. This is what appears to have happened with the studies that Purdue University has done regarding Course Signals’ effectiveness.
The problem was raised speculatively and then re-explained by Mike Caulfield (whom we are proud to now have as an e-Literate featured blogger):
From this desk here, without a stitch of research, I can show that people who have had more car accidents live, on average, longer than people who have had very few car accidents.
Why? Because each year you live, you have a chance of racking up another car accident. In general, the longer you live, the more car accidents you are likely to have had.
If you want to know whether people who have more accidents are more likely to live longer because of the car accidents, you have to do something like take 40-year-olds and compare the number of 40-year-olds who make it to 41 in your high- and low-accident groups (simple check), or use any one of a number of more sophisticated methods to filter out the age-car accident relation.
The Purdue example is somewhat more contained, because the event of taking a Course Signals class or set of classes happens once per semester. But what I am asking is whether
- the number of classes a student took is controlled for, and more importantly,
- whether first to second year retention is calculated as
- the number of students who started year two / the number of students who started year one (our car accident problem), or
- the number of students who started year two / the number of students who finished year one (our better measure in this case).
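To see Mike’s car-accident point in miniature, here is a quick sketch of my own (it is not from Mike’s post or from the Purdue studies, and every number in it is made up for illustration). Accidents accrue at a fixed yearly rate and have no effect whatsoever on lifespan, yet the high-accident group still “lives longer” on average, simply because living more years gives you more chances to have an accident.

```python
import random

# Illustrative sketch of the car-accident analogy; all parameters are invented.
# Accidents have NO effect on lifespan, but longer lives accumulate more of them.

random.seed(0)
ACCIDENT_RATE = 0.05  # assumed chance of an accident in any given year

people = []
for _ in range(100_000):
    lifespan = random.randint(40, 90)  # drawn independently of accidents
    accidents = sum(random.random() < ACCIDENT_RATE for _ in range(lifespan))
    people.append((lifespan, accidents))

few = [life for life, acc in people if acc <= 1]
many = [life for life, acc in people if acc >= 3]

print("average lifespan, 0-1 accidents:", round(sum(few) / len(few), 1))
print("average lifespan, 3+ accidents :", round(sum(many) / len(many), 1))
# The 3+ accident group comes out "longer lived" with nothing causal going on.
```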
Pointing to the first of these posts, I suggested that a response from the Purdue researchers would be helpful. (It still would be.) Now Al Essa, recently of Desire2Learn and currently of McGraw Hill, has done a more mathematically rigorous analysis showing that Mike’s intuition appears to be correct. He took the Purdue findings, substituted the phrase “were given a chocolate” for “took a class using Course Signals,” and ran a simulation to see whether he could reproduce the Purdue results with no causal connection between the chocolatey intervention and the retention results:
The following are some results from the simulation. The first row displays retention rates for students who received no chocolates. The second row displays retention rates for students who received at least one chocolate. The last row shows students who received two or more chocolates. Why track students who received two or more chocolates? Because the authors of the study claim that two is the “magic number” where significant retention gains kick in.
The simulation data shows us that the retention gain for students is not a real gain (i.e., causal) but an artifact of the simple fact that students who stay longer in college are more likely to receive more chocolates. So, the answer to the question we started off with is “No.” You can’t improve retention rates by giving students chocolates.
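Al’s actual code and parameters aren’t reproduced here, but the shape of his argument is easy to sketch. The following is a rough, illustrative version under assumed dropout and chocolate probabilities: chocolates are handed out at random each semester a student is enrolled and have no effect on whether the student returns, yet retention still appears to climb with chocolate count.

```python
import random

# Rough, illustrative chocolate simulation (not Al Essa's code; all
# probabilities are assumptions). Chocolates have NO effect on persistence.

random.seed(0)
MAX_SEMESTERS = 8
P_CHOCOLATE = 0.4   # assumed chance of receiving a chocolate in a semester
P_DROPOUT = 0.15    # assumed chance of leaving after any given semester

students = []
for _ in range(100_000):
    # How long the student stays enrolled is decided independently of chocolates.
    semesters = 1
    while semesters < MAX_SEMESTERS and random.random() >= P_DROPOUT:
        semesters += 1
    # Chocolates accrue only while enrolled -- this is the selection effect.
    chocolates = sum(random.random() < P_CHOCOLATE for _ in range(semesters))
    returned_for_year_two = semesters >= 3  # enrolled beyond the first two semesters
    students.append((chocolates, returned_for_year_two))

def retention(group):
    return sum(returned for _, returned in group) / len(group)

print("retention, no chocolates:", round(retention([s for s in students if s[0] == 0]), 3))
print("retention, 1+ chocolates:", round(retention([s for s in students if s[0] >= 1]), 3))
print("retention, 2+ chocolates:", round(retention([s for s in students if s[0] >= 2]), 3))
```

Under these made-up numbers, retention rises with chocolate count for exactly the reason Al describes: staying enrolled longer is what earns you more chocolates, not the other way around.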
This is a problem that goes well beyond Course Signals itself, for several reasons. First, both Desire2Learn and Blackboard have modeled their own retention early warning systems after Purdue’s work. For that matter, I have praised Course Signals up and down and criticized these companies for not modeling their products more closely on that work, largely based on the results of the effectiveness studies. So we don’t know what we thought we knew about effective early warning systems. The fact that the research results appear to be spurious does not mean that systems like Course Signals have no value, but it does mean that we don’t have the proof that we thought we had of their value.
More generally, we need to work much harder as a community to critically evaluate effectiveness study results. Big decisions are being made based on this research. Products are being designed and bought. Grants are being awarded. Laws are starting to be written. I believe strongly in effectiveness research, but I also believe strongly that effectiveness research is hard. The Purdue results have been around for quite a while now. It is disturbing that they are only now getting critical examination.
Update: Seeing some of the posts in the comments thread, I feel the need to make a clarification in fairness to Purdue. They have two important sets of research findings, only one of which is being called into question here. Their early findings, that Course Signals can increase student grades and chances of completion within a class, are not being challenged here. Those are important results. The findings that are being questioned come from their longitudinal analysis showing that students who take a class using Course Signals are more likely to do well in future classes. Even there, it is possible that Course Signals does have some long-term effect. The point is simply that the result the researchers got in their analysis looks suspiciously similar to the results one would get from a fairly straightforward selection bias.
Margaret Korosec (@mdkorosec) says
Thank you for this commentary. I have presented at conferences in South Africa and Germany and have included Signals and other examples of learning analytics as a great thing. The concept is certainly great. No doubt! But when these data do not stand alone, or even stand up, then we have more questioning to do. But really… did anyone ever think the planning was done just because Signals was implemented? I hope not.
It is all in the name of progress and change for the better. There is no harm done. How can we evolve and change without trying? How can technology ‘enhance’ learning without giving it a go (UK terminology here…)?
Kudos to Purdue for trying their best and kudos to the students who were the guinea pigs…Let’s hope there was no detriment to students and let’s hope that Purdue is learning from the evolving and incoming data.
Great post. Thanks Michael!
Dr. Deborah Everhart says
Great discussion on a critically important topic. Several people brought up the human factor, and I think it’s worth reiterating that combining analytics with human judgments and decisions is extremely powerful. For example, in the Blackboard Retention Center, thousands of data points are combined into an easy-to-use learning analytics visualization that lets faculty focus quickly on who needs attention and how the course can be improved. Who hasn’t logged into the course in more than a week? How many people failed the last test? Does the item analysis show that the test needs to be improved? (Maybe it’s not the students’ fault that they failed!) These are not automated actions; they are tools to inform action. Analytics can empower faculty and the students themselves to improve learning in ways that were difficult, time-consuming, or impossible before.
Predictive analytics are hard to get right, and research needs to be done over considerable periods of time to yield good results. We need to continue to invest in this research, which will eventually lead to better automated results; but more importantly, we need to recognize the power of using analytics to enable more timely and more effective *human* decisions.