The “Course Signals” story originally covered here has recently gone international, with Britain’s prestigious Times Higher Education magazine picking up the Inside Higher Ed story and publishing it as an “Editor’s Pick”. Hopefully this will push the Course Signals team to answer questions asked of them nearly two months ago, questions that have still not been satisfactorily answered.
We realize those watching the posts on e-Literate over the past couple weeks may have some questions about what the “Course Signals issue” is, what it isn’t, and why it is so important for the educational technology community to make sure Purdue is accounting for the recent issue discovered with their statistical approach. This explainer should get you up to speed.
What is Course Signals? Why is it important?
Course Signals is a software product developed at Purdue University to increase student success through the use of analytics to alert faculty, students, and staff to potential problems. Using a formula that takes into account a variety of predictors and current behaviors (e.g. previous GPA, attendance, running scores), Course Signals can help spot potential academic problems before traditional methods might. That formula labels student status in a given course according to a green-yellow-red scheme that clearly indicates whether students are in danger of the dreaded DWIF (dropping out, withdrawing, getting an incomplete, or failing).
While the product is used to improve in-class student performance, it is most often discussed in a larger frame, as a product that increases long-term student success. Course Signals has won prestigious awards for its approach to retention, and it is particularly important in the analytics field: its reported ability to increase retention by 21% makes it one of the most effective interventions out there and suggests that technological solutions to student success can significantly outperform more traditional measures.
What problems were found in the data supporting the retention effects?
Purdue had been claiming that taking classes using CS technology led to better retention. Several anomalies in the data led to the discovery that the experiment may suffer from a “reverse-causality” problem.
One such anomaly was an odd “dose-response” curve. With many effective interventions, as exposure to the intervention increases, the desired benefit increases as well. In the recent Purdue data, taking one Course Signals-enhanced course showed a very slight negative effect, while taking two showed a very strong benefit.
The story became even more complex when older data was examined. Early in the program, taking one CS-enhanced course had a very substantial impact on retention, nearly equal to taking two CS-enhanced classes. But as the program expanded over the years, taking one CS-enhanced class started to show no impact at all. This behavior is not consistent with Course Signals causing higher retention.
I hypothesized a simple model to explain this shift: rather than students who took more CS courses retaining at a higher rate, what was really happening was that students who dropped out mid-year were taking fewer CS classes because they were taking fewer classes, period. In other words, the retention/CS link existed, but not in a meaningful way. Unlike the Purdue model, where taking CS-enhanced courses caused retention, this “reverse-causality” model explained why, as participation expanded, taking one CS-enhanced course might move from being a strong predictor to having no predictive force at all.
Michael Feldstein picked up on this analysis, and prodded the Purdue team for a response. When no response came, Alfred Essa, head of R&D and Analytics at McGraw-Hill, took my “back-of-the-envelope” model and built it out into a full-fledged simulation. The simulation confirmed that the reverse-causality model explained the data anomalies very well, much better than Purdue’s causal model. Purdue’s response to the simulation did not address the serious issues raised.
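To make the reverse-causality mechanism concrete, here is a minimal simulation sketch of my own; it is not Alfred Essa’s simulation and not Purdue’s data, and every number in it (the retention rate, course loads, and adoption levels) is made up for illustration. Course Signals is given zero causal effect on retention, yet retention still appears to climb with the number of CS-enhanced courses taken, and the apparent effect of taking just one such course shrinks as program adoption grows, simply because students who leave mid-year take fewer courses of every kind.

```python
import random

# Illustrative sketch only: hypothetical parameters, not Purdue's data and not
# Alfred Essa's actual simulation. Course Signals has ZERO causal effect on
# retention here, yet the number of CS-enhanced courses taken still "predicts"
# retention, because students who leave mid-year take fewer courses overall.

random.seed(1)

def simulate(cs_fraction, n_students=50_000, base_retention=0.70):
    """Return retention rate by number of CS-enhanced courses taken."""
    by_dose = {}
    for _ in range(n_students):
        retained = random.random() < base_retention
        # Retained students complete a full year; leavers complete fewer
        # courses before departing (illustrative assumption).
        n_courses = 10 if retained else random.randint(2, 6)
        dose = sum(random.random() < cs_fraction for _ in range(n_courses))
        by_dose.setdefault(dose, []).append(retained)
    return {d: sum(v) / len(v) for d, v in sorted(by_dose.items())}

for label, adoption in [("early program (5% of courses CS-enhanced)", 0.05),
                        ("expanded program (30% of courses CS-enhanced)", 0.30)]:
    print(label)
    for dose, rate in simulate(adoption).items():
        if dose <= 3:
            print(f"  {dose} CS-enhanced courses: {rate:.0%} retained")
```

Under these made-up assumptions, the low-adoption run shows even one CS-enhanced course “boosting” retention, while the high-adoption run shows one course as neutral or worse and two or more as strongly positive, which is exactly the pattern of anomalies described above.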
Does this mean Course Signals does not work?
It depends. Purdue has yet to respond to the new information in any meaningful way, and until they either release revised estimates that control for this effect or release their data for third-party analysis, we don’t know the full story. Additionally, there are some course-level effects seen in early Signals testing that will be unaffected by the issue.
However, Purdue’s recent response to Inside Higher Ed indicates that they did not control for the reverse-causality issue at all. If this is true, then the likelihood is that the retention impact of Course Signals will be positive, but significantly below the 21% they have been claiming.
But positive impact is good, right?
Not really. The great insight regarding educational interventions of the past decade or so is what we might term “Hattie’s Law”, after researcher John Hattie. Most educational interventions have some effect. Doing something is usually better than doing nothing. The question that administrators face is not which interventions “work”, but which interventions “work better than average.”
At a 21% impact on retention, Course Signals was clearly in the “better than average” category, and its unparalleled dominance in that area suggested that the formula and approach embraced by Course Signals formed the best possible path forward.
Halve that impact and everything changes. Peer coaching models such as InsideTrack have shown impact in the 10-15% range. Increased student aid has shown moderate impact, as have streamlined registration and course-access initiatives.
Additionally, other analytics packages exist that have taken a different route than Course Signals. Up until now, they have lived in the shadow of Purdue’s success. If CS impact is shown to be significantly reduced, it may be time to give those approaches a second look.
What is unaffected by the new analysis?
Until Purdue fixes and reruns their analysis, it is hard to know what the effects might be. However, there were a number of claims Purdue made that were not based on longitudinal analysis, and these should stand. For instance, students in Course Signals courses do tend to get more A’s and fewer F’s, and that data would be unaffected by this issue.
While that’s good, it’s not the major intent of at least some institutions interested in the system. What makes systems like this particularly attractive is their ability to pay for themselves over time by increasing retention.
There remains a question as to how a system that boosts grades could fail to boost retention. There are a couple of potential hypotheses. First of all, it is quite possible that when the numbers are rerun there will still be a significant, though reduced, retention effect, and that reduced effect would still be congruent with the better scores.
Alternatively, it could be that students score highly in Course Signals-enhanced courses, but at the expense of their other courses. My daughter’s math teacher has a very strict policy on math homework, which has whipped her into shape in that class, but it means she often delays studying for other things. Students with finite time resources can rearrange their time, but not always expand it.
Finally, for some nontrivial number of students, retention problems are not due to grades. Not to push the reverse-causality logic too far, but for some students low grades could be a sign of financial or domestic difficulty; fixing the grade would not address the larger problem.
What are the larger cultural implications?
As Michael has outlined in a different post, there are major cultural implications to this error, ones which partially indict the analytics community’s approach to research. To my knowledge, the study was never peer-reviewed outside of its inclusion in conference proceedings, but it is one of the most referenced studies in learning analytics.
Technology does move fast enough that old publication cycles do not serve the industry well. But if pre-publication peer review does not exist, there are a host of things we need to do to make post-publication review work. We need to release more underlying data, invite more criticism, and separate the PR arm of many organizations from their research arm (or at least ensure more autonomy). Additionally, we may need to place more rigorous controls on conference presentations, and make sure that presentations making strong statistical claims undergo a more thorough and professional review.
The cultural implications of an error like this going undetected for this long in a community supposedly made up of data analysts are also stunning, and will be the subject of a future post. For the moment we are still waiting for Purdue to engage honestly with the critique and re-run their numbers after controlling for this effect. Hopefully that will happen later this week.
UPDATE: As Doug notes below, the paper did undergo a full peer review before its inclusion in the LAK conference. I was aware of that, but reading through the post, I realize that is not clear. As I mentioned, we’re looking at putting together a more detailed analysis on how we got here after we know better what the damage is, and will walk through those issues more thoroughly at that time. In the meantime, I’d love to start a conversation about that issue in the comments. Let’s assume that some analytics is sugar water, and some is useful medicine. How do we create a culture and a process that helps us separate one from the other? What’s preventing us from doing that now?
dougclow says
Thanks for drawing attention to this – it seems a crucial moment for the learning analytics field.
For the record, the paper in the LAK12 proceedings (http://dl.acm.org/citation.cfm?id=2330666) was (almost certainly) peer reviewed before acceptance and publication. Many educational conferences don’t review contributions, or only lightly, but the LAK conference follows practice in computer science by fully peer reviewing full papers before acceptance – and has a fairly high rejection rate. The paper was a ‘short paper’ rather than a full one, but that doesn’t necessarily imply a lower standard of quality – indeed a short paper won the best paper award at LAK13.
(Disclosure: I also had a short paper reviewed and published in the LAK12 proceedings.)
You’re quite right, though, that this raises important issues about the field – and I too plan to post about this shortly.
John Whitmer says
Echoing Doug, I also appreciate you raising this issue. We’ve been on the “Analytics” hype cycle’s peak of inflated expectations for a while now, and it’s about time we leave that phase and get into the productivity phase.
A few reflections from my perspective:
1) Reading this and prior posts, Purdue is positioned as if they created Signals, presented research results, then got huge attention based on that research. I think it happened differently: they created Signals, announced it – struck a nerve and got huge attention. I recall John Campbell, the Signals founder, saying that after releasing Signals at Purdue, someone at NBC heard about it, they presented it on a news segment, and got dozens of calls to show/share it – so they outsourced implementation to a commercial firm.
In other words, they weren’t seeking this attention – *we* were looking to Signals as an example of early alert analytics and had high expectations.
From this perspective, it’s not too surprising that Matthew and Purdue haven’t responded to requests for a clarification or more data; they weren’t pushing this out there and making big claims, they were just reflecting on what they were finding.
2) I also cite Purdue’s research on Signals frequently – because I *can’t* find any other study with empirical data about the impact of an early alert system, using LMS / academic technology data, on student retention. If you (or any other reader) know of any, please message me / add them to the comments!
3) I hope this discussion can help to motivate more empirical research on the effectiveness of early warning and predictive analytics systems. We are at a *very* early stage of this work, which is recognized by all of us in the Learning Analytics community. I hope that an empirical critique doesn’t undo previous work, but points ahead to the additional work that we need to do. I can critique and poke holes in previous studies that I’ve done, but would rather take those lessons into new research.
Again, thanks for leading this discussion and may a thousand flowers bloom.
mikecaulfield says
I might follow your first point, except, of course, Purdue is now commercializing the technology with Ellucian. I originally looked into this issue because my previous institution was considering buying the product, which of course would divert money and effort from other potential initiatives. Signals was seen as the likely path primarily because the research was perceived as being strong. I think when you sell something based on research that there’s a different set of obligations. Thanks for the background though, I was unaware that the CBS report had swept them up in a bit of a tidal wave.
On your last point, I 100% agree. I worry that there’s a trend of hyped technologies failing in experimentation, and that the reaction *might* move to the idea that rigorous experiments aren’t worth it from a hype perspective. You look at Udacity, and how that single failure at SJSU pushed them out of the industry. Now there’s this, which looks less solid than we thought. The point *should* be that we learn important, counter-intuitive things when we run experiments, which makes rigor *more* worthwhile. But we’re at a pivotal moment in higher education, and I haven’t seen a lot of bravery yet.