Doug Clow of the Open University has published a thoughtful and detailed blog post in response to the Course Signals effectiveness controversy. He covers far too much ground for me to attempt to summarize here, but I think there are some common themes emerging from the commentary so far:
- The concerns over the one study have not changed the fact that Course Signals and the researchers who have been studying it are generally held in high regard. They have some very strong intra-course results that have not been challenged by the current re-analysis. Even on the research that is now being challenged, they made a good-faith effort to follow best research practices and are exemplars in some respects. On a personal note, I know Kim Arnold (although I did not realize she was an author on the particular study in question until Doug mentioned it in his post) and, like Doug, I think very highly of her and have learned a lot from her about learning analytics over the years.
- That said, both the researchers and, especially, Purdue as an institution have an obligation to respond as promptly as is feasible to the challenge, in part because Purdue has chosen to license the technology in question and stands to make money based on these research claims (regardless of whether the researchers’ work was independent of the business deal). To be clear, nobody is accusing anyone of deliberately cooking the books. The point is simply that Purdue has an added ethical obligation as a consequence of the business deal.
- The larger problem is not so much with the Purdue work itself as with the fact that both pre- and post-publication peer review failed. This can happen even in papers that get far more eyeballs than this one did; Mike Caulfield has pointed to the widely influential Reinhart and Rogoff economics paper on the effects of debt on national economies, belatedly shown to be in error, as an apt analogy. Nevertheless, any such failure should trigger some introspection within the field about whether we should be doing more to cultivate robust community-based scrutiny of these studies, of which peer review is one part.
I highly recommend reading Doug’s post in full.
Peter Hess says
The “Doug’s post” link goes to the wrong site. The link at the top works.
I’ve learned a lot by posting questioning (contrarian) comments here. Take that as an apology in advance. Doug clearly knows much more about statistics than I do, and I respect his knowledge, but his post reads to me like an attempt to put the best face on an uncomfortable revelation, perhaps because people he “admires” are being criticized or because he, like them, wishes Course Signals to work. I do too, but desiring a particular outcome from a research project should be a red flag to a researcher.
“Kimberley Arnold and Matthew Pistilli (disclosure: I’ve met and admire both of them) used this as a ‘natural experiment’…”
A natural experiment sounds to me like one in which there was not much in the way of experimental design, but the researchers chose to analyze the data anyway, because they had it. Netflix and Spotify can work that way, because not much is at stake, but it reminds me that Spotify is very spotty in selecting music I want to hear next.
“Almost nobody wants to do proper testing of educational interventions. It’s hard work to do at all, and very hard to do well. And at the moment it’s an almost impossibly hard sell to senior managers – and if you can’t get their buy in, you’re not going to get the scale that you need.”
“Really robust testing means, to my mind, randomised controlled trials (RCTs), and preferably as blinded as possible.”
I’ll take Doug’s word that it’s as difficult as he says – it sounds right to me – but that doesn’t excuse not doing “really robust testing”. With educational technology interventions, we can predict that effect sizes will be small, which makes research results very vulnerable to design flaws. If you can’t find a way to do “robust” quantitative research, it’s better to do qualitative research or present anecdotal evidence, admitting the limitations up front. Yes, that takes us down a peg or two, but it’s better than the pretense of putting up numbers that seem to be what they are not.
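To put rough numbers on that point, here is a back-of-the-envelope sketch (the effect sizes, significance level, and power target are illustrative assumptions, not estimates for Course Signals or any real study) of how many students per group a simple two-group comparison needs before a small effect is even detectable:

```python
# Back-of-the-envelope sample-size sketch for a two-group comparison,
# alpha = 0.05 (two-sided), 80% power, using the standard normal
# approximation n ≈ 2 * ((z_{alpha/2} + z_beta) / d)^2 per group.
# The effect sizes below are illustrative, not estimates for any real study.
from statistics import NormalDist

z_alpha = NormalDist().inv_cdf(1 - 0.05 / 2)  # ≈ 1.96
z_beta = NormalDist().inv_cdf(0.80)           # ≈ 0.84

for d in (0.8, 0.5, 0.2, 0.1):                # Cohen's d: large, medium, small, tiny
    n_per_group = 2 * ((z_alpha + z_beta) / d) ** 2
    print(f"d = {d:<3} -> roughly {round(n_per_group):>5} students per group")
```

When hundreds or thousands of students per group are needed just to see a small effect at all, there is very little room for design flaws or confounds before they swamp the signal.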
“I’m not sure I know of any learning analytics project that had better evidence to support it.”
Given what we now know about the Course Signals study, this would be startling if true.
Michael Feldstein says
Thanks for the correction on the link, Peter. It’s been fixed. And if you follow it back to Doug’s post and read the comments thread, you’ll see that he acknowledges some of your concerns about the Purdue study design. As far as I can tell, it is true, if startling, that the Purdue study is among the better-designed studies being done right now (acknowledging that there is an incredibly broad range and that being well above the median is not the same as being near the high end of the spectrum). We have some very serious practical challenges to experimental design in this field, and we need to work harder as a community both to find creative solutions and to communicate clearly about the limitations under which particular studies are conducted. This is why I am deeply skeptical that just having all the data that MOOC providers have will really yield some magical insights. Volume is not a solution to the problem of lack of experimental controls.
That said, there is a danger of buying into the counter-narrative too easily. Just because it is difficult to prove significant effects and just because educational success can often include multiple factors does not mean that it is impossible to ever find and prove large effects from interventions. Purdue’s hypothesis was plausible. They argued that, across a wide range of educational contexts, students who are on the bubble between passing and failing end up failing because they don’t have the meta-awareness to know when they should seek help and the reinforcing experience of having benefitted from help-seeking behavior. They argued that if you could teach students the skills necessary to have them get help, you should be able to nudge a large number of students to the other side of the bubble, and this effect should be lasting (since the students would have learned a skill). Having worked with at-risk students myself, I found the argument to be compelling.
Keep in mind, too, that while Purdue’s experimental design may or may not have been best-in-class, the design is not really what tripped them up. It was the data analysis. And that is exactly the kind of thing that should have been caught by peer review. I think Mike Caulfield is brilliant, but he will be the first to admit that he is not a statistician. He had a basic intuition about the data that other researchers looking at the results should also have had. Pointing the finger at the researchers is too easy and convenient. This was a systemic failure.
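As one illustration of how an analysis can go wrong even when the underlying design is reasonable, here is a toy simulation. It is entirely hypothetical – the retention rates, course counts, and usage rates are made up, and it is not Purdue’s data or analysis, nor necessarily the specific issue in this case – but it shows how students who persist longer naturally accumulate more courses that happen to use a given tool, so “used the tool in more courses” can correlate with retention even when the tool has no effect at all:

```python
# Toy simulation (hypothetical numbers, not Purdue's data or analysis):
# retention is decided independently of the tool, yet students who are
# retained take more courses and therefore more tool-enabled courses,
# so "took >= 2 tool-enabled courses" still predicts retention.
import random

random.seed(42)

students = []
for _ in range(10_000):
    retained = random.random() < 0.7                     # retention unaffected by the tool
    semesters = random.randint(4, 8) if retained else random.randint(1, 3)
    courses = semesters * 4                               # roughly 4 courses per semester
    tool_courses = sum(random.random() < 0.25 for _ in range(courses))  # ~25% of courses use it
    students.append((retained, tool_courses))

def retention_rate(group):
    return sum(r for r, _ in group) / len(group)

low = [s for s in students if s[1] < 2]
high = [s for s in students if s[1] >= 2]

print(f"Retention with < 2 tool-enabled courses:  {retention_rate(low):.1%}")
print(f"Retention with >= 2 tool-enabled courses: {retention_rate(high):.1%}")
```

Any analysis that groups students by how many tool-enabled courses they took has to rule out this kind of reverse causation before reading the correlation as an effect, and that is the sort of check reviewers are well positioned to demand.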
Peter Hess says
Hi Michael,
Thanks for the interesting post. As expected, I learned from it. There are a few things I’d like to add:
1) You don’t need statistical validation to pursue a pedagogy or strategy that you think is beneficial. I’m convinced by your gut feeling (it is that, I think) about meta-awareness, and while I can’t say the same about the published Course Signals claims, that doesn’t persuade me that Course Signals has no value.
2) Despite what you may think, I do subscribe to the importance of gathering and analyzing data. Here’s a good example of the value of doing so: http://www.ted.com/talks/stefan_larsson_what_doctors_can_learn_from_each_other.html. But I remain deeply doubtful about educational research, because any interesting problem or approach will come with too many confounding, conflicting, interacting variables to tease apart in an experimental design. For example, “They argued that if you could teach students the skills necessary to have them get help…this effect should be lasting (since the students would have learned a skill).” Sounds good, but what happens when you’ve taught this skill to a large number of students? One not-too-far-fetched scenario is that the faculty get overwhelmed, the quality and availability of help goes down, and students get discouraged and stop seeking help. Complications like this abound, even within this one seemingly simple hypothesis.
3) “Keep in mind, too, that while Purdue’s experimental design may or may not have been best-in-class, the design is not really the problem that tripped them up.” I would argue that it was part of what tripped them up, but it doesn’t matter – that’s a poor excuse for design shortcomings, as is the fact that other research is just as bad.
4) One thing I found interesting in the TED video linked above is that the experimental design was done collaboratively and iteratively by a large number of stakeholders. A more open design process for the Course Signals “experiment” might have led to a more rigorous design, or to a decision that a rigorous design was not possible given the constraints.
Michael Feldstein says
Let’s not let the best be the enemy of the good. When your experimental subjects are human beings with privacy rights and educational needs, it is often impossible to conduct a pristine experiment. And by the way, those privacy rights make it challenging to do crowdsourced experiments or even peer review of the data. I’m glad that the Purdue folks did this study and have no serious concerns about how it was done. Yes, the limitations on the methodology are noteworthy and should be made clear, but again, they are not really the problem here.
You do make a good point about the complex confounds involved in even seemingly straightforward studies, like the hypothetical tutoring shortage that you point out. Foundational research often entails leaving huge questions on the table because we can only pursue one piece at a time. I could be wrong about this, but I seem to recall that Purdue actually does track the interventions that follow help-seeking behavior to see their impact. Regardless, it’s an example of a follow-on study that could be done.
Kimberly Arnold says
This is an important conversation for all of us. Let me begin by stating that my responses are my own and do not represent Purdue University or the University of Wisconsin (where I am presently employed). They come from my experience “on the front lines” of learning analytics for the past seven years.
I am no longer affiliated with Purdue and, as such, cannot comment directly on recent developments related to Course Signals. But learning analytics cannot grow as a field unless we have these difficult conversations. This is one of the glorious things about academia: the expectation that people will respectfully contribute to scholarly dialogue.
Many of the bloggers and commenters offer necessary critiques. The trick is to figure out how to keep constructive dialogue moving forward as we continue to develop the science. The field is still relatively young, and we need to keep grappling with these difficult issues, focusing on openness and transparency while understanding our constraints. I look forward to the continuing dialogue.