EdSurge's Tony Wan is first out of the blocks with an Instructurecon coverage article this year. (Because of my recent change in professional focus, I will not be on the LMS conference circuit this year.) Tony broke some news in his interview with CEO Dan Goldsmith with this tidbit about the forthcoming DIG analytics product:
One example with DIG is around student success and student risk. We can predict, to a pretty high accuracy, what a likely outcome for a student in a course is, even before they set foot in the classroom. Throughout that class, or even at the beginning, we can make recommendations to the teacher or student on things they can do to increase their chances of success.

Instructure CEO Dan Goldsmith
There isn’t a whole lot of detail to go on here, so I don’t want to speculate too much. But the phrase “even before they set foot in the classroom” is a clue as to what this might be. I suspect that the particular functionality he is talking about is what’s known as a “student retention early warning system.”
Or maybe not. Time will tell.
Either way, it provides me with the thin pretext I was looking for to write a post on student retention early warning systems. It seems like a good time to review the history, anatomy, and challenges of the product category, since I haven’t written about them in quite a while and they’ve become something of a fixture. The product category is also a good case study in why a tool that could be tremendously useful in supporting the students who most need help often fails to live up to either its educational or commercial potential.
The archetype: Purdue Course Signals
The first retention early warning system that I know of was Purdue Course Signals. It was an experiment undertaken by Purdue University to—you guessed it—increase student retention, particularly in the first year of college, when students tend to drop out most often. The leader of the project, John Campbell, and his fellow researchers Kim Arnold and Matthew Pistilli, looked at data from their Student Information System (SIS) as well as the LMS to see if they could predict student outcomes and intervene. Their first goal was to prevent students from dropping courses, but they ultimately wanted to prevent those students from dropping out.
They looked at quite a few variables from both systems, but the main results they found are fairly intuitive. On the LMS side, the four biggest predictors they found for students staying in the class (or, conversely, for falling through the cracks) were
- Student logins (i.e., whether they are showing up for class)
- Student assignments (i.e., whether they are turning in their work)
- Student grades (i.e., whether their work is passing)
- Student discussion participation (i.e., whether they are participating in class)
All four of these variables were compared to the class average, because not all instructors were using the LMS in the same way. If, for example, the instructor wasn’t conducting class discussions online, then the fact that a student wasn’t posting on the discussion board wouldn’t be a meaningful indicator.
These are basically four of the same very generic criteria that any instructor would look at to determine whether a student is starting to get in trouble. The system is just more objective and vigilant in applying these criteria than instructors can be at times, particularly in large classes (which is likely to be the norm for many first-year students). The sensitivity with which Course Signals would respond to those factors would be modified by what the system “knew” about the students from their longitudinal data—their prior course grades, their SAT or ACT scores, their biographical and demographic data, and so on. For example, the system would be less “concerned” about an honors student living on campus who doesn’t log in for a week than about a student on academic probation who lives off-campus.
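To make the model concrete, here is a minimal sketch of that kind of scoring logic. This is my own illustrative reconstruction, not Purdue’s actual algorithm: the signal names, weights, and thresholds are invented. But it captures the two ideas above—comparing each student to the class average while skipping signals the instructor isn’t using, and weighting the result by a prior-risk baseline derived from longitudinal SIS data.

```python
# Illustrative sketch only -- NOT Purdue's actual algorithm. Signal names,
# weighting, and thresholds are invented for the sake of the example.

def risk_score(student, class_avg, prior_risk=1.0, min_class_avg=0.05):
    """Return a 0..1 risk estimate (higher = more at risk).

    student, class_avg: dicts with keys 'logins', 'submissions',
    'grade', and 'discussion_posts'.
    prior_risk: multiplier from longitudinal SIS data (e.g. >1 for a
    student on academic probation, <1 for an honors student).
    """
    deficits = []
    for signal in ("logins", "submissions", "grade", "discussion_posts"):
        avg = class_avg[signal]
        if avg < min_class_avg:
            # The instructor isn't using this feature (e.g. no online
            # discussions), so a zero here is not a meaningful signal.
            continue
        # Fraction by which the student falls short of the class norm.
        shortfall = max(0.0, (avg - student[signal]) / avg)
        deficits.append(shortfall)
    if not deficits:
        return 0.0
    base = sum(deficits) / len(deficits)
    return min(1.0, base * prior_risk)

student = {"logins": 1, "submissions": 2, "grade": 55, "discussion_posts": 0}
avg = {"logins": 5, "submissions": 4, "grade": 78, "discussion_posts": 0.0}
print(round(risk_score(student, avg, prior_risk=1.5), 2))  # → 0.8
```

Note how the discussion signal is ignored here because the class average is effectively zero, and how the same behavioral shortfall would produce a lower score for a student with a favorable prior-risk baseline.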
In the latter case, the data used by the system might not normally be accessible, or even legal, for the instructor to look at. For example, a disability could be a student retention risk factor for which there are laws governing the conditions under which faculty can be informed. Of course, instructors don’t have to be informed in order for the early warning system to be influenced by the risk factor. One way to think about how this sensitive information could be handled is by analogy to a credit score. There is some composite score that informs the instructor that the student is at increased risk based on a variety of factors, some of which are private to the student. The people who are authorized to see the underlying data can verify that the model works and that there is legitimate reason to be concerned about the student, while the people who are not authorized are only told that the student is considered at-risk.
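The mechanics of that credit-score analogy can be sketched in a few lines. This is a hypothetical illustration, not any vendor’s actual access-control design; the factor names, roles, and thresholds are all invented. The point is simply that unauthorized viewers get the composite signal while the sensitive inputs behind it stay restricted.

```python
# Hypothetical sketch of the "credit score" idea: instructors see only a
# composite risk band, while the underlying factors (some of them
# privacy-sensitive) are visible only to authorized roles. All names,
# roles, and thresholds here are invented for illustration.

def risk_report(factors, viewer_roles):
    """factors: dict of factor name -> 0..1 contribution to risk."""
    composite = min(1.0, sum(factors.values()))
    band = "high" if composite >= 0.6 else "elevated" if composite >= 0.3 else "low"
    if "retention_analyst" in viewer_roles:
        # Authorized staff can audit the model's inputs.
        return {"band": band, "composite": composite, "factors": dict(factors)}
    # Instructors get the signal, not the sensitive inputs behind it.
    return {"band": band}

factors = {"low_logins": 0.25, "missed_assignments": 0.2,
           "disability_accommodation": 0.2}
print(risk_report(factors, {"instructor"}))  # → {'band': 'high'}
print(round(risk_report(factors, {"retention_analyst"})["composite"], 2))  # → 0.65
```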
Already, we are in a bit of an ethical rabbit hole here. Note that this is not caused by the technology. At least in my state, the great Commonwealth of Massachusetts, instructors are not permitted to ask students about their disabilities, even though that knowledge could be very helpful in teaching those students. (I should know whether that’s a Federal law, but I don’t.) Colleges and universities face complicated challenges today, in the analog world, with the tensions between their obligation to protect student privacy and their affirmative obligation to help the students based on what they know about what the students need. And this is exactly the way John Campbell characterized the problem when he talked about it. This is not a “Facebook” problem. It’s a genuine educational ethical dilemma.
Some of you may remember some controversy around the Purdue research. The details matter here. Purdue’s original study, which showed increased course completion and improved course grades, particularly for “C” and “D” students, was never questioned. It still stands. A subsequent study, which purported to show that student gains persisted in subsequent classes, was later called into question. You can read the details of that drama here. (e-Literate played a minor role in that drama by helping to amplify the voices of the people who caught the problem in the research.)
But if you remember the controversy, it’s important to remember three things about it. First, the original research was never called into question. Second, the subsequent persistence finding was not disproven; rather, the follow-up study produced a null result. We have proof neither for nor against the hypothesis that the Purdue system can produce longer-term effects. And finally, the biggest problem the controversy exposed was with university IR departments releasing non-peer-reviewed research papers that staff researchers have no power to respond to on their own when they get criticized. That’s worth exploring further some other time, but for now, the point is that the process problem was the real story. The controversy didn’t invalidate the fundamental idea behind the software.
Since then, we’ve seen lots of tinkering with the model on both the LMS and SIS sides of the equation. Predictive models have gotten better. Both Blackboard and D2L have some sort of retention early warning products, as do Hobsons, Civitas, EAB, and HelioCampus, among others. There were some early problems related to a generational shift in data analytics technologies; most LMSs and SISs were originally architected well before the era when systems were expected to provide the kind of high-volume transactional data flows needed to perform near-real-time early warning analytics. Those problems have increasingly been either ironed out or, at least, worked around. So in one sense, this is a relatively mature product category. We have a pretty good sense of what a solution looks like, and there are a number of providers in the market right now with variations on the theme.
In a second sense, the product category hasn’t fundamentally changed since Purdue created Course Signals over a decade ago. We’ve seen incremental improvements to the model, but no fundamental changes to it. Maybe that’s because the Purdue folks pretty much nailed the basic model for a single institution on the first try. What’s left are three challenges that share a common characteristic: each becomes harder when converted from an experiment at a single university to a product supported by a third-party company. At the same time, they fall on different places on the spectrum between being primarily human challenges and primarily technology challenges. The first, the aforementioned privacy dilemma, is mostly a human challenge. It’s a university policy issue that can be supported by software affordances. The second, model tuning, is on the opposite end of the spectrum. It’s all about the software. And the third, the last-mile problem of getting from good analytics to actual impact, is somewhere in the messy middle.
Three significant challenges
I’ve already spent some time on the student data privacy challenge specific to these systems, so I won’t spend much more time on it here. The macro issue is that these systems sometimes rely on privacy-sensitive data to determine—with demonstrated accuracy—which students are most likely to need extra attention to make sure they don’t fall through the cracks. This is an academic (and legal) problem that can only be resolved by academic (and legal) stakeholders. The role of the technologists is to make the effectiveness and the privacy consequences of various software settings both clear and clearly in the control of the appropriate stakeholders. In other words, the software should support and enable appropriate policy decisions rather than obscuring or impeding them. At Purdue, where Course Signals was not a product that was purchased but a research initiative that had active, high-level buy-in from academic leadership, these issues could be worked through. But for a company selling the product into as many universities as possible, each with differing levels of sophistication and policy-making capability in this area, the best the vendor can do is build a transparent product and educate its customers as best it can. You can lead a horse to water and all that.
On the other end of the human/technology spectrum, there is an open question about the degree to which these systems can be made accurate without hand tuning the algorithms for each individual institution. Purdue was building a system for exactly one university, so it didn’t face this problem. We don’t have good public data on how well its commercial successors work out of the box. I am not a data scientist, but some of the folks I trust most in this field have raised this question. If hand tuning is necessary, then each installation of the product requires a significant services component, which raises the cost and makes these systems less affordable to the access-oriented institutions that need them the most. This is not a settled question; I would like to see more public proof points that have undergone some form of peer review.
And in the middle, there’s the question of what to do with the predictions in order to produce positive results. Suppose you know which students are more likely to fail the course on Day 1. Suppose your confidence level is high. Maybe not Minority Report-level stuff—although, if I remember the movie correctly, they got a big case wrong, didn’t they?—but pretty accurate. What then? At my recent IMS conference visit, I heard one panelist on learning analytics (depressingly) say, “We’re getting really good at predicting which students are likely to fail, but we’re not getting much better at preventing them from failing.”
Purdue had both a specific theory of action for helping students and good connections among the various program offices that would need to execute that theory of action. Campbell et al. believed, based on prior academic research, that students who struggle academically in their first year of college are likely to be weak in a skill called “help-seeking behavior.” Academically at-risk students are often not good at knowing when they need help or how to get it. Course Signals would send students carefully crafted and increasingly insistent emails urging them to go to the tutoring center, where staff would track which students actually came. The IR department would analyze the results. Over time, the academic IT department that owned the Course Signals system experimented with different email messages, in collaboration with IR, and figured out which ones were the most effective at motivating students to take action and seek help.
Notice two critical features to Purdue’s method. First, they had a theory about student learning—in this case, learning about productive study behaviors—that could be supported or disproven by evidence. Second, they used data science to test a learning intervention that they believed would help students based on their theory of what is going on inside the students’ heads. This is learning engineering. It also explains why the Purdue folks had reason to hypothesize that the effects of using Course Signals might persist with students after they stopped using the product. They believed that students might learn the skill from the product. The fact that the experimental design of their follow-up study was flawed doesn’t mean that their hypothesis was a bad one.
When Blackboard built their first version of a retention early warning system—one, it should be noted, that is substantially different from their current product in a number of ways—they didn’t choose Purdue’s theory of change. Instead, they gave the risk information to the instructors and let them decide what to do with it. So have many other designers of these systems. While everybody that I know of copied Purdue’s basic analytics design, nobody—at least no commercial product developers that I know of—copied Purdue’s decision to put so much emphasis on student empowerment first. Some of this has started to enter product design in more recent years now that “nudges” have made the leap from behavioral economics into consumer software design. (Fitbit, anyone?) But the faculty and administrators remain the primary personas in the design process for many of these products. (For non-software designers, a “persona” is an idealized person that you imagine you’re designing the software for.)
Why? Two reasons. First, students don’t buy enterprise academic software. So however much the companies that design these products may genuinely want to serve students well, their relationship with them is inherently mediated. The second reason is the same as with the previous two challenges in scaling Purdue’s solution. Individual institutions can do things that companies can’t. Purdue was able to foster extensive coordination between academic IT, institutional research, and the tutoring center, even though those three organizations live on completely different branches of the organizational chart in pretty much every college and university that I know. An LMS vendor has no way of compelling such inter-departmental coordination in its customers. The best they can do is give information to a single stakeholder who is most likely to be in a position to take action and hope that person does something. In this case, the instructor.
One could imagine different kinds of vendor relationships with a service component—a consultancy or an OPM, for example—where this kind of coordination would be supported. One could also imagine colleges and universities reorganizing themselves and learning new skills to become better at the sort of cross-functional cooperation required to serve students well. If academia is going to survive and thrive in the changing environment it finds itself in, both of these possibilities will have to become far more common. The kinds of scaling problems I just described in retention early warning systems are far from unique to that category. Before higher education can develop and apply the new techniques and enabling technologies it needs to serve students more effectively with high ethical standards, we first need to cultivate an academic ecosystem that can make proper use of better tools.
Given a hammer, everything looks pretty frustrating if you don’t have an opposable thumb.