Last week at the IMS conference, the LTAC (Learning Technology Advisory Council) had an interesting and, I think, fruitful discussion about “analytics.” In this context, the umbrella term covers various types of data analysis that would be useful in helping ensure that more students learn more and better. One example that came up a couple of times in the discussion was some work done by John Campbell at Purdue University showing that analysis of some basic stats out of the LMS (e.g., how recently and frequently a student has logged on) predicts, with a very high degree of accuracy, the likelihood that the student will persist in the class and pass it. The idea is that if you can get an early warning that a student is at risk, you can intervene and hopefully help that student get through a rough spot.
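To make the idea concrete, here is a minimal sketch of what such an early-warning check might look like. The thresholds and the function name are hypothetical illustrations, not the actual Purdue model, which would be fit to real institutional data:

```python
from datetime import datetime

# Hypothetical thresholds for illustration -- a real model would be fit to
# institutional data, as in the Purdue work described above.
MAX_DAYS_SINCE_LOGIN = 7   # flag if the last login is more than a week old
MIN_LOGINS_PER_WEEK = 2.0  # flag if average weekly logins fall below this

def at_risk(last_login: datetime, total_logins: int,
            weeks_enrolled: int, now: datetime) -> bool:
    """Crude early-warning flag from LMS recency and frequency stats."""
    days_idle = (now - last_login).days
    logins_per_week = total_logins / max(weeks_enrolled, 1)
    return days_idle > MAX_DAYS_SINCE_LOGIN or logins_per_week < MIN_LOGINS_PER_WEEK
```

A student who hasn't logged in for nineteen days, for example, would be flagged for intervention, while a student logging in several times a week would not.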
Of course, developing this kind of tool raises all sorts of interesting problems, not the least of which is privacy.
Campbell has pointed out that there are all sorts of competing moral obligations when it comes to data mining student behaviors that are captured in university IT systems, even when the intentions are the best. What do we have the right to look at? When do we have to ask the student’s prior permission, and when do we not? And what are the affirmative obligations? If we know we can find out how likely a student is to drop out, do we have a moral obligation to do so? And if we do, what moral obligation do we have to act on any such knowledge?
These are complex and often culturally specific issues that need to be negotiated by the entire campus community before ethical and effective policies can be put in place. That negotiation process is complex in and of itself, and is made more complex by the fact that we don’t know the potential benefits of data mining. And the reason we don’t know the potential benefits is that it’s unethical to do the research without strong guarantees of anonymity, guarantees which are not well supported by typical current campus IT systems.
What I proposed to LTAC, and what the group seemed to accept as a good line of inquiry, was that we start by defining the requirements for an architecture of privacy. Given the kinds of analytics that we might want to employ, what are the privacy issues involved? If we were designing an identity management system to solve those privacy problems, what would it look like? Such a system could be helpful on two levels. First, by supporting full anonymization across the problem space, it would drastically lower the barrier to basic research that can tell us what’s possible. (By the way, this could have benefit far beyond research into educational analytics; potentially, it could be useful for meeting regulatory privacy requirements for any kind of research using human subjects.) Second, by providing a range of full anonymity, opt-in, opt-out, and mandatory non-anonymity options, it would enable different institutions to develop and evolve policies that fit their particular needs. This wouldn’t solve the ethical issues, of course, but it would turn them into policy issues rather than technology issues. It would make them solvable.
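As a rough sketch of what that second level might mean in practice, the four options could become a per-data-stream policy setting that the identity management system enforces. Everything below is an assumption about how such a system might be modeled, not a description of any existing product:

```python
from enum import Enum
from typing import Optional

class PrivacyLevel(Enum):
    """The range of options sketched above, as a per-data-stream policy knob."""
    FULL_ANONYMITY = "anonymized"   # identity stripped before any analysis
    OPT_IN = "opt-in"               # identifiable only with prior consent
    OPT_OUT = "opt-out"             # identifiable unless the student declines
    MANDATORY = "non-anonymous"     # always identifiable (e.g., for grading)

def may_identify(level: PrivacyLevel, consented: Optional[bool]) -> bool:
    """May an analytic process see the student's identity under this policy?"""
    if level is PrivacyLevel.FULL_ANONYMITY:
        return False
    if level is PrivacyLevel.MANDATORY:
        return True
    if level is PrivacyLevel.OPT_IN:
        return consented is True
    return consented is not False   # OPT_OUT: yes, unless the student declined
```

The point of the sketch is that once the options are expressed this way, an institution can change its answer per campus, per data stream, or per study by changing policy, not by rebuilding technology.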
mattbucher says
Most LMSs have this functionality already. A course built in eCollege will show you exactly which students log in x number of times and the total number of minutes they spent logged in. How is this different from knowing that student x spent x hours in class and student y showed up only once? Attendance should not be private.
Michael Feldstein says
I don’t think it’s quite so simple, Matt. To begin with, I don’t think that logins to the LMS correlate neatly with attendance. What if the student is logging in to do an assignment? Does that count as “in-class” or “homework”? And should time spent on “homework” not be “private”?
Which brings me to the second problem. When you say that attendance should not be private, what does that mean, exactly? We know that LMSs make this information available to faculty today, but what about advisors? Should they be able to see this information? And what if a graduate program finds a correlation between undergraduate student LMS logins and their likelihood of doing well in graduate school? Should they have access to that data too?
Finally, and most importantly, the login metric was just an example. Once we have some real data mining capability in place, there’s no telling what other information might be useful. For example, suppose a correlation is discovered between discussion posting patterns and…I don’t know…post-graduation job placement. Who should have access to that data about a student?
The whole point about data mining in this context is that it’s somewhat speculative. You don’t know what information is going to turn out to be useful to whom and for what purposes. And since you don’t know in advance, you can’t have all the privacy policies worked out in advance either. That’s why you need a flexible architecture of privacy as a foundation before you start.
John Campbell says
I like the concept of “an architecture of privacy.” Because higher education institutions collect and store enormous amounts of information about their constituents, the number and kind of possible analytic projects are virtually limitless. The nearly infinite potential requires some type of overarching framework.
As noted previously, current course management systems do little beyond reporting. Consider how models could be developed using a wide range of academic tools: course management, clickers, podcasts, etc. How these data are combined, utilized, and presented offers a number of technical but, more importantly, policy challenges.
Our analytics efforts at Purdue have focused on predicting student success based on the data we have on a student’s aptitude (standardized test scores, high school information, etc.) and the student’s effort within the course (course management system). While there is significant work to be completed, the initial results are promising.
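A toy version of the kind of model described above might combine aptitude and effort signals in a single logistic score. The weights and feature choices here are invented for illustration and bear no relation to the actual Purdue model:

```python
import math

# Illustrative only: the weights below are invented, not a fitted model.
def success_probability(sat_score: float, hs_gpa: float,
                        cms_logins_per_week: float) -> float:
    """Toy logistic model mixing aptitude signals with a course-effort signal."""
    z = (-6.0
         + 0.002 * sat_score            # aptitude: standardized test score
         + 0.8 * hs_gpa                 # aptitude: high school record
         + 0.3 * cms_logins_per_week)   # effort: CMS activity
    return 1.0 / (1.0 + math.exp(-z))
```

Even in a toy like this, the effort term is the actionable one: test scores and high school records are fixed, but CMS activity is something an intervention can still change mid-semester.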
Peter DeBlois, Diana Oblinger and I published an article in EDUCAUSE Review this summer on analytics. You can find it at: http://www.educause.edu/apps/er/erm07/erm0742.asp