My recent post on the challenges of using artificial intelligence and machine learning (AI/ML) in EdTech drew a range of responses. Some of them, both positive and negative, gave me the sense that by focusing on one rather extreme example (an open-ended chatbot), I had left readers with the impression that I was arguing all AI/ML is equally fraught, whether they agreed or disagreed with that proposition. As an antidote, I thought it might be helpful to provide snapshot analyses of different applications of AI/ML I’ve seen in EdTech. While it’s far from comprehensive, I hope it will illustrate some patterns:
- In general, AI/ML can be beneficial both for supporting professional educators’ work (including that of learning designers, learning engineers, etc.) and for directly improving learner success.
- AI and ML are not magic. They have limitations that are both technology- and problem-domain-specific. Relatedly, AI/ML is not a monolith. The term covers a large array of different techniques, each with its own limitations, which, to make matters more complicated, are increasingly used in concert with one another.
- EdTech seems to go light on a strategy called “human-in-the-loop,” which means that expert humans review the algorithm’s output on a routine basis as a safety and quality check. This is bad.
I’m most familiar with work developing “courseware,” i.e., self-paced didactic or training materials, although I have a smattering of exposure to other areas. I’ll focus on courseware first and then provide some other examples.
Courseware
For the last decade, the emphasis in courseware has been on “adaptive learning” or “personalized learning,” where the algorithm adjusts to each student’s learning needs. More recently, efforts have broadened into using AI/ML to reduce time and lower the costs of producing new courseware. I’ll cover both areas and describe briefly how they overlap.
Adaptive Learning
The two most widely adopted and evidence-backed methods of algorithmic adaptive learning are memory-focused and skill-focused. Memory-focused is the more straightforward of the two. The easiest way to think about this family of techniques is as smart flashcards, even though the user experience doesn’t always present this way. We know a lot about how memory works that lends itself well to algorithms. For example, we know the most effective amount of time to wait before quizzing students again on a given fact to be memorized. This is called “spaced practice.” We also know something about the best way to mix various topics for memorization, which is called “interleaving.” Writing an algorithm that uses this information to quiz students and then adjusts its strategy based on how well the students perform is a relatively straightforward, effective, and safe application of the technology. It has mainly been applied to memorization, although it can also be used to help learners check whether they remember newly learned skills.
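To make this concrete, here’s a minimal sketch of that kind of scheduling logic using a simple Leitner-box scheme. The intervals and data structures are illustrative assumptions on my part; real products use more sophisticated memory models, but the core loop (quiz, observe, reschedule) looks much like this:

```python
import random
from dataclasses import dataclass, field
from datetime import date, timedelta

# Illustrative review intervals (in days) for each Leitner box. A correct
# answer promotes a card to the next box (a longer wait); a miss sends it
# back to box 0. This is the "spaced practice" part.
INTERVALS = [1, 2, 4, 8, 16]

@dataclass
class Card:
    fact: str
    box: int = 0
    due: date = field(default_factory=date.today)

def review(card: Card, answered_correctly: bool, today: date) -> None:
    """Adjust the quizzing schedule based on how the student performed."""
    if answered_correctly:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0  # restart the spacing schedule after a miss
    card.due = today + timedelta(days=INTERVALS[card.box])

def next_session(deck: list[Card], today: date) -> list[Card]:
    """Pull today's due cards and shuffle them so topics are mixed together
    rather than blocked. This is a crude form of "interleaving"."""
    due = [card for card in deck if card.due <= today]
    random.shuffle(due)
    return due
```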
Skill-focused adaptive learning is more complicated. First, you have to be able to break down the skills into a tree, which has traditionally been an arduous process that can be harder than it sounds. For example, it turns out that when students learn to calculate the slope of a line, they first learn how to work with upward-sloping lines and downward-sloping lines separately. Then they learn to integrate these skills. To make matters worse, this process is generally unconscious for the student and invisible to the instructor. Skill acquisition, even in highly procedural subjects like math, is tricky because we can’t directly observe learning and because humans have sophisticated and often unconscious learning processes that evolved rather than being designed. Those processes are continually surprising. That said, the line-slope quirk was discovered by humans using a machine learning algorithm to identify patterns in the ways that many students progressed through many formative assessment questions.
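I didn’t name an algorithm above, but one widely used skill-focused technique is Bayesian Knowledge Tracing (BKT), which maintains a running estimate of the probability that a student has mastered each skill in the map. Here’s a minimal sketch of a single BKT update; the parameter values are illustrative, not calibrated:

```python
def bkt_update(p_know: float, correct: bool,
               p_guess: float = 0.2,   # chance of answering right without the skill
               p_slip: float = 0.1,    # chance of answering wrong despite the skill
               p_learn: float = 0.15   # chance of acquiring the skill on this step
               ) -> float:
    """One Bayesian Knowledge Tracing step: revise the estimate that the
    student has mastered a skill after observing a single answer."""
    if correct:
        evidence = p_know * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_know) * p_guess)
    else:
        evidence = p_know * p_slip
        posterior = evidence / (evidence + (1 - p_know) * (1 - p_guess))
    # Account for the possibility that the student learned the skill
    # during this practice opportunity.
    return posterior + (1 - posterior) * p_learn

# Example: three correct answers in a row steadily raise the mastery estimate.
p = 0.3
for answer in (True, True, True):
    p = bkt_update(p, answer)
    print(round(p, 3))
```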
In higher education, skill-based adaptive learning has shown the most benefit in STEM subjects, where the knowledge is often more overtly procedural. Implementation is often a problem. Neither learner nor educator always knows why the algorithm routes the students a particular way. And students can get stuck in a loop that the educators are powerless to free them from. The problem arises because many of these systems were not designed as educator-in-the-loop. On the contrary, they were intended to either reduce the number of educators required or “teacher-proof” courses against educators whose skills are not trusted. But like any other black box, when something breaks inside, you can’t fix it. In fact, it can be hard to anticipate where problems are likely to arise because you don’t always know what the box is doing.
I mentioned that mapping out skills for adaptive learning or other purposes has been a complex and labor-intensive process. That is starting to change. Emerging AI/ML models appear to be highly effective at drafting maps of the skills represented in content; those maps can then be refined through input from experts and analysis of student data over time. This combination of methods, which draws on multiple AI/ML strategies, shows promise for developing courseware more quickly, at higher quality, with less labor, and improving over time under human-in-the-loop supervision.
Having human experts involved is critical even in cases where adaptive learning is not employed. Adaptive learning is analytics plus automation. First, the system analyzes the student’s level of mastery. Then it automates the response. These days, even systems that lack the automation half (i.e., algorithmic adaptivity) usually still have the analytics. And mastery analytics are keyed to the skills in the skill map. If the algorithm misidentifies skills, then the analytics (the progress indicators) will be wrong.
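To make the dependency concrete, here’s a stripped-down sketch of mastery analytics keyed to a skill map. The item-to-skill tagging below is hypothetical; the point is that every progress number downstream inherits whatever errors that mapping contains:

```python
from collections import defaultdict

# Hypothetical skill map: each assessment item is tagged with the skill(s)
# it is believed to measure. If this mapping is wrong, every progress
# indicator computed from it will be wrong too.
ITEM_SKILLS = {
    "q1": ["slope_upward"],
    "q2": ["slope_downward"],
    "q3": ["slope_of_any_line"],
}

def mastery_by_skill(responses: dict[str, bool]) -> dict[str, float]:
    """Naive mastery analytics: fraction of correct answers per mapped skill."""
    attempts: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for item, was_correct in responses.items():
        for skill in ITEM_SKILLS.get(item, []):
            attempts[skill] += 1
            correct[skill] += int(was_correct)
    return {skill: correct[skill] / attempts[skill] for skill in attempts}

print(mastery_by_skill({"q1": True, "q2": False, "q3": True}))
```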
Increasing production efficiency
The skill mapping example demonstrates both quality improvements and production efficiencies. The trick is not to lose track of quality improvement while chasing production efficiency, because you can easily make quality worse rather than better if you’re not careful.
Publishers are increasingly turning to AI/ML to generate assessment questions. For typical question types like multiple choice or fill-in-the-blank, these algorithms seem to hit a hard ceiling of 80%–85% accuracy. Some companies are deploying these questions directly to students without humans in the loop, arguing that their accuracy rate is the same as that of human-written textbooks. I’m personally uncomfortable with that. Adding a button for students to report questions they think are wrong is a kind of human-in-the-loop strategy, but it’s after the fact. The damage has been done. How much damage depends on various factors, like whether the assessment is formative or summative and whether the student has the self-confidence to question the computer program. We have to get more creative about developing new methods for including humans in the loop at scale.
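For what it’s worth, here’s a sketch of what a more conservative policy could look like: generated items reach students only after human sign-off, and the report button feeds back into the loop by pulling flagged items for re-review. This is an illustrative assumption about workflow on my part, not any vendor’s actual process:

```python
from dataclasses import dataclass

@dataclass
class GeneratedItem:
    stem: str
    options: list[str]
    answer_index: int
    reviewer_approved: bool = False  # flipped only by a human expert
    student_reports: int = 0         # after-the-fact "this looks wrong" flags

def deployable(item: GeneratedItem, report_limit: int = 3) -> bool:
    """An algorithmically generated question reaches students only after
    expert sign-off, and is automatically pulled back into the review
    queue once enough students flag it."""
    return item.reviewer_approved and item.student_reports < report_limit
```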
Even common assessment types like the humble multiple choice question have tricky bits. Distractors (wrong answers) are essential for diagnosing why a learner may be stuck on a problem. Research tells us that properly written hints, combined with the right machine learning algorithm, can also tell us a lot about how well a learner is progressing. I’m aware of efforts to generate distractors and hints algorithmically, but I don’t know how accurate they are.
Opportunities become more interesting—and squirrely—when we move beyond the usual machine-graded question types to more novel ones. For example, in a webinar by Walden University and Google Cloud about a new Google-powered tutor Walden is piloting, the presenters describe an interesting paraphrasing question type in which the system uses multiple AI/ML methods to check whether a student’s rephrasing of a concept in the course materials is both original enough that it shouldn’t be considered copying and close enough in meaning that the student is correctly paraphrasing.
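As a purely speculative sketch, one way such a two-part check might be built is to pair a surface-overlap test (“original enough”) with an embedding-similarity test (“close enough in meaning”). The library, model choice, and thresholds below are my assumptions for illustration, not the actual product’s method:

```python
from difflib import SequenceMatcher

# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def check_paraphrase(source: str, student: str,
                     max_surface_overlap: float = 0.6,
                     min_semantic_similarity: float = 0.7) -> str:
    """Two-part paraphrase check: reject if the wording is too close to the
    source (possible copying) or if the meaning has drifted too far from it.
    The thresholds are illustrative, not calibrated."""
    surface = SequenceMatcher(None, source.lower(), student.lower()).ratio()
    if surface > max_surface_overlap:
        return "too close to the original wording"
    similarity = util.cos_sim(model.encode(source), model.encode(student)).item()
    if similarity < min_semantic_similarity:
        return "meaning drifts too far from the original"
    return "acceptable paraphrase"
```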
To the degree that such a system works, it’s a wonderful addition to the toolbox. That said, I’d like to see efficacy studies, both for a general population of students and for disaggregated subpopulations such as weak writers and second-language learners. Like many of these innovations, it could make learning worse rather than better for some or all students. We shouldn’t feel confident that we know until we’ve conducted rigorous research.
To be clear, I don’t know the existing literature on this particular AI/ML application and I also don’t know what research Walden and Google have conducted or are conducting. That question wasn’t asked in the recorded Q&A. More generally, this is another danger with AI/ML. Because the tech is so dazzling and there’s so much talk about data, it’s easy to assume that we know how well the innovation works with learners. But we can’t be sure until we’ve conducted multiple well-designed experiments at scale. For the same reason that AI/ML can speed up drug discovery but drug approval is still slow, we can’t responsibly implement new AI/ML EdTech as fast as it is developed.
Anyway, these are the mainstream applications that I’m aware of for courseware. More cutting-edge applications exist in niches where there is money to spend on VR, movement tracking, and other tech that is too expensive to be mainstream at the moment. I won’t cover these in this post other than to acknowledge that they exist.
A couple of other examples
Of course, courseware is far from the only application of AI/ML in EdTech. Chatbots are used quite a bit, though they are usually designed using AI/ML technologies that are quite different from the one I described in my previous post. For example, many use an algorithm closer to (or identical to) the one in Amazon’s Alexa. It’s not trying to have a conversation with you. It’s just trying to interpret what you’re asking for. As described to me by the co-founder of Mainstay (formerly known as AdmitHub), a student who needs financial aid may express that need to the chatbot as “I need money.”
On the back end, the answer returned by a chatbot may be as simple as an automated FAQ or use some other AI/ML techniques to provide a more tailored and conversational experience. I’m not sure what Mainstay does at this point—the product evolves, as products do—but the company has conducted multiple randomized controlled trials (in collaboration with university customers like Georgia State) demonstrating that they can help students navigate the college enrollment process.
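To illustrate the intent-matching idea (not Mainstay’s actual implementation, which I don’t know), here’s a toy sketch: compare the student’s message against example utterances for each intent and return the canned answer for the closest match. The intent catalog and model choice are invented for illustration:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

# Hypothetical intent catalog. Note that "I need money" shares no keywords
# with "financial aid"; embedding similarity is what bridges the gap.
INTENTS = {
    "financial_aid": {
        "examples": ["I need money", "how do I pay for school", "help with FAFSA"],
        "answer": "Here's how to start your financial aid application: ...",
    },
    "registration": {
        "examples": ["how do I sign up for classes", "add a course to my schedule"],
        "answer": "Course registration works like this: ...",
    },
}

def respond(utterance: str) -> str:
    """Route the student's message to the closest intent's FAQ-style answer."""
    query = model.encode(utterance)
    best_name, best_score = None, float("-inf")
    for name, intent in INTENTS.items():
        score = util.cos_sim(query, model.encode(intent["examples"])).max().item()
        if score > best_score:
            best_name, best_score = name, score
    return INTENTS[best_name]["answer"]

print(respond("I need money"))
```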
Then there’s conversational analysis. One example I like, partly because it integrates into real-time videoconferencing and partly because I know and trust the founder, is Riff Analytics. Riff plugs into video conferencing software and provides real-time and after-the-fact metrics like who is talking the most, who is interrupting, whose comments are getting the most affirmation, and so on. If you think about that last metric—who is getting the most affirmation—you can easily see that multiple AI/ML techniques must be in play. First, the speech has to be converted to text. Then it must be analyzed for sentiment. (When I say “must,” that means I’m making an educated guess.)
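Taking my educated guess literally, here is a toy sketch of that second step. It assumes (another guess) that upstream speech-to-text produces turns with reply structure already attached, which is doing a lot of work here; the sentiment scoring uses NLTK’s off-the-shelf VADER analyzer:

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

def affirmation_counts(transcript: list[dict]) -> dict[str, int]:
    """Count how often each speaker's comments draw a clearly positive reply.
    Assumes a hypothetical speech-to-text output shaped like
    {"speaker": ..., "text": ..., "replying_to": ...}."""
    counts: dict[str, int] = {}
    for turn in transcript:
        target = turn.get("replying_to")
        if target is None:
            continue
        # VADER's compound score ranges from -1 (negative) to +1 (positive).
        if analyzer.polarity_scores(turn["text"])["compound"] > 0.5:
            counts[target] = counts.get(target, 0) + 1
    return counts

print(affirmation_counts([
    {"speaker": "Ana", "text": "I think we should ship it.", "replying_to": None},
    {"speaker": "Ben", "text": "Great idea, I love it!", "replying_to": "Ana"},
]))
```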
From here, it would be easy to go down the rabbit hole and talk about, for example, how my friends at Discourse Analytics are improving nudges. AI/ML techniques in EdTech are far more pervasive, varied, and rapidly developing than is visible on the surface.
And that’s fine. These are tools. They are not automatically good or bad. The trick is in how you use them. Here’s my advice:
- Carrying over a point from my previous blog post, don’t assume that the intuitions about your AI/ML application you gleaned from early results will prove accurate once you try to scale. This tech can be deceiving.
- Think hard about innovative ways to include humans in the loop. These new technologies bring with them the requirement and the opportunity to radically rethink how we work together, including how we work on learning design together.
- Don’t assume that data from your product’s analytics satisfies the high standard of proof we should require before putting new technologies in front of learners. Follow the research and, when necessary or possible, conduct your own. Once again, this is an opportunity for us to innovate in how we work together.