I had a conversation with a friend last night about the counter-intuitive challenges of working with AI/ML and the implications of using these technologies in EdTech. I decided it might be useful to share my thinking more broadly in a blog post.
Essentially, I see three stages in working with artificial intelligence and machine learning (AI/ML). I call them the miracle, the grind, and the wall. These stages have implications for how we get seduced by these technologies and how we get bitten by them. The ethical implications are important.
The miracle
One challenge with AI/ML is how deceptively easy it is to produce mind-blowing demos with these tools. For example, I spent some time playing with GPT-3 as a learning exercise. GPT-3 is one of several gigantic AI models that can do some pretty miraculous things with natural language. (Google has the other prominent gigantic model.) One reason I started with GPT-3 is that it can be programmed using natural language. For example, you can tell it, “You are a helpful chatbot that teaches first-year college students about philosophy” and voila! You have a helpful philosophy-teaching chatbot.
It’s not quite that simple. For example, I found that GPT-3’s idea of what a first-year college student understands about philosophy differed from mine. I got better results when I asked it to target 11th grade. I couldn’t have known that in advance. GPT-3 is a neural network of 175 billion parameters. It has indexed large swathes of the internet and many books. But it doesn’t store all that information, exactly. It distills it in very complex ways. In fact, GPT-3 and similar models are so complex that even the programmers who made them can’t explain why they produce specific responses to instructions or questions. So I had to figure out how to “program” my chatbot through a bit of trial and error.
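For the curious, here’s roughly what that kind of natural-language “program” looks like when you call GPT-3 from code rather than the Playground. This is a minimal sketch using the OpenAI Python library of the GPT-3 era (the legacy Completions endpoint); the model name, parameters, and the prompt itself are illustrative, not exactly what I wrote:

```python
import openai  # the legacy (pre-1.0) OpenAI Python library

openai.api_key = "YOUR_API_KEY"  # placeholder

# The "program" is just a natural-language preamble. The 11th-grade targeting
# is the kind of thing I only discovered through trial and error.
prompt = (
    "You are a helpful chatbot that teaches philosophy at an 11th-grade "
    "reading level.\n\n"
    "Student: What does Hume mean by 'impressions' and 'ideas'?\n"
    "Tutor:"
)

response = openai.Completion.create(
    engine="davinci",    # the original GPT-3 model name; illustrative
    prompt=prompt,
    max_tokens=200,
    temperature=0.7,
    stop=["Student:"],   # stop before the model invents the student's next turn
)

print(response.choices[0].text.strip())
```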
Not all AI/ML algorithms are this complex. Some of them are much easier to understand. It’s a spectrum, and GPT-3 sits at the far end of it.
Anyway, after a few days of intermittent tinkering, I was able to produce a chatbot that could carry on a sustained and informative conversation about David Hume’s epistemology, to the point where it gave me new insights into the subject. And I accomplished this as a layperson, using nothing but plain English.
I had reached the miracle stage of AI/ML.
But there were problems. First of all, the chatbot would end the conversation just when it got really interesting. It would suddenly insist on saying goodbye and could not be persuaded to continue talking. It turns out that this kind of AI model has a strict memory limit, known as its context window. When you hit it, the chatbot forgets your entire conversation. My new philosophy tutor friend also sometimes gave weird answers. I knew when to ignore them, but they would have confused some students.
GPT-3 has a community of developers who are incredibly helpful, particularly when the topic is something idealistic like education. I was able to find a very knowledgeable programmer who was generous with his time and helped me understand what I would need to do in order to take my tutor to the next level.
The grind
The first thing I’d need to do was learn to program in Python, since I had reached the limits of programming in plain English. And for the 9,781st time, I was momentarily tempted to learn a little programming. But then he explained what I’d need to do.
For the memory problem, I’d need to chain portions of the conversation together, for example by summarizing earlier exchanges and feeding the summary into each new prompt (see the sketch below). But since GPT-3 doesn’t actually remember large chunks of information so much as it distills them, the chatbot wouldn’t literally be able to recall our entire conversation. You can quickly see where this could become problematic. If the student says something like, “When you said earlier that…”, it’s hard to predict how the chatbot will respond.
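Here is a sketch of what that chaining might look like. This is my rough reconstruction of the idea, not the programmer’s exact recipe; the character budget, turn count, and function names are invented for illustration:

```python
# Rough sketch of conversation "chaining": when the transcript nears the
# model's context limit, compress the oldest turns into a summary and carry
# forward only the summary plus the most recent turns.
MAX_PROMPT_CHARS = 6000   # crude stand-in for a real token budget
RECENT_TURNS = 6          # number of turns to keep verbatim

def build_prompt(preamble, turns, summarize):
    """Assemble the next prompt, compressing older turns when needed.

    'summarize' would be another call to the model; its output is lossy,
    which is exactly why "When you said earlier that..." gets unpredictable.
    """
    transcript = "\n".join(turns)
    if len(preamble) + len(transcript) > MAX_PROMPT_CHARS:
        older, recent = turns[:-RECENT_TURNS], turns[-RECENT_TURNS:]
        summary = summarize("\n".join(older))
        transcript = ("Summary of the conversation so far: " + summary
                      + "\n" + "\n".join(recent))
    return preamble + "\n\n" + transcript + "\nTutor:"
```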
And so we enter the grind phase. It could also be called the whack-a-mole phase, since you’re finding a problem, writing a solution, and then looking for unintended consequences elsewhere. Also, since the model’s outputs aren’t deterministic and the questions students will ask aren’t predictable either, it’s probably impossible to test all the possible scenarios.
Which is why you don’t see chatbots that are this open-ended and ambitious. Translating that initial miracle into reliable behavior is a daunting, if not impossible, task. Today’s chatbot designers use UX and context tricks to make the inputs from the students more predictable, and they use less complex algorithms with outputs that they can predict and debug more easily. They tend to reserve models like GPT-3 for limited and specific applications. And even when they’re careful, producing a chatbot that is rock-solid reliable takes a lot of hard work, including difficult debugging that’s often quite different from debugging traditional software.
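To make the “constrain the inputs” strategy concrete, here’s a deliberately simple sketch: a keyword-based intent matcher handles the predictable questions, and anything else gets a safe fallback rather than an open-ended generative reply. The intents and canned answers are invented for the example:

```python
# A sketch of the "constrain the inputs" strategy: a simple, debuggable
# keyword matcher handles the predictable cases, and anything else gets
# a safe fallback instead of an unpredictable generative reply.
INTENTS = {
    ("impression", "idea"): "Hume distinguishes vivid impressions from our fainter ideas of them...",
    ("cause", "causation"): "Hume argues we never directly perceive necessary connection...",
}

def respond(student_input: str) -> str:
    text = student_input.lower()
    for keywords, answer in INTENTS.items():
        if any(keyword in text for keyword in keywords):
            return answer
    return "I'm not sure I follow. Could you rephrase, or pick a topic from the menu?"

print(respond("What's the difference between an impression and an idea?"))
```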
This brings us to the final phase: the wall.
The wall
Sooner or later you reach the limit of what your tech can do for you. Predicting that limit in advance takes tremendous skill, often requiring extensive domain knowledge of both the tech itself and the problem it’s being applied to, whether that’s detecting manufacturing defects, discovering new drugs, or tutoring students. There are always nooks and crannies of knowledge and skill that are a poor match for the technology’s capabilities or the data it can access.
Think about spelling and grammar checkers. They’ve been around since 1961, believe it or not. Even as recently as five or six years ago, Microsoft Word’s spell checker was so bad that I always turned it off. Today, I use Grammarly Pro, which checks spelling and grammar and now even makes suggestions about effective sentence structure. I love it. It makes me a better writer.
But it still makes mistakes in spelling. It makes more mistakes in grammar. And its writing-style suggestions, while pretty good, fairly often make the sentence worse or even change its meaning. The reasons for these limitations are often not obvious to the layperson. For example, Wikipedia notes this eye-opening fact about spell checkers:
It might seem logical that where spell-checking dictionaries are concerned, “the bigger, the better,” so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. If there are more than this, incorrectly spelled words may be skipped because they are mistaken for others. For example, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would typically be more useful if a few people who write about Thai currency were slightly inconvenienced than if the spelling errors of the many more people who discuss baths were overlooked.
Wikipedia
The tech has a non-obvious fundamental limitation. And in this case, not only is more data not better; more data is worse. So the idea that all AI/ML problems can be fixed with big data is flat-out false. Sometimes better data is better.
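A toy example makes the point vivid. The word lists below are invented for illustration, but they show how adding a rare word to the dictionary silently masks a common misspelling:

```python
# A toy version of the Wikipedia example. With the smaller dictionary, the
# misspelling "baht" (for "bath") is flagged; add "baht" as a valid word and
# the same error sails through.
SMALL_DICT = {"i", "took", "a", "hot", "bath", "bat", "last", "night"}
BIG_DICT = SMALL_DICT | {"baht"}  # "bigger is better"?

def flag_misspellings(text, dictionary):
    return [word for word in text.lower().split() if word not in dictionary]

sentence = "i took a hot baht last night"       # the writer meant "bath"
print(flag_misspellings(sentence, SMALL_DICT))  # ['baht'] -> error caught
print(flag_misspellings(sentence, BIG_DICT))    # []       -> error missed
```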
Grammar is significantly more complex than spelling, and writing for clarity is significantly more complex than grammar. Each of these functions will likely hit a wall. It might not be a permanent wall, since technology improves over time. But it might be, since sometimes the limitation isn’t the tech but the nature of the problem, or the data available in a form that the tech can access.
Ethical implications
The rush I felt when I learned something about philosophy from the chatbot I wrote myself is indescribable. I was a philosophy major with a particular interest in anything related to the mind or knowledge. While I don’t have an advanced degree, I certainly knew something about David Hume’s epistemology when I started the dialogue. I was certain I was seeing the future.
And maybe I was. But it isn’t the near future. When I think about the much more mature technology of the grammar checker, I wouldn’t trust it with weak writers, and certainly not with ESL students. The checker would be more prone to make mistakes and the students would be less likely, on average, to have the confidence and knowledge necessary to know when to ignore the machine. In order for me to change my mind, I’d want to see some quality IRB-approved, peer-reviewed studies showing that grammar checkers help these students rather than harm them.
We’re in a heady moment with AI/ML. I see a lot of projects rushing headlong into heavy use of the tech, often putting it into production with students without the kind of careful oversight necessary to fulfill the EdTech Hippocratic oath: First, do no harm.
Fred M Beshears says
Great post, Michael.
For now, I just have a reference on AI/ML in education. I believe the report can be freely downloaded as a PDF.
You probably have this, but your readers may not.
—————————————————-
AI and the Future of Learning: Expert Panel Report
by Jeremy Roschelle, James Lester, Judi Fusco (Editors)
16 November 2020
https://circls.org/reports/ai-report
“””
Artificial intelligence (AI), machine learning, and related computational techniques have the potential to make powerful impacts on the future of learning. Technology’s impact on education is often to amplify impacts, regardless of whether the impacts are intended. Due to the accelerating pace of integration of technology in learning environments, the knob on the amplifier is rapidly going from low to high. Impacts on learning, whether positive or negative, could soon have consequences for many more students. Now is the time to begin planning for how to best develop and use AI in education in ways that are equitable, ethical, and effective and to mitigate weaknesses, risks, and potential harm.
We convened a panel of 22 experts in AI and in learning to address these issues. They met online for seven hours over two days in a facilitated process with different topics and breakout formats. The experts considered two broad questions:
1. What will educational leaders need to know about AI in support of student learning in order to have a stronger voice in the future of learning, to plan for the future, and to make informed decisions?
2. What do researchers need to tackle beyond the ordinary to generate the knowledge and information necessary for shaping AI in learning for the good?
“””
For more, follow the link above.
Sal Gerardo says
Michael, Interesting article!! If you would like to hear about an intelligent educational conversational tutor without the issues you discussed, please read my article in ET Magazine. Here is the link: https://digital.et-mag.com/issue/volume-1-issue-1/
Regards, Sal Gerardo