AI/ML as Copilot

I will stick with the topic of artificial intelligence and machine learning (AI/ML) for today’s post because I keep getting feedback suggesting there’s a lot of interest in it. Since I’m on a bit of a streak, I feel compelled to make some caveats before jumping in.

First, I am not a software engineer or an AI/ML expert. In fact, I’m not an expert in many of the topics I write about. I just happen to be pretty good at making inferences from a small amount of information and understanding. I write about what I’m learning rather than what I know. For many years, the tagline for this blog was “What I’m learning about online learning.” (“What I’m learning about stuff that interests me and is in some way relevant to digitally-enabled learning” seemed too long.) Since this blog is about learning rather than knowing, the corollary is that I invite you to educate me if you know something I don’t. Please tell me if I’m wrong, if I’m missing something, or even if you think I’m right.

Second, I’ll be writing again about the big new AI language models that have been taking the world by storm lately. On the one hand, I risk adding to misperceptions with this focus. A lot of folks are writing about these models now because they’re so sexy, surprising and, frankly, potentially dangerous. In reality, AI/ML is a large, diverse, and ever-growing family of computational techniques, many of which have very different characteristics from each other. On the other hand, the big models are useful to write about precisely because they are on the extreme end of the spectrum regarding their alienness. They highlight some problems that may be more subtle and harder to see in other techniques.

My last caveat is that I will be responding to an interview conducted by the great Ben Thompson of Stratechery. Specifically, I’ll be quoting from one of his subscription-only articles. Since this is his bread and butter, I’m mindful of not putting too much of his paid content on the public internet. Luckily, it’s a very long article and I’ll only be quoting a small fraction of it. While I think it likely meets the criteria of fair use, more importantly, I’m hoping Ben will see it as an advertisement. I’m a fan. I don’t pay for many newsletters. I do pay for his. If you want to understand the intersection of tech and business, you should too. He’s a fantastic writer and an original thinker.

This post happens to be an interview, which is unusual for Stratechery. The interviewees are Daniel Gross and Nat Friedman. Ben identifies them first as VCs, but their relevant credentials are that they both worked extensively in tech and are real experts in AI/ML. His post is fascinating in its entirety. I’m going to focus on a few aspects that are salient for EdTech and that resonate with my recent screeds on AI/ML.

Spooky and kooky

In my post about using GPT-3 to create a philosophy tutor chatbot, I wrote about the miracle, the grind, and the wall. First, these models do something that blows your mind. That gets you excited. As you try to turn that moment of exhilaration into a reliable and scalable piece of software, you discover that achieving reliability and scalability is arduous work. Eventually, you hit a wall you can’t get past. And it’s hard to predict in advance where that wall will be.

Nat Friedman oversaw the creation of GitHub Copilot, which uses a version of GPT-3 to suggest code to developers in real-time as they are writing software. Here’s what he said about what it was like:

The thing I would always say with those models is that they alternate between spooky and kooky. So half the time or some fraction of the time, they’re so good, it’s spooky like, “How did it figure that out? It’s incredible. It’s reading my mind,” or “It knows this code better than I do.” Then sometimes it’s kooky, it’s just so wrong, it’s nonsense, it’s ridiculous. So when it was wrong, it was really wrong. It turned out from testing it in the Q&A scenario that when you actually asked the thing a question and it gave you more often than not a wrong answer, you got very irritated by it — this was an extremely bad interaction. So we knew that it couldn’t be some explicit Q&A interaction. It couldn’t be something where you ask a question and then 70 percent of the time you get a useless answer. It had to be some product where it was serving you suggestions when it has high confidence, but it wasn’t something you were asking for and then getting disappointed by….

[I]t turns out in retrospect, we know this now and we didn’t know it at the time, the question that we were trying to answer was, “How do you take a model which is actually pretty frequently wrong and still make that useful”? So you need to develop a UI which allows the user to get a sense and intuition themselves for when to pay attention to the suggestions and when not to, and to be able to automatically notice, “Oh, this is probably good. I’m writing boilerplate code,” or “I don’t know this API very well. It probably knows it better than I do,” and to just ignore it the rest of the time.
Stratechery
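Friedman’s design principle, surfacing suggestions only when the model is confident rather than answering every question put to it, can be illustrated with a toy example. The following Python is a hypothetical sketch, not how Copilot actually works: the Suggestion type, the confidence score attached to each suggestion, and the 0.8 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    text: str
    # Hypothetical: an estimated probability that this suggestion is useful.
    confidence: float


def suggestions_to_show(candidates, threshold=0.8):
    """Keep only suggestions confident enough to surface unprompted.

    Low-confidence guesses are silently dropped rather than shown as answers.
    The 0.8 default is an arbitrary, illustrative value.
    """
    return [s for s in candidates if s.confidence >= threshold]


candidates = [
    Suggestion("for i in range(len(items)):", 0.92),
    Suggestion("import pandsa as pd", 0.35),
    Suggestion("return total / count", 0.81),
]

shown = suggestions_to_show(candidates)
for s in shown:
    print(s.text)
```

Raising the threshold trades fewer interruptions for more missed opportunities; where to set it is a product judgment rather than something the code can decide.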

Friedman’s comments highlight that even Microsoft, using one of the most advanced AI models on the planet, could only get a useful answer from the AI about 30% of the time after carefully training it on the vast body of software code in GitHub that had been tested and validated as working.

There isn’t even a moment’s consideration given to having the model replace the programmer. It’s wrong 70% of the time. That might not be true always and forever, but it’s true now, even with an army of skilled engineers using one of the best models available. The product is called Copilot. Not only is there a human in the loop; the human is in charge. A lot of thought went into designing the software so that expert humans feel comfortable ignoring it rather than annoyed that it’s wrong so often:

So it’s funny because a lot of the ideas we had about AI previously were this idea of dialogue. The AI is this agent on the other side of the table, you’re thinking about the task you want to do, you’re formulating it into a question, you’re asking, and you’re getting a response, you’re in dialogue with it. The Copilot idea is the opposite. There’s a little robot sitting on your shoulder, you’re on the same side of the table, you’re looking at the same thing, and when it can it’s trying to help out automatically. That turned out to be the right user interface….

So from the June realization that we should do something, I think it was end of summer, maybe early-September by the time we concluded chatbots weren’t it. Then it really wasn’t until February of the next year that we had the head exploding moment when we realized this is a product, this is exactly how it should work…. So now, it’s very obvious. It seems like the most obvious product and a way to build, but at the time, lots of smart people were wandering in the dark looking for the answer.
Stratechery

This is not the way a lot of EdTech is designed. The industry tends to develop models that automate various aspects of course design, teaching, or support and send their output straight to the students without much thought about how those students will handle, or even recognize, errors. Even when there’s a human in the loop, we’re not typically designing these products for expert humans. Instead, we’re using the humans to review (which might mean spot check) the machine-generated output or tool, which is intended for use by explicitly non-expert humans.

“A Robot Tutor in the Sky,” by the DALL-E 2 algorithm

The best systems built on technology that is extremely useful but highly unreliable are designed to make the role of human judgment obvious and easy to apply. As Friedman notes, getting this right is very hard.

Getting the interaction model right

Thompson’s response to Friedman was characteristically insightful:

The reason is that people want to anthropomorphize everything and they want to put everything in human terms. The whole point of a computer is it just operates utterly and completely different than humans do. At the end of the day, it’s still calculating ones and zeros. So everything has to be distilled to that and it just does it at tremendously fast speed, unimaginable speed, but that is so completely different than the way that a human mind works that that’s how whatever was kooky or spooky I’m sure was completely and utterly logical to the computer. It strikes me that this is why the chat interface was wrong, because what it was doing was it was taking this intelligence, and it was actually accentuating the extent to which it was different than humans by trying to put it as a human, as if you’re talking to someone, and it was actually essential to come up with a completely different interface that acknowledged and celebrated the fact that this intelligence actually functions completely and utterly differently than humans do.
Stratechery

The three lessons here are (1) make the AI visible, (2) make it clear that the AI is not some simulacrum of human intelligence but rather is a tool that works imperfectly, and (3) create an experience that encourages users to exercise judgment when evaluating the output of the AI.

Here’s Friedman again about Copilot:

The thing that Copilot gave us that we, again, only realized in retrospect was this randomized psychological reward. It’s like a slot machine where the ongoing cost of using it at any given moment is not very high, but then periodically you hit this jackpot where it generates a whole function for you and you’re utterly delighted. You can’t believe it, it just saved you 25 minutes of Googling and Stack Overflow and testing. That happens at random intervals, so you’re ready for the next randomized reward, it has this addictive quality as a result. Whereas people frequently have ideas that are like, “Oh, the agent is going to write a huge pull request for you and it’s going to write a huge set of changes across your code, you’re going to review that.”
Stratechery

Copilot turns a bug into a feature. The unreliability leaves you delighted when the product gives you something useful rather than angry when it doesn’t. This only works because the user explicitly understands that the product is wrong 70% of the time but finds it helpful, and even delightful, anyway.

Again, that’s not how EdTech is typically designed, and it’s certainly not how it’s usually sold. We often present the tech to non-expert users as a virtual helper positioned as a tutor or advisor.

It doesn’t have to be that way. First, we should spend more energy in EdTech using AI/ML to help expert users—i.e., the educators—more efficiently and effectively leverage their expertise to help the non-expert users—i.e., the students. Second, we can create interfaces for non-expert users that turn the unreliability of AI models from a bug into a feature. One portion of the interview turns to new models that enable users to generate novel images from text. Friedman talks about people using one such model, called Midjourney, that they access through a discussion forum called Discord:

[I]f you ever watch a really creative person sit over their shoulder and watch them use Midjourney for an hour, you find that what they’re doing is not one text-to-image. They’re writing a prompt, they’re generating a bunch of images, they’re generating variance of those images, they’re remixing ideas, they might be riffing off someone else in the Discord channel, they’re exploring a space, you’re exploring latent space in a way, and then pinning the elements of it that you like, and you’re using that for creativity and ideas, but also to zero in on an artifact or an output that you’re trying to produce, and those are different modes.
Stratechery

This was similar to my experience trying to write a philosophy tutor. It was creative and fun. It felt a bit like teaching a student with strengths and limitations that I was trying to understand. I would probe, adjust, and probe again. I loved it. And when I had reached my limit with the chatbot, yes, I was tempted to throw it out and start fresh with another idea.

Friedman, noting the popularity of Midjourney on Discord, described an example user he heard about from the company’s founder:

There was one David was telling me about recently who’s a trucker, who when he stops at truck stops, he canceled his Netflix, and now what he does is he just makes images for a couple hours before bed, and he’s utterly transfixed by this. To me, that seems like it’s just objectively better than watching Netflix and binging a show; it’s exploring the space of your own ideas and creativity and seeing them fed back to you. So it turns out there’s a lot of people who have this creative impulse and just didn’t have the tools, the manual skills to express it and to create art, and something like Midjourney or something like Stable Diffusion gives them that, and that’s incredibly exciting.
Stratechery

In education, we discuss the need to expose our students to AI/ML and help them develop some literacy. This is equally true in the workforce, by the way. And yet, when we do expose students to the tech, we tend to make it as invisible as possible.

The bottom line

Some readers have interpreted me as an AI/ML skeptic, at least as applied to EdTech. The opposite is true. I’m enthusiastic about its potential and some of the current uses I’ve seen. I simply believe that we are immature in our thinking about how to apply it constructively to our domain. While we’re hardly alone in that, we have an extra responsibility of care as educators. We should be looking for new models that previously weren’t possible rather than just trying to automate and accelerate old models. In doing so, we should embrace the limitations of the technology and let them inspire fresh thinking.

Comments

Eric Likness says

October 7, 2022 at 11:33 AM

Great timing, as I just bumped into this article today on the UK site The Register: https://www.theregister.com/2022/10/07/machine_learning_code_assistance/.

NYU did a study of AI/ML coding assistants and concluded, statistically, “no significant difference” which for their study was the right answer. They were worried the Co-Pilot assistance would introduce worse code, buggy, insecure code. But in fact it was about equal with or without Co-Pilot assistance. So at minimum with the small sample they had, the Co-Pilot wasn’t worse than not having Co-Pilot. And in your description today from Stratechery, sounds like when the UI is designed correctly and expectations scoped appropriately (70% incorrect suggestions, and 30% correct), you have something that might further evolve into a true asset.

Thanks for sharing your quotes out of the newsletter. Much appreciated.
