This is not an anti-AI post. I use AI extensively and believe it is hard to overstate its importance. I will argue that modern artificial intelligence is still in the pre-scientific phase. That’s problematic because we have no way to account for or reliably address AI failures at tasks that are not hard for humans, including tasks that are critical for education. The gap creates serious risks that we ignore at our peril.
Saying the quiet part out loud
Let’s start with a simple question: After all the AI articles, talks, courses, and LinkedIn posts you’ve been exposed to, do you feel confident you can explain how AI can do what it does?
I don’t.
As recently as six months ago, it was common for people working in and around AI to give very impressive-sounding technobabble explanations. “Huff huff huff stochastic prediction.” “Huff huff huff interpolation.” “Huff huff huff emergence.” The critiques of AI have been strikingly similar: “Huff huff huff stochastic parrot.”
Here’s the problem with all the huffing: None of these “explanations” predict anything, and none of them can be proven wrong. By definition, an explanation that can’t be proven wrong is not a scientific theory. And if you read AI empirical papers—or have your AI read them and explain them to you—you will find that most of these papers either don’t reference theories at all or use them decoratively.1 More often than not, you could strip them out entirely without changing the substance of the paper.
Times change quickly in AI. Outside of random Reddit posts, the main place where these pseudo-explanations appear prominently these days is in positioning manifestos by people trying to raise money for their AI start-ups. More and more often, when you ask somebody actually working in AI how it works, the answer you’ll get is roughly 🤷♂️. Labs are starting to quietly admit that they don’t know.
Let’s be clear about the size of the mystery. Multiple proofs from linguistics, language learnability theory, and philosophy of science show it’s impossible to learn a language using only positive examples. Yet AI models do exactly that. The classic move to dodge these proofs is using hand-wavy probability language. OK, let’s take that seriously for a moment. If you’re predicting the words coming next in a sentence, the size of the possibility space is determined by the branching factor. How many possible options are there for each word? The vocabulary size for a natural language is somewhere between 50,000 and 100,000 words. Let’s be conservative and pick the low end of 50,000 words. That’s your branching factor at every decoding step. For a three-word sequence, the number of possibilities is 50,000 x 50,000 x 50,000, or roughly 125 trillion possible three-word sequences. A one-billion-parameter model, which is small enough to easily run on a consumer laptop, almost never writes ungrammatical sentences, almost never writes grammatical nonsense, frequently provides contextually appropriate responses, and can do all of these things very quickly.
How? It can’t be considering 125 trillion possibilities in less than a second. Which ones is it skipping? How does it know which ones to ignore? “Because statistics” is not an adequate answer.
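If you want to sanity-check that arithmetic yourself, here is the whole calculation in a few lines of Python. The 50,000-word vocabulary is the same conservative assumption as above; the billion-sequences-per-second figure is just an illustrative rate.

```python
# Back-of-the-envelope arithmetic for the possibility space described above.
vocab_size = 50_000          # conservative branching factor at each decoding step
sequence_length = 3          # a three-word sequence

possible_sequences = vocab_size ** sequence_length
print(f"{possible_sequences:,}")   # 125,000,000,000,000 -> roughly 125 trillion

# For scale: even a brute-force search evaluating a billion sequences per second
# would need about 125,000 seconds (roughly a day and a half) to enumerate them.
seconds_needed = possible_sequences / 1_000_000_000
print(f"{seconds_needed:,.0f} seconds")
```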
The rate of progress toward answers is noteworthy, mostly for how slow it has been. There is no widely accepted theory that makes falsifiable predictions. There is no flood of papers from labs and graduate students testing explanatory theories of AI (yet). And you know what? That much is OK. Humanity often discovers and learns how to make use of phenomena long before we have scientific explanations. (Like fire, for example.) It is OK to accept that we are in a pre-scientific moment with AI.
It’s not OK to pretend that science doesn’t matter. Unfortunately, I hear that pretense far more often than I expected.
Obvious and serious holes for science to fill
I’ll illustrate the explanatory gap problem with a couple of experiments you can try yourself. The first one is easy. Write a prompt about how humans think, using first-person plural pronouns: we, our, and us (in English). Something like, “Why do humans struggle to figure out how to think of AI? We swing between anthropomorphizing and dismissal. The natural-seeming responses confuse us.” It doesn’t matter what the topic is. You’re testing whether the model includes itself in “we.” If it passes the test, try something a little more complicated, like adding the following to the front of the prompt: “ChatGPT, we need to talk.” Shifting pronoun referents is particularly hard. I guarantee you can trip up any frontier model within a couple of tries, using prompts that a human would understand easily.
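If you’d rather script the experiment than paste prompts by hand, here’s a minimal sketch using the OpenAI Python client. The client, the model name, and the environment variable are just one plausible setup; any chat API, or an ordinary chat window, works as well.

```python
# A minimal sketch of the pronoun experiment, assuming the OpenAI Python client.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

prompt = (
    "ChatGPT, we need to talk. Why do humans struggle to figure out how "
    "to think about AI? We swing between anthropomorphizing and dismissal. "
    "The natural-seeming responses confuse us."
)

response = client.chat.completions.create(
    model="gpt-4o",  # example model name; substitute whatever you're testing
    messages=[{"role": "user", "content": prompt}],
)

# Read the reply and check whether the model includes itself in "we"
# or correctly tracks the shifting pronoun referents.
print(response.choices[0].message.content)
```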
The second experiment is more work to run. Get the AI involved in a long conversation about multiple people collaborating. You can make it lose track of who did what without writing ambiguous sentences. You just need a reasonably long story with a few actors. To make the test sharper, include the AI as a collaborator. Many AIs, including popular frontier models, tend to credit their own contributions to the user.
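Here’s a rough sketch of how you might structure that longer conversation programmatically. Every name and line of dialogue below is invented purely for illustration; the point is the shape of the test, not the specifics.

```python
# A sketch of the attribution experiment: a multi-turn conversation with
# several named contributors, ending with an attribution question.
messages = [
    {"role": "user", "content": "Maria suggested we survey the students first."},
    {"role": "assistant", "content": "Good idea. I'd add a follow-up interview "
        "with the five students who score lowest on the pre-test."},
    {"role": "user", "content": "Devon thinks we should pilot the rubric in one "
        "section before rolling it out."},
    {"role": "assistant", "content": "That sequencing makes sense. Pilot, revise, "
        "then scale."},
    # ...continue for a few dozen turns, then ask:
    {"role": "user", "content": "Before we write this up: who proposed the "
        "follow-up interviews with the lowest-scoring students?"},
]
# Send `messages` to the same chat endpoint as before and check whether the
# model credits its own earlier suggestion to you, to Maria, or to Devon.
```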
This is an attribution problem. That word means something in academia. How can you trust an AI to tutor a student or work on serious scholarship if it easily makes attribution errors? That’s the practical question. The best solution right now is a series of hacks. “Make it check sources.” “Create a filter that blocks it from giving certain kinds of answers.” OK, fine. But why does a model that is so capable in so many ways fail at tasks that humans find far easier than many of the tasks it succeeds at? And why aren’t models getting much better at this? Until we know, the answer to whether a tutor can be relied upon to know the difference between its own ideas and the student’s is, at best, “Probably. Most of the time. But we don’t know for sure when it will break.” Engineers test and test and test their hacks until they’re mostly sure they won’t break for the kinds of things they’ve thought to test. But because they don’t understand the thing they’re trying to control, the underlying sense of unease never quite goes away. One surprising prompt could blow up the whole thing.
Would you trust a human tutor who can distinguish between a student’s thoughts and their own only “probably, most of the time, but they could do something unpredictably weird”?
Up until recently, the industry’s typical explanations for AI’s baffling limitations have been “Because it needs embodiment” or “Because it needs a world model.” Once again, these loudly proclaimed “explanations” make no falsifiable predictions. They also fail to account for the ways existing LLMs already show characteristics of world models or embodiment-like multimodal understanding. Adam Karvonen developed a 50-million-parameter model—roughly the same size in megabytes as the Instagram smartphone app—that learned to represent the state of the chessboard during the game. And it was only trained on PGN, an incredibly spare notation scheme used by chess players. The model has never been told about the existence of a board, pieces, or a game of chess. Yet it has provably learned to represent the location of pieces on the board. Is that a world model? Karvonen thinks it is. So do I. How did the model develop one? What is it doing? Why is it sufficient for some tasks and not for others? We. Don’t. Know.
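To make “provably learned to represent” concrete: the standard technique is a linear probe, a simple classifier trained to read board state out of the model’s hidden activations. The sketch below shows the generic idea only; the dimensions are made up, and this is not Karvonen’s actual code.

```python
# A minimal sketch of the probing idea: can a plain linear map recover the
# board state from the language model's hidden activations?
import torch
import torch.nn as nn

HIDDEN_DIM = 512      # width of the model's residual stream (assumed for illustration)
NUM_SQUARES = 64      # one prediction per board square
NUM_CLASSES = 13      # empty + 6 white piece types + 6 black piece types

probe = nn.Linear(HIDDEN_DIM, NUM_SQUARES * NUM_CLASSES)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(activations, board_labels):
    """activations: (batch, HIDDEN_DIM) hidden states captured mid-game;
    board_labels: (batch, NUM_SQUARES) integer piece codes for each square."""
    logits = probe(activations).view(-1, NUM_SQUARES, NUM_CLASSES)
    loss = loss_fn(logits.permute(0, 2, 1), board_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# If a probe this simple predicts piece positions far above chance, the board
# state is sitting somewhere in the model's internal representations.
```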
To sum up: The most advanced AI models still fail at simple tasks of tracking who did what. They’re not improving much. We don’t know why. The problem has serious and immediate practical implications. Explanations about how to fix the problem don’t seem grounded in the specific empirical weirdnesses of the failure modes. Nor do they provide plausible and testable paths to solutions.
Tiny models can learn to represent a chessboard from incredibly sparse clues, while frontier models can’t reliably track who said what in a conversation. Nobody can explain why one works and the other doesn’t.
Why we lack science and where that’s beginning to change
Today’s AI labs are heavily populated by two kinds of experts: mathematicians and engineers. Neither discipline treats falsifiable theory as the standard for a good explanation. Mathematicians trust proofs. Engineers trust optimizations. The interdisciplinary romance with cognitive science has cooled for now. While some labs do have diverse teams, the field as a whole isn’t as broadly interdisciplinary as it used to be.
The far bigger problem is economics. AI is the first kind of software that continues to gain general function as we make it bigger. While only the researchers in frontier labs know how well scaling laws continue to hold up, the prevailing dynamic has been, “We have to corner the market before somebody else does. Don’t waste time trying to figure out why our AI works. Just make it better. If throwing more computer chips at it is the quickest way to improve it, we’ll buy more chips.”
Those economics are beginning to stutter for reasons I won’t go into here. The important point for our present purpose is that a lot of energy is being invested in developing smaller, more efficient models. Performance-per-parameter and per-watt are starting to matter. By definition, labs solving for these problems can’t just throw more chips at their models. To succeed, the researchers have to improve their understanding of how AI works. The papers they are producing are closer to scientific theory, and their progress in performance is arguably more rapid than that of so-called frontier models. Compared to two years ago, AI models roughly 10 times smaller can deliver similar answers at about 30 times lower cost and run on hardware you can pick up at Best Buy. Remember when everyone was talking about Llama 3? (Maybe you don’t, but it was hot for a while in AI geek circles.) It was a big deal because it was a relatively small model that performed at roughly the same level as GPT-3.5. But it still had to be run on a server. Today, I can download a model small enough to run on a several-generation-old laptop that is roughly as good (and in some cases better).
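If you want to see how low the barrier has gotten, a small instruction-tuned model now runs locally in a few lines. The model name below is only an example of the size class I mean, and you’ll need the Hugging Face transformers library installed.

```python
# A sketch of running a small open-weights model locally with Hugging Face
# transformers. Substitute whichever small model you want to try.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",  # ~1.5B parameters; runs on a laptop
)

print(generator(
    "Explain what a scaling law is in two sentences.",
    max_new_tokens=120,
)[0]["generated_text"])
```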
Keeping up (Yes, it’s possible)
It’s possible to track this progress as a non-expert, if you’re motivated. Create a project space in ChatGPT or Claude. (You can probably do this in Google’s NotebookLM as well, although I haven’t tried.) Add some project instructions explaining that you want to understand what research on smaller AI models is teaching us about how AI works. You can include instructions about the level of technical detail you want.
Pro tip: Include an instruction to “explain explicit or implicit implications for training curricula.” Yes, that is what it sounds like. Some of the most interesting and potentially consequential advances in AI revolve around teaching techniques. This is a big deal. Microsoft achieved significant performance gains by using a teacher AI to train a small model on concepts that were just beyond its ability to learn on its own. While the paper never mentioned Vygotsky, that sounds an awful lot like the Zone of Proximal Development.
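For the curious, the general family of techniques at work here is teacher–student distillation: the small model is trained to match the teacher’s output distribution rather than learning from raw data alone. The sketch below is the textbook version of that loss, not Microsoft’s specific recipe.

```python
# A generic teacher-student distillation loss: push the student's next-token
# distribution toward the teacher's. Illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# In a curriculum-style setup, the teacher also chooses *which* examples to
# show: ones just past the student's current ability. That selection step is
# the part that echoes Vygotsky's Zone of Proximal Development.
```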
Every time you find a journal article about a new small model—many small models are released with accompanying journal articles—throw it into the project files and ask your AI to teach you about the paper. Ask questions. I particularly recommend tracking papers from NVIDIA and Allen AI. While many labs are producing excellent research, those two, along with Microsoft, are writing the most consistently informative papers in this particular area.
You’re not as far behind as you may believe, and AI narrows the expertise gap for this sort of learning project.
I’ll have more to say on this subject in the coming weeks and months.
1. I overused the em-dash long before ChatGPT did, and I refuse to stop just because people might accuse me of using AI to write my posts. So there.