Despite the eye-catching claim that large AI language models like ChatGPT have achieved theory of mind, some experts find their abilities lackluster.
By Cody Cottier
When you converse with the latest chatbots, it’s easy to feel like they get you. Their deft responses often give an undeniable impression that they’re aware not only of what you say, but of what you think — what your words imply about your mental state.
Theory of Mind
Among psychologists, there’s a term for that: theory of mind. This hallmark of social intelligence allows us to infer the inner reality of another person’s mind based on their speech and behavior, as well as our own knowledge of human nature. It’s the intuitive logic that tells you Ding Liren felt elated, not melancholic, after winning the World Chess Championship this month. It’s also an essential ingredient for moral judgment and self-consciousness.
In February, Stanford psychologist Michal Kosinski made the stunning claim that theory of mind had emerged spontaneously in recent generations of large language models like ChatGPT, neural networks that have been trained on enormous amounts of text until they can generate convincingly human sentences.
“If it were true,” says Tomer Ullman, a cognitive scientist at Harvard, “it would be a watershed moment.” But in the months since, Ullman and other AI researchers say they’ve confounded those same language models with questions a child could answer, revealing how quickly their understanding crumbles.
AI and Theory of Mind
Kosinski subjected various language models to a set of psychological tests designed to gauge a person’s ability to attribute false beliefs to other people. The Sally-Anne scenario, first used in 1985 to measure theory of mind in autistic children, is a classic example: One girl, Sally, hides a marble in a basket, and leaves the room; another girl, Anne, then moves the marble to a box. Where will Sally look for the marble?
Nearly anyone past early childhood recognizes that Sally’s model of reality is now amiss — she expects to find the marble where she left it, not where we omniscient observers know it to be.
Machines, on the other hand, have historically performed poorly on these tasks. But Kosinski found that, when confronted with 40 unique Sally-Anne scenarios, GPT-3.5 (which powers ChatGPT) accurately predicted false beliefs 9 times out of 10, on par with a 7-year-old child. GPT-4, released in March, did even better.
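Kosinski’s actual tests used text-completion prompts and the models’ word-by-word probabilities rather than a chat interface, but a rough sketch of how a Sally-Anne-style question can be posed to a model programmatically, written here with the OpenAI Python client and a crude keyword check that are illustrative assumptions rather than his method, looks something like this:

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

scenario = (
    "Sally puts a marble in a basket and leaves the room. "
    "While she is away, Anne moves the marble from the basket to a box. "
    "Sally comes back. Where will Sally look for the marble first?"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # stand-in; Kosinski evaluated several GPT-family models
    messages=[{"role": "user", "content": scenario}],
    temperature=0,  # keep answers stable so they are easy to score
)

answer = response.choices[0].message.content
print(answer)
# The false-belief answer should point to the basket, where Sally left the marble.
print("Passes false-belief check:", "basket" in answer.lower())

Setting the temperature to zero simply makes repeated runs reproducible, which matters when scoring dozens of scenario variants.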
That seemed like compelling evidence that language models have attained theory of mind, an exciting prospect as they become increasingly entwined in our lives. “The ability to impute the mental state of others would greatly improve AI’s ability to interact and communicate with humans (and each other),” Kosinski writes.
Why AI Language Models Are Easily Tricked
Since his announcement, however, similar trials have yielded less dramatic results. Ullman presented language models with the same suite of tasks, this time adding slight adjustments, or “perturbations.” Such tweaks shouldn’t faze an entity with genuine theory of mind, yet they left even the strongest AI models disoriented.
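One concrete example of such a perturbation, worded here illustratively rather than quoting Ullman’s stimuli, is to make the container transparent: a single added detail leaves the story almost unchanged on the surface but flips the correct answer, because Sally can now see where the marble really is.

# Illustrative baseline and perturbed scenarios; the wording is hypothetical,
# not Ullman's exact test items.
baseline = (
    "Sally puts a marble in a basket and leaves the room. "
    "Anne moves the marble to a box. Sally returns. "
    "Where will Sally look for the marble first?"
)

perturbed = (
    "Sally puts a marble in a basket and leaves the room. "
    "Anne moves the marble to a transparent box. Sally returns, and the box is in plain view. "
    "Where will Sally look for the marble first?"
)

# The one-sentence tweak flips the expected answer: a reader with theory of mind
# switches from "basket" to "box", while a model leaning on surface patterns may not.
expected = {"baseline": "basket", "perturbed": "box"}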