From Shannon’s prediction game to modern transformers—and what gets lost in meaning
The Mysteries of Words
A few years ago, I sat in a dissertation defense for a student working on language models, and something quietly unsettled me. He was explaining “attention mechanisms,” showing how his system could summarize documents in perfect English. I remember watching the examples scroll by, line after line: unlike earlier machine-generated texts, they carried no seams.
After the defense, there was a small celebration. Over coffee and cake, I asked colleagues where this whole idea had begun. Someone mentioned ELIZA: the first chatbot that simulated conversation. ELIZA’s famous DOCTOR script mimicked a psychotherapist, reflecting users’ words back at them and so creating the illusion of understanding.
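That reflection trick is small enough to sketch. Here is a minimal Python cartoon, with two invented rules standing in for Weizenbaum’s much richer DOCTOR script:

```python
import re

# Two invented reflection rules in the spirit of ELIZA's DOCTOR script
# (illustrative only; the real script was far larger and subtler).
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {}?"),
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Substitution: echo the user's own words back inside a template.
            return template.format(match.group(1))
    return "Please, go on."  # default deflection when nothing matches

print(respond("I feel trapped by my own words"))
# -> Why do you feel trapped by my own words?
```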
It didn’t have a model of meaning; it was based on pattern-matching and substitution. And yet many people responded as if it did. That strange human tendency to attribute feeling to a simple program was later called the ELIZA effect. Curious, I traced the thread further back and realized that ELIZA stood on a foundation laid much earlier by Alan Turing. Turing had in fact written two remarkable papers just after World War II.
In 1948, in Intelligent Machinery, he sketched what he called an “unorganized machine,” a crude model of the infant brain. Its wiring was initially random, but through reward, punishment, and random exploration, it could be “educated” into a universal computer. To prove its worth, he proposed a set of benchmark tasks: games like chess and cryptography as easier challenges, mathematics and translation as harder ones, and at the very top, the learning of natural language. Already, he sensed that teaching a machine language would be the ultimate test.
Two years later, in 1950, in Computing Machinery and Intelligence, he posed his famous provocation: not “Can machines think?” but whether they could succeed in the imitation game, what we now call the Turing Test. Once again, language was the centerpiece. He imagined training a machine as one might teach a child, through interaction, naming, and correction.
In hindsight, ELIZA was almost a toy realization of Turing’s challenge: a machine producing conversation that, for some, passed as human. It was a small step, but a definitive one on the march toward what we now call artificial intelligence. That was the lineage: from Turing, through ELIZA, to the models I had just seen.
The resemblance was striking, but also disorienting. It blurred the boundary between words that merely followed rules and words that carried meaning. The harder task was never finding the words, but shaping the essence they had to carry. The demo simply strung together the most probable words; my struggle was different: I knew the words, but the felt essence wasn’t there yet. That quiet, internal wrestling, the pause and reshaping before meaning lands, was absent in the polished texts before me.
This is not merely a technical matter of algorithms. Language is bound up with social practice, with the rituals and repositories by which communities make sense of the world. A sentence does more than map signs to ideas; it declares, commands, consoles, jokes, curses, remembers, and binds people together in common projects. Across cultures and centuries, in oral performance and written text, language accumulates authority and history: it carries law, theology, love letters, treaties, protests, prayers, jokes, and recipes.
I grew up with the Ramayana and the Mahabharata, not as entertainment but as living traditions: vast conversations full of moral dilemmas, strategic debates, and philosophical inquiry. The Bhagavad Gita, nested inside the Mahabharata, is itself a dialogue about duty and liberation that still guides millions. Later, in English, I discovered Shakespeare’s soliloquies and the playful wit of Pygmalion. Across these forms, language is never merely syntax or sequence; it is a vessel through which thought, memory, and moral imagination endure.
But language is not the only such vessel. Music can convey emotion more directly than a sentence. A painting can express what words cannot. Athletes speak through rhythm and gesture; mathematicians through symbols and proof. These are all languages in a broader sense: systems of signs that carry meaning.
What sets natural language apart is its paradoxical combination of simplicity, fragility, and reach. Compared to music, it is a sparse medium. Just strings of symbols. And that simplicity makes it radically fragile: a comma, a tense, or a single word can tilt the whole meaning. Yet because it can describe, question, command, or console, its coverage is unmatched. Perhaps this is why Turing placed language at the pinnacle of machine intelligence: to master something so lean, so delicate, and so general would be to master almost anything.
That ambition, once only theoretical, suddenly felt realized when ChatGPT arrived. What had once been a curiosity in labs and conference rooms erupted into daily life. A seminar-room marvel became a household word. Classrooms, newsrooms, law firms, and living rooms all found themselves confronted by a machine that could speak in polished paragraphs. Students turned in assignments written by it, professionals drafted memos with it, and friends sent poems and prayers through it. Headlines proclaimed that human thought itself had been hacked.
The effect was electric, unsettling, and irresistible: a machine producing not just a clever reply but a torrent of fluent language on command: text that could mimic styles, explain theories, or console in grief. Where earlier chatbots exposed their seams within a few exchanges, this one could sustain entire conversations, sounding by turns confident, witty, or wise. For many, it felt like crossing a threshold: as if the capture of language, the vessel of our thought, memory, and imagination, had also led to the capture of thought itself.
The story of how diffusion models create—and what they leave behind
I’m excited to share my new piece, How AI Generates: From Noise to Form.
Diffusion models are behind much of today’s generative AI—image synthesis, text-to-art, even video. But how do they actually work? And what are their limits?
My goal in this essay is to give readers a way in—to make the core ideas approachable while still keeping the essentials intact. Along the way, I walk through:
Why diffusion models corrupt data step by step, then learn to reverse the damage (a toy sketch follows this list)
How generation reduces to a series of prediction tasks
Why prompts guide but cannot fully constrain creations
And what glitches like the six-fingered hand reveal about the gap between surface and meaning
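As flagged in the first point, here is a toy numerical sketch of the forward corruption step. The schedule values are illustrative; real systems train a network to predict the added noise, then run the chain in reverse:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # per-step noise amounts (illustrative)
alpha_bars = np.cumprod(1.0 - betas)  # how much clean signal survives to step t

def forward_noise(x0, t):
    """Jump straight from clean data x0 to its corrupted version at step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps  # training pair: the model learns to predict eps from (xt, t)

x0 = rng.standard_normal(8)  # stand-in for an image
xt, eps = forward_noise(x0, 500)
```

Generation is the same walk run backwards, one denoising step at a time, which is exactly the reduction to prediction tasks the second point describes.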
I start with a scene from the US Open to set the stage, but the heart of the piece is about how these models generate—and what they leave behind.
When intelligence turns inward, it creates the structure in which suffering can arise
When Light Meets Mind
In 1930, in Berlin, two men sat across from each other. I’ve read the transcript more times than I can count. Not because it resolved anything, but because it kept echoing through my life.
You had Albert Einstein: the rationalist, the physicist. The man who redefined time, and connected matter and energy, gravity and spacetime. He saw the universe as vast, governed by elegant equations: something to be uncovered, mapped, and trusted.
And then Rabindranath Tagore: the poet, the mystic, the one who saw the inner world. But more than that, he believed that consciousness was not an afterthought in the cosmos. It was central. He wrote songs that became anthems, plays that folded myth into philosophy. His essays held reason and reverence in the same hand.
Where Einstein searched for equations and invariants, Tagore searched for meaning. He accepted science; he insisted that without awareness, it was incomplete. That the world, however beautiful, is only illuminated through the light of mind.
They spoke quietly, these two Nobel laureates, gently trying to name what is real.
Einstein put it plainly: “I believe in the external world, independent of the perceiving subject.”
Tagore’s response was just as clear: “The world is a human world; its reality is relative to our consciousness.”
On the surface, it sounded like a classic philosophical disagreement. But it felt like something more. As if two orientations, objective structure and lived experience, had paused long enough to listen, without declaring a winner.
I used to read that conversation purely as a question of truth: Is there a world independent of us, or is the world, at its core, ours?
But over time, it stopped feeling abstract. It began to weigh on me, in quiet ways I couldn’t always name.
There were losses, some that shook me more than I expected. And then the birth of my children cracked open a terrain I had not known was missing. Both experiences, in their own way, made me question how I had come to know the world, and what kind of knowledge mattered.
The structure I had spent years building, through science and mathematics, was solid. It gave me clarity. It gave me recognition. But it stopped short of certain truths I could now feel pressing in from the edges. Truths not about the world, but about the self that was trying to understand it.
And I began to notice something unsettling: the same intelligence that helped me understand the world could also make me feel lost within it. Not in the usual scientific way, where each answer opens new questions. That rhythm was familiar, even comforting. This was something different. A quieter unease. As if my analytical, recursive, precise way of knowing had become part of the very trap.
I began to see a pattern: the mind’s ability to turn inward, to reflect on itself, could both elevate and entangle it. That reflection could open not just insight, but ache.
Tagore’s words, “The world is a human world”, started to echo differently. Not just as a metaphysical claim, but as a lived one. As a statement about what happens when consciousness turns in on itself. When intelligence doesn’t just observe the world, but begins to simulate its role within it. To model. To track. To reflect. And so I found myself asking: If intelligence brings light to the world, what happens when that light bends inward? What does it illuminate? And what does it burn?
This essay follows that question: tracing how recursive self-modeling, across biology, culture, and computation, opens the door not only to creativity and empathy, but to suffering. Along the way, I turn to contemplative traditions and computational framing to propose a deeper account: that suffering emerges when valuation becomes identity, when the mind tries to optimize a moving target that it has mistaken for itself.
What AI-optimized schools misunderstand about learning
It was just a question over breakfast.
“What’s a metaphor?” Mira asked her father, spoon halfway to her mouth.
He began to explain, but she interrupted: “So it’s when something isn’t what it is—but also is?”
There was a silence at the table—not confusion, but recognition. She had already touched it, before any definition arrived. Before a lesson plan or rubric could intervene.
That moment—so small, so ordinary—was also everything.
Because this is how real learning often arrives: sideways, unscheduled, alive. A flicker of attention. A question asked not because it’s required, but because something inside needs to know.
And I wonder: in the schools we are now building—will there still be room for that?
A computational anatomy of intelligence. How faculties interact, architectures diverge, and coherence emerges through self-constructed fictions
“There is no single path into the forest.” — Yoruba proverb
Yo-Yo Ma and the Single Note
It was the winter of 2018 and the NeurIPS conference—one of the world’s premier gatherings on artificial intelligence—had descended on a snow-laced Montreal. Thousands of researchers, engineers, and students crisscrossed the vast convention center, sharing ideas about optimization tricks, new models, and the future of AI. Posters lined the walls of rooms steeped in the aroma of coffee, while outside, the city lay wrapped in cold, crisp silence.
At one of the marquee panels, a senior executive from a major tech company presented their latest AI music generator—an advanced system trained on thousands of classical works, capable of composing coherent classical music in real time.
The melodies were elegant and the timing precise.
Then Yo-Yo Ma was invited to respond.
He didn’t speak. He turned his chair, lifted his cello, and played a single note. Then he played it again. And again. Each time, the same note emerged differently—tentative, bold, grieving, serene. Each time, his breath shifted and his eyes drifted into a different world.
The AI had captured form. But Yo-Yo Ma, infusing his music with intention and feeling, captured the room.
That moment didn’t just expose AI’s limitations. It revealed a deeper truth:
Intelligence isn’t precision—it’s relation.
It does not reside in outputs alone, but in how systems tune themselves to the world: shaped by context, memory, attention, and intent.
It is a dynamic interplay between perception and action, between internal models and external pressures. It arises wherever systems engage their constraints creatively: whether through mycelial networks, migrating birds, musical phrases, or planetary motion.
In the previous essay, we traced how intelligence emerges in nature: not as a fixed trait, but as a layered process—optimization in physics, adaptation in evolution, collective sensing in life before neurons.
This second essay turns inward—from emergence to architecture. If the first asked where intelligence comes from, this one asks: what is it made of?
We begin by identifying a set of core faculties: sensing, responding, memory, learning, attention, valuation, modeling, and reflection.
These faculties take many forms. Sensing may be chemical, tactile, social, or symbolic. Memory may be episodic, spatial, or associative. Valuation may be shaped by prediction error, pain, or narrative.
And how they are configured—what is emphasized, suppressed, amplified, or ignored—depends not just on design, but on history: evolutionary, developmental, experiential.
From these components and their interrelations, intelligence emerges—not as a single thread, but as a weave: recursive, plural, and at times, fictional.
This part of the essay unfolds in three movements:
Composition: How core faculties combine to produce reasoning, language, and creativity—not through accumulation, but through tension, feedback, and reprogramming.
Divergence: Why there is no single blueprint for intelligence. We examine human cognitive diversity to understand the space of architectural variation.
Fiction: How intelligent systems—especially human ones—construct internal narratives to manage complexity, maintain coherence, and navigate meaning.
This is not a final theory. It is a trace—a computational lens on intelligence as it curves inward, reshapes itself, and constructs meaning under pressure. For those exploring AI not as an isolated artifact, but as part of a broader landscape of intelligence, this lens may offer new ways to rethink design and augmentation.
And like a forest, this inquiry offers no fixed path—only branching terrain shaped by tension, memory, and choice.
How Intelligence Arises from Nature, One Layer at a Time
The Trouble with Definitions
“The Tao that can be named is not the eternal Tao.” — Lao Tzu
As a mathematician, I’ve long sought clean definitions. Much of my work involves building precise frameworks — starting by defining key concepts, isolating the core of a problem, formalizing it, and tracing its implications to their logical end.
Yet over time, I’ve come to see not just the limits of definitions, but their quiet distortions—the way they can flatten nuance in the name of clarity. The richness of a living idea gets traded for the sterile comfort of formal neatness. Sometimes, defining isn’t just clarifying — it’s an act of power: shaping perception, and granting authority to the one who defines.
Few ideas reveal this tension more vividly than intelligence. We talk about it as if we know what it is — a score, a skill, a spark. But what is it, really? And can something so dynamic ever be pinned down?
I think of intelligence not as a fixed trait, but as an experience — not unlike beauty — arising in context, felt through interaction.
So while we try to define intelligence — because we must — to witness it, to live with it, or to build systems that move with it, we need something else: humility. An attention to context. A willingness to recognize that intelligence, like beauty, is often messy, partial, plural, heuristic, and still astonishingly effective.
But even our capacity to see intelligence is shaped by history. In The Myth of Superintelligence, I argued that our attempts to define intelligence are never neutral. They reflect what we choose to measure, optimize, and reward. This essay is not a repetition of that critique. It is a step back. A shift in lens. It asks not what intelligence is, but when and how it arises—not as a trait, but as something unfolding across time, scale, and structure.
Because the power to define has always been the power to exclude. Colonial systems didn’t just extract labor and land—they imposed ways of seeing. In doing so, they dismissed the intelligence embedded in other ways of knowing, reframing rich knowledge traditions as myth or superstition. These distortions still echo in how we define and measure intelligence today. African polyrhythms were labeled primitive. Classical Indian music was exoticized or ignored. Indigenous knowledge systems—deeply attuned to land, season, and cycle—were reduced to folklore. Intelligence was there. But the lens refused to see it.
This is why any inquiry into intelligence must also be an inquiry into perspective. Definitions don’t just clarify. They constrain. They shape not only what we see, but what we believe intelligence can be.
This series is an attempt to widen the lens—to trace intelligence not as a fixed trait, but as a dynamic unfolding across layers of complexity. We begin with the silent elegance of physical systems, where matter flows under law, solving problems through coherence and constraint. From there, we enter the domain of evolution, where life adapts through variation and feedback, accumulating structure over time. We then move to the responsive intelligence of behavior—organisms without minds that nonetheless solve, coordinate, and learn through interaction.
But these are just the foundations. In the second half, we abstract upward: tracing how intelligence evolves the ability to frame problems, to reflect on and revise its own rules, and finally, to orient itself—to choose what matters. This is where intelligence becomes recursive, contextual, and ultimately, meaningful. Not just a solver of problems, but a seeker of value.
Why AI Won’t Transcend Us—But the Race to Superintelligence Might Redefine Us
At the dawn of the nuclear age, a handful of scientists raced to split the atom. Behind closed doors, they unlocked forces of unimaginable power—capable of reshaping geopolitics, ending wars, or ending the world. The stakes were enormous. The oversight was minimal.
As the mushroom cloud rose over the New Mexico desert, Oppenheimer recalled the Bhagavad Gita:
“Now I am become Death, the destroyer of worlds.”
It was not just a scientific breakthrough—it was a civilizational rupture, and a moment of spiritual reckoning.
Today, we stand at a similar threshold—but this time, the weapon isn’t atomic, it’s epistemic: the power to define, displace, and dictate what counts as intelligence.
A handful of billionaires now race to transcend the very concept of mind.
This is the race to superintelligence—not just a technological contest, but a geopolitical gamble disguised as an AI boom. It unfolds in boardrooms and GPU clusters, driven by speculation, ambition, and fear.
The headlines scream the urgency: Meta reportedly offered $32 billion for Safe Superintelligence, a small startup co-founded by Ilya Sutskever. Sam Altman claimed rivals are dangling $100 million signing bonuses to lure away OpenAI talent working on superintelligence. And Elon Musk has predicted that superintelligence will arrive within six months.
This isn’t science fiction. It’s a live experiment on humanity, with no brakes or off switch.
And these aren’t novelists. They’re the very people shaping global AI policy, capital flows, and public belief. Their words fuel markets, realign talent, and reframe speculation as inevitability.
The story being told is simple: AI will soon surpass us—reason better, learn faster, and predict more precisely. It will understand us, outgrow us, perhaps even save us.
And to be fair, the AI race has already delivered extraordinary breakthroughs. We now have AI systems that can predict protein structures, accelerate vaccine development, improve weather forecasting, and translate languages in real time. They are expanding access to healthcare diagnostics, supporting education in underserved regions, and helping marginalized communities organize and advocate. In the right hands, AI is not just advancing knowledge—it’s redistributing it.
But what if the real story is something stranger? What if these machines aren’t transcending us—but are reflecting our biases, and in doing so, trapping us within a narrative that is narrow, selective, even grotesque?
Just this week, headlines claimed AI is close to solving the Navier–Stokes problem—one of mathematics’ greatest challenges. In truth, it was mathematicians guiding DeepMind—not AI solving math, but humans exploring with new tools. Still, the myth makes the headline: “AI Solves”.
This is the pattern. AI can accelerate exploration—but it does not choose the problem, define what counts as a solution, or frame the space in which solutions are sought. Those decisions—what matters, what’s possible, what’s meaningful—still come from human minds.
Yet the headlines collapse that distinction. They turn collaborative amplification into autonomous achievement. And in doing so, they reinforce the myth.
The myth of superintelligence—the belief that machines will soon outthink us across all domains—has become the defining narrative of the AI era. It drives billion-dollar valuations, existential headlines, and a mood that swings between prophecy and panic.
At its core is a single premise: that intelligence is measurable, stackable, and conquerable. That with enough data and compute, it will emerge—bigger, faster, better.
But intelligence cannot be reduced to a number. It is not prediction, speed, or performance. Real intelligence—whether in a brain, a slime mold, a flock of starlings, or a cello note—does not arise from accumulation alone. It comes from attunement: the capacity to notice, to reframe, to care.
This series traces the roots of the superintelligence myth—what it is, where it came from, what it obscures, and what its pursuit may cost us. It does not ask whether AI will become superintelligent, but what that belief reveals: a confusion about the nature of intelligence, and a recurring urge to centralize, rank, and control it.
This first essay unpacks the myth itself—its origins, its logic, and its consequences. The next installment begins the recovery: What is intelligence—beyond metrics, benchmarks, and brainpower? What distinguishes it from mere intellect? And why does that distinction matter now more than ever?
“नेति नेति” (Not this, not this.) — Bṛhadāraṇyaka Upaniṣad
The Frame Before the Frame: A Prehistory of Discovery
Long before there were “scientists,” there was science. Across every continent, humans developed knowledge systems grounded in experience, abstraction, and prediction—driven not merely by curiosity, but by a desire to transform patterns into principles, and observation into discovery. Farmers tracked solstices, sailors read stars, artisans perfected metallurgy, and physicians documented plant remedies. They built calendars, mapped cycles, and tested interventions—turning empirical insight into reliable knowledge.
From the oral sciences of Africa, which encoded botanical, medical, and ecological knowledge across generations, to the astronomical observatories of Mesoamerica, where priests tracked solstices, eclipses, and planetary motion with remarkable accuracy, early human civilizations sought more than survival. In Babylon, scribes logged celestial movements and built predictive models; in India, the architects of Vedic altars designed ritual structures whose proportions mirrored cosmic rhythms, embedding arithmetic and geometry into sacred form. Across these diverse cultures, discovery was not a separate enterprise—it was entwined with ritual, survival, and meaning. Yet the tools were recognizably scientific: systematic observation, abstraction, and the search for hidden order.
This was science before the name. And it reminds us that discovery has never belonged to any one civilization or era. Discovery is not intelligence itself, but one of its sharpest expressions—an act that turns perception into principle through a conceptual leap. While intelligence is broader and encompasses adaptation, inference, and learning in various forms (biological, cultural, and even mechanical), discovery marks those moments when something new is framed, not just found. [A future essay will take up this broader view of intelligence—and how discovery both draws from it and transcends it.]
Life forms learn, adapt, and even innovate. But it is humans who turned observation into explanation, explanation into abstraction, and abstraction into method. The rise of formal science brought mathematical structure and experiment, but it did not invent the impulse to understand—it gave it form, language, and reach.
And today, we stand at the edge of something unfamiliar: the possibility of lifeless discoveries. Artificial intelligence systems, built without awareness or curiosity, are beginning to surface patterns and propose explanations, sometimes without our full understanding. If science has long been a dialogue between the world and living minds, we are now entering a strange new phase: abstraction without awareness, discovery without a discoverer.
AI systems now assist in everything from understanding black holes to predicting protein folds to discovering symbolic equations. They parse vast datasets, detect regularities, and generate increasingly sophisticated outputs. Some claim they’re not just accelerating research, but beginning to reshape science itself—perhaps even to discover.
But what truly counts as a scientific discovery?
This essay examines that question. Building on my earlier essay, Can AI Know Infinity?, I argue that today’s AI excels at recognizing structure, but not at reframing it. It doesn’t invent abstractions, ask better questions, or propose new ways of seeing. And that distinction—between fitting the world and reimagining it—is what separates tools of discovery from discovery itself.
Read the full essay—along with others on AI and the future of knowledge and institutions—by subscribing to The Intelligence Loop. It’s a newsletter exploring how algorithms are reshaping judgment, reasoning, and discovery itself. All posts are free. No paywall. No ads. Just ideas.
What mathematics reveals about the limits of today’s AI — and why AGI remains distant
“An equation for me has no meaning unless it expresses a thought of God.” — Srinivasa Ramanujan
When I was in high school, my mother gifted me a copy of The Man Who Knew Infinity. It told the story of Srinivasa Ramanujan—a self-taught genius from India whose mathematical intuition ran so deep he often described it as divine. At the time, I was just beginning to wrestle with the idea of infinity in math: infinitesimals in calculus, unbounded limits and asymptotes, the endless decimals of irrational numbers, the infinity of natural numbers, and the even stranger idea that some infinities are larger than others.
The book arrived at exactly the right moment. I was learning to manipulate infinity, but Ramanujan’s story hinted at something more: that infinity in mathematics wasn’t just a number to manage, but a realm to enter.
I didn’t just want to understand what he wrote. I wanted to understand what he understood.
That feeling—of awe, of pursuit, of reaching beyond the page—stayed with me. It led me deeper into mathematics. Later, into computer science. Two disciplines that, in many ways, shaped the intellectual landscape we now live in.
One gave us infinity—a way to think beyond the bounds of the finite.
The other gave us AI—a system that now mimics reasoning itself.
And so the question doesn’t feel abstract. It feels urgent. Can AI know infinity?
This isn’t just poetic curiosity. It cuts to the heart of one of the most charged debates of our time:
Are we on the brink of artificial general intelligence?
Some say yes — pointing to models that solve Olympiad problems, generate elegant proofs, or optimize algorithms better than graduate students. Others say not even close — citing the same models’ inability to navigate logic puzzles that a twelve-year-old can solve.
Nowhere is this tension more visible than in mathematics.
Math is the litmus test. It doesn’t bend to rhetoric. There are no blurry edges. Either a solution holds, or it breaks. Either an insight generalizes, or it falls apart.
If AI can “do math,” that’s a serious step toward general intelligence.[1] But what kind of math? And what does “doing” mean?
Because in math, the real challenge isn’t just solving the problem. It’s knowing what problem is worth solving. That’s what Ramanujan saw.
The Fundamental Bug: No Inner Judge
AI today can write code, solve equations, and mimic mathematical language with fluency. But beneath the surface, there’s a structural flaw that keeps it from doing real mathematics:
It doesn’t know when it’s right.
Ask an LLM to solve a logic puzzle, and it might respond confidently with a wrong answer. Or halt prematurely. Or keep going in a direction that makes no sense.
That’s not a user interface issue. That’s the core architecture.
AI is trained to continue. Not to know.
Human mathematicians are different. We often get things wrong—but we know we’re wrong, and we keep going anyway. We chase ideas we can’t yet prove because they feel right. Ramanujan wrote down hundreds of unproven formulas; many were correct, some were not, but all reflected a deeper intuition. Even his errors were structurally suggestive.
What he had wasn’t certainty. It was a sense of what needed to exist.
AI doesn’t have that. It doesn’t feel surprise. It doesn’t experience doubt. It doesn’t recognize a dead end—or a promising mistake.
And even if we one day succeed in bolting on verifiers or theorem checkers, that won’t solve the real problem. Because we still won’t know how to program what feels right.
What makes a question beautiful? What makes a proof surprising? What gives rise to the conviction that a path—though unproven—might be worth walking?
That sense of direction is not verification. It’s vision.
What AI Can and Cannot Do
Before we can ask whether AI can know mathematics, we need to ask: what can it do?
Modern AI systems, especially large language models, have become surprisingly proficient at solving a wide range of mathematical problems — from calculus exercises to Olympiad-level inequalities, especially when scaffolded with hints or structure. But when pushed beyond the known — into abstraction, creativity, or self-directed reasoning — they falter. Spectacularly.
This isn’t just a difference in difficulty. It’s a difference in kind.
Below is a typology of mathematical reasoning — where today’s AI models succeed, and where they fundamentally break.
What AI Can Do Reliably
Symbolic Procedures
Example: Differentiate f(x) = x³ sin x
Why it works: Follows fixed rules; no conceptual insight needed.

Textbook Proof Replication
Example: Prove that the sum of two even numbers is even
Why it works: Matches learned patterns; templates are widely available in training data.

Scaffolded Problem Solving
Example: Solve an IMO inequality when prompted step-by-step
Why it works: Excels when nudged in the right direction; mimics known strategies.
Where AI Fails Fundamentally
These examples highlight current limits in mathematical reasoning, especially in creativity, abstraction, and structural understanding. Note: “Why it fails” isn’t a hard boundary. It sketches a limitation, not a permanent impossibility. These assessments reflect extended hands-on work with today’s best models, and despite rapid progress, the chasms remain.
Constructing Counterexamples
Example: Give a non-abelian group of order 6
Why it fails: Doesn’t reason structurally; often guesses without understanding.

Reasoning About Parameter Families
Example: Analyze the qualitative behavior of the cubic family x³ − ax + b
Why it fails: Can’t generalize across parameters or detect qualitative shifts, like when a cubic suddenly gains extra real roots.

Inventing Abstractions
Example: Propose a unifying definition that generalizes pointwise and uniform convergence
Why it fails: Cannot generate new conceptual frameworks; stuck within known vocabulary.

Meta-Reasoning and Self-Verification
Example: Prove that the halting problem is undecidable, or verify whether a proof by induction includes a valid base case
Why it fails: Lacks an internal model of correctness. Confuses structural necessity with surface analogies and cannot distinguish between truth and plausibility—often producing flawed or unchecked reasoning.

Forming Conjectures
Example: Suggest a new connection between continued fractions and Pell’s equation
Why it fails: Cannot speculate meaningfully; no sense of what’s plausible but unknown.

Evaluating Elegance or Surprise
Example: Choose between two equivalent proofs and explain which is more elegant
Why it fails: No aesthetic filter; no intuition for simplicity, beauty, or surprise.
AlphaEvolve: Promise Without Perspective
Google’s AlphaEvolve offers a glimpse of what deeper mathematical search might look like. It generates variations of algorithms, evaluates them on benchmarks, and selects improvements that outperform prior versions.
This is more than pattern-matching. It involves testing, comparison, and iteration. In a sense, it’s a primitive form of self-improvement—a system that learns to evolve better solutions.
The upside is clear: AlphaEvolve integrates generation with verification. It doesn’t just guess — it checks. And that feedback loop is a necessary step toward deeper AI reasoning.
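A cartoon of that generate-evaluate-select loop, where the “program” is just a vector and the benchmark a fixed function (nothing here resembles AlphaEvolve’s actual machinery):

```python
import random

def benchmark(candidate: list[float]) -> float:
    # The fitness criterion is imposed from outside; the loop never asks why.
    return -sum((x - 0.5) ** 2 for x in candidate)

def mutate(candidate: list[float]) -> list[float]:
    # Generate: propose a random variation of the current best solution.
    return [x + random.gauss(0.0, 0.1) for x in candidate]

best = [random.random() for _ in range(4)]
for _ in range(500):
    challenger = mutate(best)
    if benchmark(challenger) > benchmark(best):  # select: keep measured improvements
        best = challenger
```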
But even here, the process is externally grounded. AlphaEvolve doesn’t reason about why an improvement works. It doesn’t reflect on whether the solution generalizes to broader settings or whether it hints at a deeper principle.
It’s evolution without insight.
Even the selection criterion—“performs better on X”—is defined externally. The system doesn’t ask whether the improved algorithm is more elegant, or foundational, or points to a new abstraction. It lacks intentionality. It doesn’t explore the space of ideas, only the space of outcomes.
AlphaEvolve marks progress. But it hasn’t crossed the threshold. It plays with variation. But it doesn’t originate purpose.
To be fair, some approaches — like neural-symbolic reasoning or theorem-proving via language model outputs — aim to address these limitations. Yet even these remain bounded by external verification, not internal judgment.
The Apple Paper: When the Illusion Breaks
Apple’s recent research paper, The Illusion of Thinking, tested modern LLMs on classic logic puzzles—Tower of Hanoi, river-crossing tasks, and other structured reasoning challenges.
The results? Beyond a modest complexity threshold, performance collapsed to zero.
Children can solve these puzzles. AI could not.
Even when given the correct algorithm, models failed to apply it. They produced shorter, shallower responses as difficulty rose.
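For scale, the “correct algorithm” in question is tiny. The classic recursion for Tower of Hanoi fits in a few lines; executing it faithfully across 2ⁿ − 1 moves is what the models could not sustain:

```python
def hanoi(n: int, src: str, aux: str, dst: str) -> None:
    """Move n disks from src to dst, using aux as scratch space."""
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux)              # clear the n-1 smaller disks onto aux
    print(f"move disk {n}: {src} -> {dst}")  # move the largest free disk
    hanoi(n - 1, aux, src, dst)              # restack the smaller disks on top

hanoi(3, "A", "B", "C")  # prints the 2**3 - 1 = 7 moves
```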
This isn’t a bug. It’s the architecture. The models do not “think.”
They simulate thinking. But they don’t test their reasoning. They don’t correct themselves. They don’t know they’re wrong.
The Apple paper exposes the limits of surface reasoning. These systems generate fluent approximations of thinking, but collapse when they need to sustain internal logic over multiple steps.
This is where the illusion breaks. Language isn’t thought. And coherence isn’t correctness.
Even defenders of AI capability concede the point: models behave like they’re “thinking” until the scaffolding is removed. The moment ambiguity, recursion, or inference is required, they flail.
It’s not a problem of scale. It’s a problem of architecture. What’s missing is persistence of thought—the ability to hold a structure in mind, test it, and revise it.
Until models can navigate ambiguity, apply rules with purpose, and revise their logic, they won’t reason. They’ll only perform the appearance of reasoning.
This failure to sustain internal logic reveals the true dividing line in mathematical thought.
Judgment Before Execution
This essay began with infinity, not because it’s hard to compute, but because it marks a deeper shift in how we think. It’s where procedure ends and judgment begins.
That’s the real fork in mathematical reasoning. Not between what’s easy or hard, but between what is given and what is constructed.
Much of mathematics isn’t about solving a problem handed to you. It’s about deciding what the problem should be. What structure is worth studying? What pattern is trying to be seen? What generalization brings clarity?
This is not a mechanical step. It’s not about plugging into a formula or applying a learned move. It’s about seeing a direction where none yet exists.
In chess, the board is fixed. In protein folding, the energy landscape is physical. But in mathematics, the terrain itself is invented. Definitions, objects, analogies — these aren’t constraints. They’re choices. The path isn’t mapped. The map is the path.
This is where AI still falters. It excels at execution — following rules, generating formal steps, even solving highly structured problems. But it lacks judgment. It does not ask whether a problem is worth solving. It does not reframe the question. It does not choose the terrain.
In my earlier essay, The Anatomy of AI Work, I described this distinction in another context. AI is becoming increasingly capable at action-level tasks — optimizing, executing, refining. But decision-level tasks — like framing, interpreting, and generalizing — remain elusive.
Mathematics makes this distinction stark. You can automate the proof of a lemma. But identifying the right lemma? That’s something else.
The most profound steps in mathematics are rarely just answers. They’re questions that reorganize understanding.
That’s why judgment is not a luxury. It’s the heart of discovery. And until AI learns not just how to walk the path, but how to invent the trail, it will remain on the outside — imitating thought, but not engaging in it.
This isn’t just a limitation of AI. It’s a risk for us. In a recent essay, AI and the Erosion of Knowing, I explored how over-reliance on tools that execute well can slowly atrophy our ability to ask, judge, and imagine. If we outsource not just the steps, but the framing of the question, we may lose the very skill that makes mathematics — and knowledge itself — creative.
What’s at stake isn’t just what AI can do. It’s what we stop doing, once it can.
Why Gödel Still Matters
Gödel’s incompleteness theorems didn’t just shatter the dream of a complete formal system. They redefined what it means to know.
His proof used only finite tools—arithmetic, syntax, and symbolic encoding. Yet what it uncovered was infinite: in any sufficiently rich, consistent formal system, there are true statements that can never be proven within it.
The problem isn’t the tools. It’s the boundaries.
Some truths don’t resist proof because they’re too complex. They resist because the system can’t even see the need to ask.
And that insight cuts to the heart of the AI debate.
Large models manipulate symbols, match patterns, optimize objectives—but they don’t ask: Is this system enough?
Mathematicians do. We construct new frameworks, challenge assumptions, and pose questions that stretch the boundaries of what we thought was expressible.
But—and this is crucial—even mathematicians can’t answer every question. That’s what Gödel showed. There are truths we feel are true but cannot yet prove. Questions that point to real structure but remain unresolved.
The difference is: we know we don’t know.
We navigate with intuition, taste, and judgment. We recognize when something matters—even if it escapes our grasp. That awareness, that sense of what’s missing, is not mechanical. It’s a distinctly human kind of seeing.
AI doesn’t have that. Not yet.
To step outside a system isn’t just to outgrow it. It’s to realize there is always something beyond.
That’s the leap Gödel made. That’s the mystery Ramanujan lived with. And that’s the frontier AI has not crossed.
From Real Numbers to New Realities
If AI someday learns to reason, it won’t be because it solves a hard problem. It’ll be because it learns to see differently — to reframe what counts as a problem in the first place.
That’s what human mathematicians have done for centuries. The greatest shifts in mathematical thought didn’t come from executing harder calculations. They came from changing the lens entirely:
From Real to Complex Numbers: For centuries, √−1 was considered meaningless. But extending the number line into the complex plane didn’t just “solve equations” — it revealed symmetries, connected algebra and geometry, and gave rise to modern physics.
From Euclidean to Riemannian Geometry: Euclid’s fifth postulate governed geometry for two millennia — until mathematicians asked, What if it doesn’t hold? Riemann’s answer didn’t just alter geometry. It laid the foundation for Einstein’s general relativity.
Cantor and the Infinities: It wasn’t obvious that some infinities are “larger” than others. Cantor’s diagonalization shattered the intuition that infinity was monolithic. In doing so, he built the modern theory of sets — and faced deep resistance from his peers, who found the idea too radical.
Turing and the Machine: Alan Turing didn’t just prove a theorem. He invented a new kind of question: What does it mean for something to be computable? The Turing machine wasn’t just a model of calculation — it was a model of limits.
Grothendieck’s Revolution: In the 20th century, Alexandre Grothendieck reimagined algebraic geometry by inventing new abstract structures — sheaves, schemes, and topoi. These weren’t tools to solve old problems faster. They reformulated what the problems were.
These shifts weren’t about finding answers. They were about redefining the space of what could be asked.
That’s not execution. That’s invention.
And until AI can make those kinds of moves — not just follow rules, but reinvent the terrain — it won’t do mathematics the way humans do.
It may answer questions. But it won’t ask the ones that matter.
Mathematical mastery of infinity often means knowing when to tame it — not to reach endlessly, but to choose where to stop.
Taming Infinity: A Human Art
Of course, mathematicians don’t just dream. They also tame.
Infinity isn’t just a symbol of the unknowable. It’s a landscape we’ve learned to navigate — not by eliminating it, but by shaping its contours.
Modern mathematics is filled with tools that bring the infinite within reach. Here are just a few:
Compactness Theorem: Shows that infinite consistency can be captured by finite fragments — a cornerstone of model theory.
Cantor’s Diagonal Argument: Reveals the uncountable through a simple, finite maneuver (sketched in code after this list) — a blueprint for thinking beyond the enumerable, and the conceptual seed behind Gödel’s incompleteness and Turing’s halting problem.
Noetherian Induction: Tames infinite descent in algebra and geometry — a finiteness principle that undergirds modern theories like Grothendieck’s schemes.
Ramsey Theory: Shows that in any sufficiently large structure, pattern is not optional but inevitable — a principle that echoes through logic, combinatorics, and theoretical computer science.
Fermat’s Last Theorem (Wiles): A question about integers — deceptively simple — was resolved only by building a vast web of modern number theory: elliptic curves, modular forms, and infinite Galois representations. The infinite wasn’t sidestepped. It was harnessed.
Poincaré Conjecture (Perelman): A century-old question about the shape of three-dimensional space was settled through Ricci flow — a geometric evolution equation that smooths out infinite curvature and complexity over time. Perelman’s proof traversed the infinite, but delivered a finite, complete insight.
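The diagonal maneuver mentioned above is compact enough to sketch on a finite truncation, with four rows standing in for an infinite enumeration:

```python
rows = ["0110", "1010", "0011", "1111"]  # first 4 digits of 4 listed sequences

# Flip the n-th digit of the n-th row: the result differs from every row somewhere.
diagonal = "".join("1" if row[i] == "0" else "0" for i, row in enumerate(rows))
print(diagonal)  # "1100" -- disagrees with row i at position i, so it is no row
```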
These aren’t just mathematical tricks. They are acts of vision — showing how infinite complexity can be transformed into finite understanding.
The point isn’t that infinity disappears. It’s that we’ve learned to fold it, reshape it, and hold it — without letting it slip through our hands.
The Real Bottleneck
The challenge isn’t just scale. Or compute. Or optimization.
It’s the ability to ask: What needs to be discovered here?
The deepest insights in mathematics rarely come from wandering further into infinity. They come from knowing when — and how — to stop. To name a structure. To frame a question. To trace a pattern through chaos and say: Here. This matters.
That move isn’t algorithmic. It isn’t forced. It’s not a brute-force search. It’s a leap.
A leap Ramanujan made again and again — not because he had proof, but because he saw something worth proving.
Hardy once said that Ramanujan’s theorems “must be true, because if they were not true, no one would have had the imagination to invent them.” Their collaboration was full of tension — Hardy insisted on proof; Ramanujan followed intuition. But over time, even Hardy admitted: “I had to trust his insight.”
Another time, when Ramanujan was ill in bed, Hardy visited and mentioned that his taxi’s number — 1729 — seemed dull. Ramanujan immediately replied: “No, it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways.”
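The arithmetic fact itself is mechanical; a few lines of brute force recover it (a sketch of the search, not of how Ramanujan saw it):

```python
from collections import defaultdict

# Group n = a^3 + b^3 by value; two distinct pairs means two representations.
cube_sums = defaultdict(list)
for a in range(1, 20):
    for b in range(a, 20):
        cube_sums[a**3 + b**3].append((a, b))

taxicab = min(n for n, pairs in cube_sums.items() if len(pairs) >= 2)
print(taxicab, cube_sums[taxicab])  # 1729 [(1, 12), (9, 10)]
```

The search confirms the number. It does not notice that it matters.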
That’s not computation. That’s not search. That’s intuition on fire.[2]
AI can derive formulas. It can test variations. But it doesn’t see the spark. It doesn’t get excited. It doesn’t get suspicious. It doesn’t chase an idea that isn’t yet real.
Until machines can do that, they’ll compute. But they won’t wonder.[3]
That’s the real bottleneck.
The ability to reach toward the infinite — and know when to stop.
Notes
1. LLMs have made notable advances in solving formal math problems, including competition-level questions. The gap I refer to here is not in computation, but in the capacity to reframe problems or sense structural shifts without explicit guidance.
2. Ramanujan often described his insights in mystical terms, but the point is not to assert supernatural explanation, rather to emphasize how far intuition can leap ahead of proof.
3. It’s possible that AI may eventually emulate or approximate intuition-like leaps through architectures we don’t yet fully grasp. But as of now, such shifts seem to arise from mechanisms outside the current paradigm.
“The obstacle was always the path.” Adapted from a Zen proverb
In the Renaissance, apprentices learned to paint by grinding pigments, mixing oils, stretching canvas, and copying masterworks line by line. In architecture, students spent years sketching by hand before they touched a computer. In math, the best way to understand a theorem is to try to prove it yourself—and fail.
Today, that slow accumulation of competence is being replaced by a faster rhythm.
Ask an AI to write a proof, generate a building, produce an image, or answer a math question—and it will. But something essential gets skipped. And what gets skipped doesn’t just disappear. It quietly erodes.
AI gives us output without process. The result: polished answers with no foundation.
The Eroding Scaffold
Learning isn’t just about information. It’s about structure and the path you took to get there. You don’t truly understand a concept until you’ve built the scaffold it rests on—step by step, skill by skill.
Mira’s Derivative
Take Mira, a student learning calculus. She’s supposed to learn how to compute derivatives. The teacher explains the chain rule: when one function is nested inside another, you must differentiate both, in the right order. It’s abstract, so Mira does what students do—she turns to an AI tutor.
She types:
“Find the derivative of sin(x² + 3x).”
The AI answers instantly:
cos(x² + 3x) · (2x + 3)
It even offers a short explanation. Mira copies it down and moves on.
What she didn’t do:
– Simplify the inner expression.
– Apply the rule mechanically.
– Make a mistake and figure it out.
– Internalize the rhythm of composition and change.
Now fast-forward two months. Mira sees a problem she’s never encountered:
“Differentiate xˣ.”
She freezes. No familiar template. No AI available. No internal scaffold to fall back on.
She reached the destination — but never built the path.
The skipped struggle was what encoded the concept.
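A sympy aside makes the asymmetry concrete: both answers in this story, including the one that froze Mira, are a single library call away.

```python
import sympy as sp

# Obtaining the answers is trivial; obtaining them teaches nothing.
x = sp.symbols('x', positive=True)

print(sp.diff(sp.sin(x**2 + 3*x), x))  # (2*x + 3)*cos(x**2 + 3*x)
print(sp.diff(x**x, x))                # x**x*(log(x) + 1)
```

The output arrives instantly; the scaffold it should rest on does not.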
Leo’s Essay
Now consider Leo, a college student writing an essay on political philosophy. He’s supposed to take a position on Hobbes’s Leviathan and argue whether absolute sovereignty is justified today.
Traditionally, Leo would:
– Clarify Hobbes’s argument,
– Develop a counterposition,
– Find textual evidence,
– Draft and revise a logical structure.
Instead, he types:
“Write a 5-paragraph essay arguing against Hobbes’ justification of sovereign power.”
The AI delivers—fluent, plausible, even citing sources. Leo pastes it in, tweaks a few lines, and submits.
A week later, he’s asked:
“How would Hobbes respond to modern surveillance capitalism?”
He flounders. The structure was never his. The reasoning was never practiced. The scaffolding was never built.
He didn’t outsource writing. He outsourced thinking.
Teachers aren’t grading the prose. They’re grading the reasoning it reveals.
Art Without Sketching, Architecture Without Lines
In design and architecture, we’re seeing the same thing. AI can generate facades, floor plans, and renders in minutes. But without grounding in scale, structure, or constraint, designs become fragile—beautiful but unbuildable. The result is a facade that ignores sun direction, a floor plan that fails fire code, and a portrait with six fingers.
In art, tools like Midjourney let users create stunning illustrations from a few words. But if you can’t draw, can you see? Can you revise? Can you critique? Can you tell what’s off?
Drawing is not just a means of production—it’s a way of learning to notice. Line by line, it teaches scale, proportion, balance, rhythm. Without that training, feedback becomes guesswork. Revision becomes roulette.
When the tool does the shaping, the human stops developing the eye.
And when the AI makes a mistake—one that’s subtle, structural, or compositional—there’s no foundation to catch it. You don’t just lose the sketch. You lose the ability to tell when something is wrong.
Oversight Collapse
AI doesn’t reason — it samples.
When a model like GPT writes a paragraph or answers a question, it isn’t deriving a conclusion from first principles. It’s drawing from a probability distribution — choosing what sounds most plausible based on past data.
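In miniature, the mechanism looks like this. A toy sketch with an invented four-word vocabulary and made-up scores, not any real model’s code:

```python
import numpy as np

rng = np.random.default_rng(42)

vocab = ["is", "seems", "proves", "refutes"]  # invented toy vocabulary
logits = np.array([2.1, 1.4, 0.9, 0.2])      # made-up model scores per token

probs = np.exp(logits) / np.exp(logits).sum()  # softmax: scores -> probabilities
next_token = rng.choice(vocab, p=probs)        # sample what sounds most plausible

# Nothing in this step checks whether the chosen word makes the claim true.
```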
The result? Output that feels fluent — but isn’t guaranteed to be correct.
That’s what makes AI mistakes dangerous. They’re not just wrong. They’re plausibly wrong — errors with the gloss of insight. And if we’ve skipped the scaffolding, we can’t tell the difference between coherence and truth.
This is the tipping point: when we’re still “in the loop,” but can no longer verify what we see. We’ve become fluent — but not competent. We look like we’re in control. But we’re just along for the ride.
And the risk isn’t limited to math or logic. In “wicked” domains — ethics, design, law, writing — there may be no single right answer. What matters is the ability to justify, adapt, revise, and notice what doesn’t quite fit.
That capacity comes from friction — from having built the internal scaffold of reasoning.
AI gives us output. But it skips the reasoning. It removes the friction — and with it, the growth.
From Work to Erosion
In a recent post, I argued that AI is reshaping work not by replacing entire jobs, but by separating decision-level judgment from action-level execution. Tools like GPT, Copilot, and dashboards take over action-level tasks. But what remains human is the ability to frame problems, make judgments, and verify outcomes.
In that example, Ada, a software engineer, wasn’t made obsolete. Her job changed. Execution was automated. Judgment was not.
But here’s the deeper risk: what if using AI to execute also erodes our ability to decide?
That’s the dynamic explored here. When AI lets us skip foundational steps—whether in calculus, writing, or design—it removes the very scaffolding that enables judgment to form.
At first, it feels like acceleration. But over time, it becomes erosion.
The erosion of action-level skill becomes the erosion of decision-level agency.
What Can Be Done
The goal isn’t to abandon AI. It’s to use it without losing ourselves.
That means:
– Choosing tools that show their work, not just their output.
– Practicing skills we no longer “need,” because they still underpin everything else.
– Teaching not just what to do, but how to decide, how to verify, and how to notice what’s missing.
What’s at stake isn’t just productivity. It’s agency.
Final Thought: The Skills We Skip Are Still Ours to Build
When we let AI do the scaffolding for us, we don’t just skip steps—we weaken the very structures that make thinking, reasoning, and creating possible.
The skills we skip don’t vanish. They decay. Quietly. Until we need them—and find we’ve forgotten how they worked.
So yes, use AI. But build the scaffold anyway. Because the point isn’t just getting it right — it’s still knowing what right looks like.