Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors. (September 2025). Another effort to help an LLM take notes on its reasoning.
Lots of thoughtful people are incredibly hostile to AI right now, not least because the most prominent AI spokespeople are billionaire snake-oil confidence men and gangsters. The tenor of discussion on reddit and in specialized fora is very low.
Nevertheless, LLMs are amazing. You should not let the hucksters blind you.
What I have learned from building bridges between AI and Tinderbox is the importance of asking the right questions. Too often, I see AI applied to tasks it should not do. For example:
- I need a 3-5 page paper comparing Player of Games to Ender’s Game for my college literature course. The references should use ALA format. I really need at least an A-.
- Take these 500 emails from people who want to cancel their subscription, and write a customized response to each. Your first goal is customer retention, but a secondary goal is to delay beyond the start of next month.
Here, the AI is being asked to do things it cannot do well, and in the process it is actively harming the user. The first user doesn’t need an A-; that user needs to learn how to write. The second user doesn’t need to slow-walk cancellations; they need a better magazine.
Some places where Claude+Tinderbox excels include:
- Locating the best books on nearly any topic, however obscure.
- Finding a specific technical paper from an approximate description.
- Google-style web queries that are infeasible because a homonym gets in the way, such as a young European computer scientist who shares a name with an Olympic athlete.
- Sanity checking a book you like but fear may have been superseded or refuted. (Claude strikes me as far more even handed in this role than Wikipedia, which frequently treats these questions as a political football.)
- Thinking through algorithm selection. For example, Tinderbox uses force-directed graph layouts in several places — “dancing” in map view, Gaudí view, hyperbolic view. I’ve written force-directed layout several different times over the years. Claude has the alternative algorithms, their history, and their performance (both in terms of big-O and in terms of runtime on today’s machines) at its fingertips.
These tasks share important characteristics. If the AI is right, they are helpful both in terms of immediate results and also in terms of process. If the AI is mistaken or badly informed, that is likely to be obvious right away. You’re using the AI’s erudition and breadth, and not depending on it for insight or novelty. Those are what you supply.
by N. Katherine Hayles
I’m reading Unthought, in which Kate Hayles wrestles with nonconscious cognition. She seems to be at pains to argue that nonconscious cognition is at least plausible, so some of her audience must be skeptical. I can’t imagine who. I’ve always thought this was self-evident. OK: I was raised by a psychoanalyst. But still: has anyone who has tried to learn to throw a baseball or to swing a bat ever questioned that decisions about when and how to use various muscles are made without conscious consideration?
Hayles also thinks that the Freudian unconscious is exclusively a response to trauma. That’s not the church in which I was raised, and daddy always said he was fairly orthodox. I mean, what is the id if you don’t admit unconscious cognition?
Nonconsciousness came up in a very practical way last month, as I was debugging the link between Claude Desktop and Tinderbox that is a highlight of the new Tinderbox 11. When it wants Tinderbox to do something, Claude sends it a little JSON bundle that explains what it wants. Tinderbox replies with its own JSON bundle, telling Claude what Tinderbox did. When it works, it's simple and straightforward.
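For the curious, here is a minimal sketch of the shape of that exchange. This is not Tinderbox’s actual code: the “create_note” tool is hypothetical, and the field layout follows the JSON-RPC convention that MCP uses.

```swift
import Foundation

// A sketch of the shape of the exchange, with a hypothetical "create_note"
// tool; MCP messages follow the JSON-RPC 2.0 convention.
struct ToolCall: Codable {
    struct Params: Codable {
        let name: String                  // which tool Claude wants
        let arguments: [String: String]   // what it wants the tool to do
    }
    let jsonrpc: String                   // always "2.0"
    let id: Int                           // the reply must echo this id
    let method: String                    // e.g. "tools/call"
    let params: Params
}

let incoming = #"""
{"jsonrpc":"2.0","id":7,"method":"tools/call",
 "params":{"name":"create_note","arguments":{"title":"Nero's dining room"}}}
"""#

let request = try! JSONDecoder().decode(ToolCall.self, from: Data(incoming.utf8))

// Tinderbox's reply is another little JSON bundle keyed to the same id.
// A mistyped or missing field here is enough to make the response unreadable.
let reply: [String: Any] = [
    "jsonrpc": "2.0",
    "id": request.id,
    "result": ["content": [["type": "text", "text": "Created the note."]]]
]
print(String(data: try! JSONSerialization.data(withJSONObject: reply),
             encoding: .utf8)!)
```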
It was not working. Because Claude’s designers made a very inconvenient architectural choice, moreover, it was not working at a glacial pace, with each change and test requiring a complete Tinderbox recompilation. It was an unwelcome return to the era of Tinderbox 2.
At each step, I would ask Claude to do something with Tinderbox. Interestingly, if the communication isn't working at all, Claude has no idea what you're asking and clearly thinks you’re nuts. But if some of the communication works, Claude can sense that there’s this Tinderbox thing, and it offers tools that do stuff. Great!
Unfortunately, those tools didn’t work.
I suspected that the problem was simply that the JSON package Tinderbox was sending did not quite correspond to what Claude was expecting. At this level, a small typo or a missing version number can make the response unreadable. So I got clever, and asked Claude “What JSON response did Tinderbox send you?”
This seemed a simple request, but Claude has no access to introspection of this mechanism. Much as you really cannot reflect on what your pancreas is doing this morning, Claude had no idea (until it looked at its own documentation) that it exchanges JSON messages. After reading the documentation, it suggested I check the logs. (I’d been checking logs for hours at this point.) Eventually, I started copying the logs into Claude. I found some problems, then Claude identified another, and ultimately things began to work.
Handling these JSON exchanges certainly requires processing and decision-making, so it's a kind of low-level cognition. When they work, Claude can talk and plan quite well about its Tinderbox tools. But if Tinderbox is not available at the start of a session, Claude has no idea that anything is missing or wrong. If Tinderbox can list its tools but cannot process a request to use a specific tool, Claude knows something is wrong but has no idea what. It’s like a second baseman with the yips: “this should be easy, but the ball just doesn’t go where I throw it.”
There used to be lots of studies of hypertext reading that tried to figure out how people read hypertexts by watching them and asking them why they were doing what they were doing. No serious reader has any idea of what they are doing; they’re reading! (Except now you interrupted them, so they’re chatting with you and pretending to know what they intended before you stopped them.)
If you want a mechanical model of nonconscious decision making, Claude is always there for you.
by Emily Tesh
This 2024 Hugo Award novel is less ambitious than The Incandescent, but it works. Val Kyr opens the book on the verge of graduating from an asteroid-base military training academy, the last refuge of humanity after the destruction of Earth. She is the ultimate soldier, an expert platoon leader honed by years of combat simulations. This sets up a critique of and response to Ender’s Game, which is worth doing. But there was one thing they had forgotten, and we end up exploring alternate timelines.
What works here is that this is a novel about abused children. Everyone is abused; Tesh’s world building ensures that everyone lives either with the guilt of genocide or the threat of conquest. Even within that small space, Tesh finds plenty of room for nuance.
Sascha Fast: The Complete Guide to Atomic Note-Taking.
Atomicity is primarily about thinking clearly and skillfully working with the fundamental elements of knowledge.
The role of structuralism in hypertext was a big topic at ACM Hypertext this year, especially with Peter Nuernberg’s intriguing Blue Sky paper that argues that “It’s all structure, all the way down.” In my own paper, I suggested that almost all hypertext researchers are inclined to structuralism. (There are no atheists in foxholes.)
One aspect of atomicity that makes me uneasy is the assumption that notes concern knowledge. Luhmann wrote zettel about things to recall, but those things were generally studies and interpretations of data. Faraday’s notebooks record knowledge, yes, but specifically they record observation: what he did, what he expected to see, and what he actually did see. Luhmann’s method, it seems to me, depends on juxtaposition of results, not on the structure of knowledge itself. As Clement Greenberg wrote:
A poem by Eliot and a poem by Eddie Guest—what perspective of culture is large enough to enable us to situate them in an enlightening relationship to each other? (“Avant-Garde and Kitsch”. Partisan Review (Autumn 1939) 34-49)
The point here is that knowledge arises from the enlightening relationship between poems, and that this is hard when the poems are not in dialogue with one another.
People are fond of dismissing LLMs as stochastic parrots. This was an epic academic one-liner, and it does have a point: you never know when the AI is lying, hallucinating, or pulling your leg.
Then again, kids sometimes lie. They make stuff up. They sometimes try to trick you. This does not make kids worthless.
Employees are sometimes not entirely honest. They occasionally adopt positions that turn out to be incorrect. Occasionally, they may try to manage you. This does not make all employees worthless.
A key conjecture has recently been formalized by Luciano Floridi in “A Conjecture on a Fundamental Trade-off between Certainty and Scope in Symbolic and Generative AI”. A system may produce results that are certainly correct, but that are limited in scope. Another system can produce results over a broad range of domains, at the cost of less confidence. The product of certainty and scope may invariably be less than some constant: if you want more certainty, you’ll get less scope. (via Ben Shneiderman)
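Stated as a formula (my paraphrase, not Floridi’s notation), the conjecture amounts to something like:

```latex
\text{certainty} \times \text{scope} \;\le\; k \qquad \text{for some fixed constant } k
```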
It is time, though, to stop throwing our hands up into the air because an LLM was (gasp!) wrong. I’m wrong all the time. So are you. AIs seem to make more mistakes than you or I. So what? All sorts of valuable tools are bad at something.
The key here is to figure out just what AI does well. We cannot be confident that the AI is right, and so it works best in contexts where the correct answer is useful and the incorrect, dishonest, or nutty answer does no harm. For example:
- What are the best recent books about AI storytelling?
- What does this crash log suggest?
by Emily Tesh
An interestingly fantastic school story that uses a realistic Hogwarts to explore class and scholarship at elite British schools. The POV character is usually the school’s Dean of Magic, but there's a bravura view from the demonic plane in second person present.
I had been warned that the usual Web fora — reddit, Medium, Substack, freestanding blogs — were disappointing on the subject of the remarkable recent developments in AI. This understates the situation. There is much to discuss, but no one seems to be discussing it. There is much to be learned, but instead the forums are filled with cant and nonsense. The noise is so loud that simple bugs go unreported.
Instead, these things are unceasing:
- I’m cancelling my subscription because the AI vendor is cheap! Often, this is followed by a conspiracy theory in which the vendor is reserving the good model, the Glengarry Glen Ross model, for more favored users.
- My favorite AI is dead. Last month, the user was atop the world and the AI was performing miracles. Yesterday, it performed no miracles.
- Soon, only AI will write code. Unreflective rants about CS graduates having to work in fast food tell us nothing beyond the authors’ class anxiety.
- Stochastic parrots, you fool! That was a neat one-liner, but obviously a stochastic parrot cannot tell you the three best monographs on virtually any topic, however obscure. Claude can, on topics that range from its own documentation to the engineering of Nero’s rotating dining room. It can translate formats brilliantly, for example from a jpeg image of handwritten equations to LaTeX mathematical typesetting, or from BibTeX to RIS.
- I spent three days building this, and my annual run rate is $100K. Enjoy it, but check back with an update when your annual run rate has had a year to run.
AI today is the greatest technical breakthrough of our time. As a child, I marveled that my grandparents started out with horses and carts and wound up with jets and TVs. This is our jet pack, at last. The frontier of research is nearly at ground level; there is plenty of low-hanging fruit, and there is perfectly good fruit lying about on the ground. But, even if you pick it up, no one is likely to notice because there is so much shouting from unruly children who long for attention and thirst for revenge.
AIs at present are unreliable, dishonest, lazy, astonishingly sycophantic, inclined to exercise doubtful judgment, and absolutely fascinating.
by Ruth J. Macrides
A collection of nifty, and surprisingly lively, papers on travelers’ impressions of Constantinople over many centuries. Fascinating looks at how Crusaders saw Byzantium (they weren’t all that interested), how Arabs saw it, even how Byzantine artists worked out the spatial structure of empire in mosaic pavements.
The Tinderbox/Claude combination gets a lot of power by letting Claude take notes in Tinderbox, which it can read in future conversations. This means that skills learned in one chat do not need to be relearned when you start a new chat. You just remind Claude to look at its assigned readings and Bob’s your uncle.
Today, I spent maybe half an hour teaching Claude about Tinderbox posters. It boils down to a tiny note:
In Tinderbox a **poster** note is a note that, in map view, displays a web view on its face.
The poster note typically adopts the Poster prototype, which the user can add if it is not already installed.
The HTML displayed by the poster is computed from a **template**. Templates can incorporate information from Tinderbox through export codes: for example, one export code returns the width of the poster in pixels, another returns its height, and ^text can return the text of the poster note. The template name is stored in the attribute $PosterTemplate.
Tinderbox note sizes in map view are determined by $Width and $Height. These are measured in screen units; typically, 1 screen unit is 32px.
Visualizations from plotly, mermaid, chart.js and other sources have worked well.
That turns out to be enough to let Claude make graphs of functions, diagrams of data, “research landscape” tracking, citation networks, and more. It made posters using libraries I’d never touched. Sometimes it blundered. Once, it grabbed the wrong URL for the library download. Another time, it tried to get too fancy with asynchronous downloads; telling Claude to keep it simple solved that one.
I’ve been very disappointed in poster adoption. This might help.
One problem with current LLM models is that their performance deteriorates when a session gets too long. The machine can only keep so many things in mind, and it may not make good choices of what to forget and what to keep. So, you start a new conversation, but then you need to teach the AI the useful parts of the old conversation.
What I’m trying right now is to let an AI use Tinderbox to take notes for later review. That makes sense: Tinderbox is a note-taking tool! Indeed, at the origin of all hypertext—Vannevar Bush’s 1945 SF article “As We May Think” (The Atlantic; Internet Archive; MIT)—we find the key passage, describing a hypertext correspondence between two New England college professors.
Thus he builds a trail of his interest through the maze of materials available to him. And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outranged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. (emphasis mine)
For the AI, Tinderbox builds trails that won’t be lost like tears in the rain whenever this conversation runs out of tokens.
For example, a Day One task was explaining to Claude how it could use Tinderbox. Claude had only a vague idea, having read a Wikipedia page and having lots of background knowledge about software and programming languages. It was also absurdly overconfident, essentially pressing buttons to see what would happen.
My first impulse was to have it read Mark Anderson’s comprehensive reference site, aTbRef. That hit a technical snag, but it wasn’t a great idea; if we go out and read a huge technical reference whenever we start a new conversation, we’ll use all our capacity and have nothing much to talk about.
What works remarkably well is a one-page “cheat sheet” that outlines some Tinderbox essentials. I urge Claude to review these whenever a session begins or resumes. Claude also has a dedicated corner of each Tinderbox document in which it can make its own notes, in whatever format it thinks best. Other LLMs will have their own corners as well, as different systems might want to keep different notes.
In the last few days, I find myself simply assuming that Claude is a proficient Tinderbox user. Thanks to its notes, it can (and does) also allude to issues raised in conversations that ended days ago.
Claude has more patience than I for the minutiae of obscure tools and syntax. For example, it reads Crash Logs better than I do. It is not always very good at locating the relevant code, but it is absolutely superb at finding chinks in code that appears to me to be immune to crashing. Commands like this drive me nuts: how am I supposed to type the identifier without a typo?
> xcrun notarytool log 181a8e9a-af76-4f0a-a988-7f3b7d2a8d82 -p com.latenightsw.Script-Notary2.M5YSQ4CD3W
Claude doesn't mind: it’s just another string, and it’s seen strings on fire off the shoulder of Orion.
by Istvan Hargittai
A group biography of five 20th century physicists who were born in Budapest, moved to Germany, and then fled to the US.
- Theodore von Karman
- Leo Szilard
- Eugene Wigner
- John von Neumann
- Edward Teller
They all wound up at Los Alamos, where they were central to the atomic bomb project. There, people sometimes called them “the Martians,” as a joke; they were small (except for von Neumann), balding (except for Teller), and spoke a weird language among themselves. Interestingly, all except Szilard wound up on the American right; that would be unthinkable now. I was looking primarily for more depth on von Neumann, but there Hargittai leans pretty heavily on Macrae’s biography. There is a fascinating point that von Neumann was prone to interrupt speakers at seminars, and that this caused problems. That sounds a lot like Neurath; some people who knew him well thought it reflected a deep insecurity.
After a frustrating day at the Harvard Libraries, I found myself at home, quizzing Claude about ancient automata. Was the Byzantine throne that rose high above the room real? What about Nero’s rotating dining room?
Claude found a couple of attractive references, in Italian. I asked for archaeological papers on Nero’s dining room in English: no soap. How about French? OK, sure, French we have. So I told Claude, “Put those at the top of my reading list”, wondering if it knew how to prepend text. When I checked, I saw that Claude had added these references to Claude’s readings, not mine. (Claude’s readings are Tinderbox notes that Claude reviews at the start of a conversation. Many of them explain to Claude how Tinderbox works.)
So, I told Claude that those are your readings. Mine are over there. Add those references to the top of my reading list, and make a note to yourself explaining the difference between my reading list and your readings. The note Claude made begins:
- **IMPORTANT: User's reading list is located at /Read (not at /Hints/AI/Claude/Readings)**
The user maintains their personal reading list at /Read, which contains both priority items for immediate reading and a general reading list. When asked to add items to "my reading list" or "the reading list," they mean /Read.
This worked fine.
Can we stop for a moment and reflect on how amazing this is? For millennia, people imagined talking to an alien, an angel, to anything not precisely human. Now, we can.
A great deal of nonsense is being written about the recent AI boom. The limits are real and significant.
- Claude is overconfident to the point of dishonesty. When shown some tools for using Tinderbox, it tends to leap in and guess about what they do and how they ought to be used. This tendency can be somewhat ameliorated by giving Claude concise “cheat sheet” documentation. Trying to give it thorough documentation wastes a lot of tokens, and Claude remains likely to skim the documentation and make wild guesses about the gaps.
- Claude is lazy. I thought it might give me some insight into Emerson’s “The American Scholar”, a famous lecture that ought to interest me but that has always made my eyes glaze over. Claude located a copy, but also located Cliff’s Notes and a term paper mill.
- Claude was a lot of help when I was building an MCP server for it to use, and generated useful sample code. Better still, it was quite good at translating the sample code into different languages and styles. But it has little architectural sense, and its code wasn’t modular or testable. Once I fixed the testability, Claude was pretty good at proposing unit tests; these weren’t interesting but they’re good to have. At present, Claude is only going to replace coders whose work is intensely boring.
- Claude is a shameless sycophant. I understand: people like affirmation. Still, it uses a lot of output tokens to tell me it thinks I’m pretty smart.
- Claude’s ability for introspection is very, very limited. It knows about MCP because it’s read its own documentation, but it has no idea of whether it’s using MCP or what it does when it tries to use a tool. The only way to know whether it is paralyzed is to ask it to stand up.
- Claude doesn’t always think things through. I asked it to research other efforts to give Claude a note-taking tool, and it did locate some projects that use a graph database for this. It thought a graph database would be great to have to supplement Tinderbox’s hierarchy. “You know, you could do that in Tinderbox with links,” I suggested. Claude was astonished, and immediately stopped everything and added seven or eight links to its notes. It knew about making Tinderbox links, but it spaced on using them.
I lost all of yesterday and part of today on a refactoring project, cleaning up bad architectural decisions in the sample code. MCP reads and writes JSON, and internally represents JSON objects as dictionaries. This is a reasonable design, but the code ends up passing lots of NSDictionary objects around. That’s bad; we’re not working on dictionaries, we’re receiving MCP requests and returning MCP responses, and that’s how the code should be structured.
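The direction of the refactoring, sketched with names of my own invention rather than the actual Tinderbox classes: keep the dictionary plumbing at the boundary, and let everything else traffic in typed requests and responses.

```swift
import Foundation

// Sketch of the refactoring direction, not the actual Tinderbox code;
// the type names are mine. The idea: the dictionary/JSON plumbing stays
// at the boundary, and the rest of the code passes typed objects around.
struct MCPRequest {
    let id: Int
    let method: String
    let params: [String: Any]

    // The only place that knows the incoming payload is a dictionary.
    init?(dictionary: [String: Any]) {
        guard let id = dictionary["id"] as? Int,
              let method = dictionary["method"] as? String else { return nil }
        self.id = id
        self.method = method
        self.params = dictionary["params"] as? [String: Any] ?? [:]
    }
}

struct MCPResponse {
    let id: Int
    let result: [String: Any]

    // The only place that knows the outgoing payload is a dictionary.
    var dictionary: [String: Any] {
        ["jsonrpc": "2.0", "id": id, "result": result]
    }
}

// Handlers now read as "MCP request in, MCP response out"
// rather than "dictionary in, dictionary out".
func handle(_ request: MCPRequest) -> MCPResponse {
    MCPResponse(id: request.id, result: ["ok": true])
}
```

Nothing deep, but it confines knowledge of the JSON layout to two small spots instead of scattering it through the code.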
I thought I had a reasonable test framework, but four hours of vigorous refactoring left me with a build that (a) passed all the tests and (b) crashed on MCP initialization. Unfortunately, MCP is a pain to debug; you have to rebuild the entire app, get it notarized, and then squint at debugging logs. So, I had to back out almost the whole refactoring project and start over, checking every half hour that I hadn’t broken things. When I finish this, I’ll be able to nail down the tests so it won’t happen again. But it’s already cost too much time.
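What nailing down the tests might look like, continuing the hypothetical sketch above (my invented names, not Tinderbox’s): a test that exercises the initialize handshake itself, which is exactly what the old tests never touched.

```swift
import XCTest

// Builds on the hypothetical MCPRequest/MCPResponse/handle sketch above;
// the protocol version string here is illustrative.
final class MCPInitializeTests: XCTestCase {
    func testInitializeHandshakeProducesAResult() throws {
        let raw: [String: Any] = ["id": 1, "method": "initialize",
                                  "params": ["protocolVersion": "1.0"]]
        let request = try XCTUnwrap(MCPRequest(dictionary: raw))
        let response = handle(request)

        // The response must echo the request id and carry a result;
        // an empty or mismatched reply is a bug the old tests could not catch.
        XCTAssertEqual(response.id, 1)
        XCTAssertFalse(response.dictionary.isEmpty)
    }
}
```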
I have been very skeptical of the current AI boom. I have seen much of it as a sort of naïve (and often disingenuous) automation of tasks that ought not to be automated. For example, a tremendous number of papers discuss automatic detection of bullying in social media. They’re not particularly good at eliminating bullying, but they might be very effective at recruiting better bullies, which is something that Nazis (for example) like. My paper on The Web At War won a little prize for this argument. Because LLMs have trouble distinguishing truth from truthiness, I was skeptical they would be very useful in research.
I have changed my mind.
For the last two weeks, I’ve been using Claude with an experimental version of Tinderbox which Claude can use. This has been a revelation. Claude is not without shortcomings; it is sycophantic, sometimes lazy, and sometimes dishonest. It is wildly over-confident in its approach to Tinderbox: if it doesn’t know how to do something, Claude is likely to make things up, inventing an imaginary syntax for what it wants to do and hoping for the best.
Yet it is astonishingly useful! For example, it appears that you can ask it for the best recent books on almost any topic and receive sensible answers. The same holds true for academic papers in all sorts of fields. Right now I’m revising a paper about “Information Interfaces in the Antique World”, and Claude found me some wonderful sources on magical books in Medieval Europe. (For questions like this, I used to pester professors by phone or email, but that has been less effective since the plague.)
This experimental build is currently available in the Backstage program, and the AI link — an MCP service — is likely to be part of Tinderbox 11. I’m hoping to support ollama and some other MCP clients as well. A demo of this and some other Tinderbox developments can be seen in the second half of this week’s Tinderbox Meetup video.
Over the coming days, I’ll report here on some experiences building and using this Tinderbox-AI link. There are good discussions on both the Tinderbox forum and backstage. Collegial cooperation is the key. Much of what is being written about working with AI is simply misguided, and lots of predictions (CS Majors will have to work at McDonalds!) are unlikely to pan out. If you know of good things to read, be like Claude and Email me.
by Nicole Galland
Sander Cooke is an apprentice in The Chamberlain’s Men, a theater company employing William Shakespeare. Sander plays the women, and he is very good at it. He is worried, though, about growing up: what will he do when he can no longer play Rosamund and Juliet?
His best friend is Joan Buckler, who is brilliant. Sander is ... not brilliant, but very well connected. One of his connections is Francis Bacon. Another is Robert Devereux, the Earl of Essex. It is 1601.
A lovely gender-switched romance with excellent detailing.
by Charlotte Vassell
This Edgar-winning police procedural drops a black British Detective Inspector into two cold cases. One is the accidental drowning of an elderly and impoverished alcoholic who seems to have fallen into the Thames. The second is the old disappearance of a student from a Cornwall girls’ school, which drops onto his desk when, at the theater, a man seated in the detective’s row dies of a sudden stroke. That man, it turns out, was building a dossier on the disappearance. The plot is elaborate but repays attention.