The Scaffold and the Spine

Metacognition is the spine that lets intelligence stand up straight.

GPT-4 was clearly intelligent. You could feel it. But ask it to write a book and it would fall apart by chapter two. Not because it lacked the ability to write well (with the right prompting), but because it couldn't remember what it was doing. The context window would fill, the model would lose the thread, and you'd end up with something that read like it was written by a brilliant amnesiac.

This frustrated me. I'd been ghostwriting books for years by that point, over fifty of them, and I understood that writing a book isn't one task. It's hundreds of smaller tasks held together by a persistent sense of purpose. A human writer carries that purpose in their head across months of work. An LLM carries it for about 4,000 tokens before it starts to drift.

So I started building scaffolding.

I called the system I developed a "parallel metacognitive hierarchy," which sounds more academic than it is. The idea is simple: if a model can't hold a large task in its head, you build an external structure that holds it instead. Metacognition, thinking about thinking, running parallel to the actual generation. The hierarchy part refers to layers. One layer understands the mission end to end. Another decomposes that mission into subtasks. Another tracks what each subtask needs to know. And so on, recursively, until you reach tasks small enough that the model can actually accomplish them in one shot.
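Here is roughly what that looks like in code. This is a minimal sketch, not the actual system: Task, call_model, and the prompt wording are hypothetical stand-ins, and the composition step is far cruder than anything production-grade.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    mission: str                      # what this node is trying to accomplish
    context: str = ""                 # curated context handed down from the layer above
    subtasks: list["Task"] = field(default_factory=list)

def call_model(prompt: str) -> str:
    """Placeholder for a single LLM inference call."""
    raise NotImplementedError

def run(task: Task) -> str:
    if not task.subtasks:
        # Leaf: small enough to accomplish in one shot.
        return call_model(f"{task.context}\n\nTask: {task.mission}")
    # Interior node: run the children, then compose their results
    # under the same curated context and mission.
    parts = [run(sub) for sub in task.subtasks]
    return call_model(
        f"{task.context}\n\nMission: {task.mission}\n\n"
        "Compose these completed pieces into one coherent whole:\n\n"
        + "\n\n---\n\n".join(parts)
    )
```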

The insight that made this work came from treating the context window like a budget. Not something to fill, but something to spend carefully. You want to carry forward what matters (the tone established in chapter one, the central argument, the voice) and ruthlessly exclude what doesn't (the specific phrasing from paragraph three of chapter four, the tangent you decided not to pursue). Context pollution is real. Feed the model too much irrelevant history and it starts weighting that noise in its outputs.
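The budget logic itself is almost embarrassingly simple. A rough sketch, using a crude four-characters-per-token estimate and illustrative item names (none of this is the production code):

```python
# Pack the most important items first, spend the budget, drop the rest.
def build_context(items: list[tuple[int, str]], budget_tokens: int) -> str:
    """items: (priority, text) pairs; a lower priority number means more important."""
    spent, kept = 0, []
    for _, text in sorted(items, key=lambda pair: pair[0]):
        cost = len(text) // 4 + 1           # rough token estimate
        if spent + cost > budget_tokens:
            continue                        # ruthlessly exclude what doesn't fit
        kept.append(text)
        spent += cost
    return "\n\n".join(kept)

context = build_context(
    [
        (0, "Voice and tone established in chapter one: plainspoken, first person."),
        (0, "Central argument: scaffolding, not raw model size, enables long work."),
        (1, "One-paragraph summaries of chapters one through nine."),
        (9, "Exact phrasing from paragraph three of chapter four."),  # dropped at this budget
    ],
    budget_tokens=60,
)
```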

This turned out to be deeply connected to my work as a poet. Not in some hand-wavy "creativity" sense, but specifically: writing poetry requires intense self-observation. You have to watch your own mind make associations, catch the moment an image crystallizes, notice when a line has the right weight. I'd spent years developing that introspective muscle. Turns out it transfers directly to understanding how to structure thought for an external system. You're essentially asking: what did my mind actually need to hold onto to write this? And what could it safely forget?

The system worked. I built BookGhostwriter.ai, an AI writing tool that produces full-length nonfiction books indistinguishable from human-written ones. Not slop, not obvious AI output, but genuine books that people rate highly and buy. The scaffolding made it possible by enforcing coherence across what would otherwise be hundreds of disconnected inference calls.

Then reasoning models arrived.

When o1 came out, I watched closely to see what it would change. Reasoning models represent a significant advance. They can think longer within a single response, maintain chains of logic that earlier models couldn't, and work through multi-step problems without external help. Some of what I'd been doing with scaffolding, they could now do internally.

But not all of it. There's a fundamental difference between thinking longer and thinking across separate instances. Chain-of-thought happens inside the inference. My scaffolding operates across inferences. The model generating chapter ten has no memory of generating chapter one unless the scaffold provides that memory. No amount of internal reasoning changes this. The context window is still a wall.

This distinction matters more than it might seem. When coherence depends on internal reasoning, you're hoping the model maintains it. When coherence depends on external scaffolding, you're enforcing it. The model can't drift because the structure won't let it. It receives exactly the context it needs for its specific subtask, no more, no less, and that context is curated by a layer above it that understands the larger goal.
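Concretely, the loop looks something like this. call_model and summarize are placeholders for single inference calls, and the names are hypothetical; the point is that chapter ten's prompt contains scaffold-built summaries, never chapter one's raw text.

```python
def call_model(prompt: str) -> str:
    raise NotImplementedError   # placeholder for a single LLM inference

def summarize(text: str) -> str:
    return call_model(f"Summarize in three sentences, noting tone and open threads:\n\n{text}")

def write_book(brief: str, voice: str, chapter_outlines: list[str]) -> list[str]:
    chapters: list[str] = []
    summaries: list[str] = []
    for i, outline in enumerate(chapter_outlines, start=1):
        prompt = (
            f"Book brief: {brief}\n"
            f"Voice: {voice}\n"
            "What the book has established so far:\n" + "\n".join(summaries) + "\n\n"
            f"Write chapter {i} following this outline:\n{outline}"
        )
        chapter = call_model(prompt)     # this call has no memory of the other calls
        chapters.append(chapter)
        # The scaffold, not the model, carries the memory forward.
        summaries.append(f"Ch. {i}: {summarize(chapter)}")
    return chapters
```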

I think of it geometrically sometimes. (This gets a little abstract, but stay with me.) Latent space, the high-dimensional space where models represent meaning, has topology. When you prompt a model, you're shaping which regions of that space it explores. For creative ideation, you want a wide aperture. Let the model roam. For structural work, for the scaffolding itself, you want to narrow the light cone. Constrain what's possible so the output stays on track.
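One concrete way to cash out "aperture," though certainly not the only one, is sampling settings plus how tightly the prompt constrains the output. A sketch, assuming a hypothetical generate function that accepts those parameters:

```python
IDEATION = {"temperature": 1.0, "top_p": 0.95}    # wide aperture: let the model roam
STRUCTURAL = {"temperature": 0.2, "top_p": 0.5}   # narrow aperture: stay on track

def generate(prompt: str, settings: dict) -> str:
    """Hypothetical LLM call that accepts sampling parameters."""
    raise NotImplementedError

angles = generate("Propose ten angles for the chapter on tool use.", IDEATION)
outline = generate("Turn the chosen angle into a strict five-part outline.", STRUCTURAL)
```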

Getting this balance right is more art than science. It's also where the skill is. Anyone can build a scaffold. Building one that carries exactly the right context, no more and no less, requires understanding both the task and the model deeply enough to predict where things will go wrong.

The question I get asked most is whether this approach still matters. Aren't models getting good enough that we won't need external orchestration?

My answer: useful AI systems are modular systems of systems. We learned this with tool use. A model that can call a calculator beats a model that tries to do arithmetic in latent space. A model with access to search beats one relying on training data alone. The same logic applies to metacognition. A model with external scaffolding can accomplish tasks that exceed what any single inference can hold, no matter how sophisticated that inference becomes.
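A toy version of that routing logic, with an illustrative registry rather than any particular framework:

```python
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    # Demo-only calculator; a real system would use a proper expression parser.
    "arithmetic": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda query: f"(search results for: {query})",   # stub
}

def call_model(prompt: str) -> str:
    raise NotImplementedError   # placeholder for a single LLM inference

def dispatch(kind: str, payload: str) -> str:
    handler = TOOLS.get(kind)
    if handler is not None:
        return handler(payload)   # exact answer from the tool
    return call_model(payload)    # otherwise, fall back to the model

print(dispatch("arithmetic", "127 * 943"))   # -> 119761
```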

The unit of intelligence, a single inference, is insufficient for certain tasks. That was true in 2023 with GPT-4 and its 8K context window. It's still true now with (supposed) million-token windows and reasoning models that can think for minutes before responding. The ceiling has risen dramatically. But there's still a ceiling, and scaffolding is how you build above it.

I think there's something deeper here too, though I'll just gesture at it. The fractal structure of task decomposition, breaking any goal into subgoals into sub-subgoals until you reach atomic units, feels like a fundamental pattern. Not just for AI systems, but for intelligence generally. Maybe the path to more capable AI isn't purely about making models smarter. Maybe it's about making the scaffolding smarter. Self-assembling structures that can decompose arbitrary tasks, provide exactly the right context at each level, and compose the results back into coherent wholes.

But that's speculation. What I know from experience is simpler: metacognition is the spine that lets intelligence stand up straight. Without it, even brilliant systems collapse under their own weight. Metacognition is the architecture. The scaffold is how you build it.
