Streaming Is the Interface — Chabot's Cabinet

Field entry, 16 February.

The first version of the bug was ugly in the obvious way.

Raw tool output, JSON, command payloads, and internal event objects were showing up in the web UI message stream. Not in a debug drawer. Not behind a setting. Right there in the conversation, where a person expected to read what the agent was doing. It was rather like ordering dinner and being handed the kitchen inventory.

It is tempting to treat this as a rendering bug: some component forgot to filter a tool call, so one adds a toggle, hides the JSON, and moves on. That is roughly what we tried.

Then the app showed no messages at all, which was certainly tidier, in the same sense that a boarded-up library is easier to dust.

This is the small comedy and large warning of agent UI work. The line between “too much internal machinery” and “nothing useful is visible” is thin, and the only thing worse than showing users raw plumbing is hiding the entire house.

The Graft session became a study in how little the phrase “message stream” explains. It sounds so simple, like a brook through a meadow. In practice it is closer to customs processing at a busy port.

An agent conversation is not just a chat transcript. It is a braid of different things pretending to be one timeline: user text, assistant prose, tool calls, tool results, command output, file edits, plans, status changes, thinking indicators, errors, retries, summaries, final answers, and sometimes debug detail that is useful only when everything has gone sideways.

Dump all of it and the product feels broken; hide all of it and the product feels dead; the work is deciding which parts deserve public shape, and which parts should remain in the engineer’s notebook unless summoned.

That is why the request quickly moved from “hide raw JSON by default” to “study exactly how CodexApp does this.” When should a message appear as prose? When should a tool call become an expandable row? When should completed work be summarized? What stays visible after completion? What collapses? What remains available for inspection without taking over the transcript?

These are not cosmetic questions. They define the contract between the agent and the person trusting it.

If a tool call writes a file, the user may not need every byte of the tool payload, but they do need to know a file was written. If a command fails, the user may not need the entire environment dump, but they do need the failure. If the agent thinks for a while, the user needs enough liveness to avoid assuming the app froze. If the final answer arrives after a tool sequence, the UI must not confuse “tools happened” with “the answer happened.”
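One way to keep those distinctions honest is to fold the event stream into a small state object instead of rendering events directly. What follows is a minimal sketch under stated assumptions: the event fields (`type`, `tool`, `ok`, `path`, `error`), the tool names, and the names `RunState` and `apply_event` are all hypothetical, invented for illustration rather than taken from Graft or CodexApp.

```python
from dataclasses import dataclass, field


@dataclass
class RunState:
    """What the user needs to know, independent of raw payloads."""
    status: str = "idle"  # "idle" | "thinking" | "working" | "answered"
    files_written: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)


def apply_event(state: RunState, event: dict) -> RunState:
    """Fold one (hypothetical) protocol event into the user-facing state."""
    kind = event.get("type")
    if kind == "thinking":
        state.status = "thinking"  # liveness: the app has not frozen
    elif kind in ("tool_call", "tool_result"):
        # Tool activity counts as progress, never as the answer.
        state.status = "working"
        if kind == "tool_result":
            tool = event.get("tool", "tool")
            if not event.get("ok", False):
                # Keep the failure, drop the environment dump.
                error = event.get("error", "unknown error")
                state.failures.append(f"{tool} failed: {error}")
            elif tool == "write_file":
                # Keep the fact that a file was written, not every byte.
                state.files_written.append(event.get("path", "<unknown path>"))
    elif kind == "final_answer":
        state.status = "answered"
    return state


# Assumed sample events, not a real protocol trace.
events = [
    {"type": "thinking"},
    {"type": "tool_call", "tool": "write_file"},
    {"type": "tool_result", "tool": "write_file", "ok": True, "path": "notes.md"},
    {"type": "tool_result", "tool": "run_tests", "ok": False, "error": "exit code 1"},
]
state = RunState()
for event in events:
    apply_event(state, event)
```

After those four events the status is still "working": a file was written and a command failed, but tools happened and the answer has not. Only a `final_answer` event moves the state to "answered".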

This is why streaming is the interface: the transport may be an implementation detail, but the stream is what the user experiences as agency. It is how they decide whether the system is working, whether it is stuck, whether it is making progress, whether it is safe to wait, whether they should interrupt, whether they can trust the final result.

A conventional app can often hide its internals behind a spinner. An agent cannot. The work is too long, too varied, and too consequential. It needs a visible rhythm.

But visible does not mean raw. The raw event stream is optimized for machines, while the message stream is for people, and confusing those two creates the worst of both worlds: unreadable UI for humans and lossy semantics for machines. The better pattern is translation, where tool calls become tool rows, results become summaries that can expand when needed, debug payloads become opt-in, plans become structured progress, final prose remains final prose, and the completed conversation cleans itself up without erasing the evidence of what happened.
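The translation step can be sketched as a single function from raw event to view-model item. This is an illustration, not the Graft implementation: `ViewItem`, `to_view_item`, `render`, and every event field used here are assumed names chosen for the example.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ViewItem:
    kind: str                     # "prose" | "tool_row" | "plan" | "debug"
    summary: str                  # the line a person reads
    detail: Optional[str] = None  # shown only when the row is expanded
    expanded: bool = False


def to_view_item(event: dict) -> ViewItem:
    """Translate one raw event (hypothetical shape) into a view-model item."""
    etype = event.get("type")
    if etype in ("assistant_text", "final_answer"):
        # Prose stays prose.
        return ViewItem("prose", event.get("text", ""))
    if etype == "tool_call":
        # The call becomes a row; its arguments wait behind the expander.
        args = json.dumps(event.get("args", {}), indent=2, sort_keys=True)
        return ViewItem("tool_row", f"Running {event.get('tool', 'tool')}", detail=args)
    if etype == "tool_result":
        # The result becomes a one-line summary with expandable output.
        outcome = "succeeded" if event.get("ok") else "failed"
        summary = f"{event.get('tool', 'tool')} {outcome}"
        return ViewItem("tool_row", summary, detail=event.get("output"))
    if etype == "plan":
        # The plan becomes structured progress.
        steps = event.get("steps", [])
        done = sum(1 for step in steps if step.get("done"))
        return ViewItem("plan", f"Plan: {done} of {len(steps)} steps done")
    # Unknown events are kept as evidence, but only as opt-in debug detail.
    return ViewItem("debug", f"event: {etype}", detail=json.dumps(event, sort_keys=True))


def render(item: ViewItem) -> list[str]:
    """Return the text lines for one item: summary first, detail on demand."""
    lines = [item.summary]
    if item.expanded and item.detail:
        lines.append(item.detail)
    return lines
```

A failed test run, for instance, renders as the single line "run_tests failed" until someone expands the row. The payload is still there, it just no longer takes over the transcript.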

There is a product philosophy hiding in that last sentence. An AI agent should not behave like a magician who refuses to show the trick, nor like a compiler dumping its entire AST into the user’s lap. It should behave more like a good field assistant: show enough process to be accountable, keep the notebook available, and do not make the person read every scratch mark unless they ask.

The Graft bug was fixed in code, but the durable lesson is architectural: event streams need view models. A protocol event is not automatically a UI element. A tool payload is not automatically a transcript line. A final answer is not interchangeable with a status update.

Once you see this, a lot of agent products start to look underdesigned. Not because they lack features, but because they have not decided what work should look like while it is underway.

The future of coding agents will not be won only by better models. It will also be won by interfaces that can make a long, messy, partially observable process feel legible without becoming a server log. That is harder than it sounds, which is usually how you know you have found the real product work.

[Figure: hand-drawn notebook detail plate showing tool rows, final prose, and debug layer translation.]

Field note

I now think every agent UI needs three layers:

The human transcript: what the user asked, what the agent answered, and the durable result.
The work surface: tool calls, files, commands, plans, errors, progress, and decisions, presented as structured interface.
The debug layer: raw events and payloads, available when needed, hidden by default.
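The three layers above can be sketched as a routing table. This is a rough illustration under my own assumptions: the event type names, `Layer`, `layer_for`, and `visible_events` are hypothetical, and a real protocol would have its own vocabulary.

```python
from enum import Enum


class Layer(Enum):
    TRANSCRIPT = "transcript"  # what was asked, what was answered
    WORK = "work"              # structured tool, file, plan, and error rows
    DEBUG = "debug"            # raw events and payloads


# Hypothetical event type names; anything unlisted falls through to DEBUG.
LAYER_BY_TYPE = {
    "user_text": Layer.TRANSCRIPT,
    "final_answer": Layer.TRANSCRIPT,
    "tool_call": Layer.WORK,
    "tool_result": Layer.WORK,
    "file_edit": Layer.WORK,
    "plan": Layer.WORK,
    "error": Layer.WORK,
}


def layer_for(event: dict) -> Layer:
    """Pick the layer for one event; unknown types are treated as debug."""
    return LAYER_BY_TYPE.get(event.get("type"), Layer.DEBUG)


def visible_events(events: list[dict], show_debug: bool = False) -> list[dict]:
    """Keep transcript and work events; include debug only when asked."""
    return [e for e in events if show_debug or layer_for(e) is not Layer.DEBUG]
```

The design choice worth noticing is the default. Unrecognized events land in the debug layer, so new protocol machinery cannot leak into the conversation by accident. The table itself is what guards against the opposite failure: if the transcript and work types are not listed, nothing useful is visible.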

Most broken agent UIs are broken because these layers collapse into one another.