Authentication Is Four Bugs in a Coat

Field entry, 15 January.

Authentication bugs have a talent for presenting themselves as moral failures.

You log in. The browser says yes. The app says yes. Then the app kicks you back to the browser, as if you had done something suspicious in the three seconds since being welcomed inside. It is less software than a very small border crossing with an overactive stamp.

That was the shape of the Maestro CLI bug. Running the CLI against dev opened the auth page. I confirmed the login. The app returned and said it was authenticated. Then it sent me back to auth again, a polite revolving door with delusions of security.

The first suspicion was token storage, because token storage is where authentication systems go to misplace their correspondence. Maestro had multiple environments: dev, staging, and prod. A login in one did not carry over to another. That meant token storage had to be keyed by domain or environment, not treated as one global blob of “user is logged in” optimism. So we checked that and discovered the more annoying version: prod and staging worked, but dev did not, which is how you know the bug has put on a waistcoat and decided to lecture.
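As a rough illustration of what “keyed by environment” means, here is a minimal sketch. The TokenStore class, the JSON file layout, and the field names are all hypothetical; Maestro's real storage is not shown in these notes.

```python
import json
from pathlib import Path
from typing import Dict, Optional


class TokenStore:
    """Keeps one token record per environment in a single JSON file.

    Hypothetical sketch: the class name, file layout, and keys are
    assumptions made for illustration only.
    """

    def __init__(self, path: Path) -> None:
        self.path = path

    def _read_all(self) -> Dict[str, dict]:
        # A missing file simply means nobody has logged in anywhere yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, env: str, tokens: dict) -> None:
        data = self._read_all()
        data[env] = tokens  # only this environment's entry is replaced
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def load(self, env: str) -> Optional[dict]:
        # Returns None instead of borrowing another environment's tokens.
        return self._read_all().get(env)
```

The point of the shape is that saving a dev token cannot clobber the prod one, and asking for an environment with no login returns nothing rather than somebody else's credentials.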

The next failures were not philosophical. They were practical and annoying. Multiple browser auth windows opened. That made the debugging loop risky because failed auth loops can become accidental rate-limit generators. A terminal app that repeatedly opens login windows is not merely wrong; it is rude.

At this point the useful move was not another guess. The useful move was diagnostics.

The agent added an errors.txt path under the local app data directory and started logging the things the UI could not explain: socket errors, auth failures, login errors, URL and connection state. That is the point where the work shifted from “maybe tokens are overwriting each other” to “let the app leave a trail we can actually inspect.”
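A diagnostics trail of this kind can be very small. The sketch below is an assumption about its shape, not the code the agent wrote: log_diagnostic and its arguments are hypothetical, and the caller decides where errors.txt lives.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_diagnostic(log_path: Path, kind: str, **details: object) -> None:
    """Append one timestamped, copyable line to the diagnostics file.

    Hypothetical helper. Callers should pass statuses, URLs, and
    connection state, never raw token values.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # default=str keeps the logger from failing on values JSON cannot encode.
    payload = json.dumps(details, sort_keys=True, default=str)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(f"{stamp} [{kind}] {payload}\n")
```

A call such as log_diagnostic(path, "socket_error", url=url, state="closed") leaves one line per event, which is about as much structure as a person pasting from a terminal can be expected to carry.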

That trail mattered because the bug was not one bug. It crossed environment-specific token storage, token candidates, /project/settings, socket connection behavior, and terminal UX. The client needed to reason about current, access, and ID-style tokens rather than assuming a single blessed string. Dev could require user environment values that prod and staging did not. Trying to connect after a preflight failure only created another auth-looking failure. And a TUI without copyable diagnostics forces the user to become a logging system with eyes.
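The “token candidates” idea can be sketched as trying each kind of token in turn instead of trusting one. Everything here is illustrative: the function name, the labels, and the acceptance check are assumptions, and the order in which the real client tries candidates is not recorded in these notes.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_accepted_token(
    candidates: Iterable[Tuple[str, Optional[str]]],
    is_accepted: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return the first (label, token) pair the check accepts, or None.

    Hypothetical sketch: candidates are (label, token) pairs such as
    ("current", ...), ("access", ...), ("id", ...); is_accepted stands in
    for whatever request proves a token works in this environment.
    """
    for label, token in candidates:
        if token and is_accepted(token):
            return label, token
    return None
```

Returning the label alongside the token matters for the log: “the ID token worked where the access token did not” is evidence, while “a token worked” is not.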

This is why auth debugging is so often maddening. We call it “authentication” as if it were a small gate at the front of the building. In practice it is a corridor that passes through storage, routing, browser handoff, backend configuration, session transport, environment policy, and error presentation.

Any one of those can fail. Many of them produce the same user experience: please log in again.

The important change was to stop treating the browser loop as the problem. The loop was the symptom. The product needed to stop launching the browser until it understood why the previous attempt had failed. It needed to fetch project settings before trying the socket. It needed to log the exact status/body when that preflight failed. It needed to show missing user environment requirements rather than collapsing everything into “auth failed.” It needed to stop the socket connect when the preflight was already telling us no.
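The ordering described above, settings first and socket only on a clean answer, might look something like this. It is a sketch under stated assumptions: the Preflight type, the reason names, and especially the missing_user_env field in the response body are invented for illustration, since the real /project/settings payload is not shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Preflight:
    ok: bool
    reason: str  # "ok", "unauthenticated", "missing_user_env", or "unexpected"
    detail: str = ""
    missing: List[str] = field(default_factory=list)


def classify_preflight(status: int, body: dict) -> Preflight:
    """Turn a settings response into a named outcome (assumed response shape)."""
    if status == 200:
        # Assumption: the body lists user environment values that are still unset.
        missing = list(body.get("missing_user_env", []))
        if missing:
            return Preflight(False, "missing_user_env", "set: " + ", ".join(missing), missing)
        return Preflight(True, "ok")
    # Keep the exact status and body so the diagnostics file can record them.
    detail = f"status={status} body={body!r}"
    if status == 401:
        return Preflight(False, "unauthenticated", detail)
    return Preflight(False, "unexpected", detail)


def connect_if_allowed(
    fetch_settings: Callable[[], Tuple[int, dict]],
    open_socket: Callable[[], None],
) -> Preflight:
    """Fetch settings first; open the socket only if the preflight says yes."""
    result = classify_preflight(*fetch_settings())
    if result.ok:
        open_socket()
    return result
```

Passing fetch_settings and open_socket in as callables keeps the sketch free of any particular HTTP or socket library, and makes the one rule easy to see: a failed preflight never reaches the socket.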

That last bit is easy to miss because retries feel like resilience, but retries without a diagnosis are just a faster way to make the same mistake. In an AI-assisted loop, this becomes more dangerous because the agent is perfectly happy to keep testing, keep opening windows, keep trying variants, keep producing motion. Motion is not progress. Sometimes progress is making the system refuse to continue until the error has a name.
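One way to make a tool “refuse to continue until the error has a name” is a small gate in front of the browser launch. The LoginGate class below is hypothetical, as are its reason strings and the launch cap; it only illustrates the rule that a second login attempt has to be justified by a diagnosis.

```python
from typing import Callable, Optional


class LoginGate:
    """Allows a browser launch only when the last failure is a diagnosed login problem.

    Hypothetical sketch: the reason string "unauthenticated" and the
    default cap of two launches are assumptions for illustration.
    """

    def __init__(self, open_browser: Callable[[], None], max_launches: int = 2) -> None:
        self._open_browser = open_browser
        self._max_launches = max_launches
        self.launches = 0
        self.last_failure: Optional[str] = None

    def record_failure(self, reason: str) -> None:
        self.last_failure = reason

    def request_login(self) -> bool:
        if self.launches >= self._max_launches:
            return False  # hard cap, whatever the diagnosis says
        if self.launches > 0 and self.last_failure != "unauthenticated":
            return False  # previous attempt failed for an unknown or non-login reason
        self._open_browser()
        self.launches += 1
        self.last_failure = None  # the next launch must earn its own diagnosis
        return True
```

The first launch is free. After that, an undiagnosed failure or a non-login failure such as a missing environment value keeps the browser closed, which is also a reasonable leash for an agent that would otherwise keep producing motion.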

The result was less glamorous than a single elegant fix. It was a diagnostics path. It was environment-aware storage. It was preflight checks. It was less eagerness. It was making the app say, in effect: I am not going to pretend this is a login problem until I have asked the house what sort of guest it expects.

That is a good pattern. Auth should fail loudly enough to be fixed, but not so loudly that it opens four browser windows and makes the user worry about rate limits. It should be generous with evidence and conservative with retries. It should distinguish “I do not know who you are” from “I know who you are, but this environment requires something else.” And for terminal tools, it should always leave behind a file you can copy from.

Because when the app is trapped between a browser, a socket, and a remote service, the user’s clipboard may be the most reliable instrument in the room.

[Figure: hand-drawn notebook detail plate. Tokens, sockets, project settings, and diagnostic traces.]

Field note

The more I use agents for debugging, the less I trust single-cause explanations for boundary bugs.

Auth loops are rarely just auth loops. Streaming bugs are rarely just streaming bugs. “It does nothing when I click send” is rarely just a button.

The useful question is not merely “what part is broken?” but “which boundary is hiding the evidence?” In this case the hidden evidence lived between the successful browser login and the failed dev runtime connection. Once the CLI could write down what happened there, the bug remained annoying, but at least it became less spooky.