aiagentic-codingengineeringsovereigntydeterminism

AI is a multiplier — the responsibility doesn't transfer

June 2026 · 9 min read

AI changed what one engineer can build, and the change is not incremental — it is a multiplier. A sparring partner that never tires, the field's accumulated knowledge a question away, a draft of almost anything in seconds. I am not here to talk that down; I think it is the largest shift in how software gets made in a generation. But a multiplier has a property that is easy to forget in the excitement: it amplifies whatever you point it at, including bad judgement and worse structure. The speed is real. So is the slop. This is about taking the first without the second — working as an engineer who uses AI, not a vibe-coder who is used by it — and about the one thing the multiplier never takes off your hands.

Fast usually means slop

AI makes building fast. Fast, left to itself, means slop — code that runs, demos cleanly, and quietly rots, because nobody had to understand it in order to produce it. The antidote isn't to go slower. It's to be deliberate about where you point the tool, because the tool will not be deliberate for you.

The trap is subtle, and it's economic. What used to discipline an engineer was effort: a bad idea was expensive to implement, so you thought twice before building it. When building becomes cheap, that brake disappears. Every abstraction is suddenly affordable, so every abstraction gets built. Nothing replaces the missing brake unless you install it on purpose — and most of the failures I've seen with AI are not the model being wrong, they're the brake being gone.

Leverage, not autopilot

It helps to think of AI as leverage rather than autopilot. Leverage multiplies force; it does not choose a direction. Point it at a sound structure and it compounds into real velocity. Point it at a shaky one and it compounds into a mess faster than you can read the diff. The multiplier is genuinely indifferent to which — it will amplify a good decision and a bad one with equal enthusiasm.

That reframes where your attention belongs. The scarce resource is no longer typing — it's judgement: what to build, what to leave out, which structure to commit to before cheap-to-generate code sets like concrete around it. The model supplies output; you supply direction. Confuse the two and you have autopilot, which is just a longer word for hoping.

Leverage multiplies force; it does not choose direction. Point AI at a sound structure and it compounds into velocity; point it at a shaky one and it compounds into a mess faster than you can read it. The multiplier is indifferent to which.

What it looks like when the direction is wrong

I can show you the failure mode, because I built it. Setting out to govern AI-assisted delivery, I started a small tool and let it escalate into a complete system: a deterministic kernel with no model calls, layered architectural contracts, an agent runtime, its own workflow language, even an ontology that mapped my build process onto operating-system concepts. By the time I stopped, it was 479 files and roughly fourteen thousand lines of Python — with twenty thousand lines of documentation on top. More prose explaining the architecture than there was architecture. Twenty-five artifact schemas. Ten planned agent roles; four actually built.

None of it was hard, and that is exactly the point. Every layer was cheap to generate, so every layer got generated. The state machine grew so intricate that its own changelog records cutting states back "to reduce complexity" — a system too complex for itself. There was a whole document answering "isn't this too complicated?" with "no, it is fully explicit" — which, in hindsight, is what over-engineering says while it rationalises. The real tell was my own behaviour: the full workflow cost about twenty minutes per change, so for any real batch of work I quietly bypassed my own framework and committed straight to the main branch. I had built an elaborate machine and then routed around it.

The lesson is not that frameworks are bad. It is that the expensive mistake was never too much understanding — understanding the whole problem is free and worth doing — it was too much building, too early. Committing the full picture to code before you need it is precisely where the multiplier turns on you. The decision the tool could not make for me was the only one that mattered: what not to build.

Keeping control while going fast

What replaced that system is almost embarrassingly small. The same goals — keep inference controllable, keep reasoning honest, keep the work auditable — now live in configuration and a handful of market components instead of a bespoke system. A single local proxy is the one endpoint every tool talks to; switching models is a one-line change; the parts I actually own are a config file and a thirty-eight-line cost guard. "Integrate, don't build" was the whole correction.

A few patterns do the real control work, and they're portable to any setup. The thread through all of them is the same: the most reliable signal is never the model's confidence in itself — it is something outside the model. Structure beats discipline here, because discipline is something you have to remember and structure is something you can't forget.

— Separate reasoning from execution. A thinking tier plans and reviews; a doing tier writes code. They are different calls with different jobs, so neither quietly does the other's work.
— Make the check an adversarial, separate call. Before code is written, a challenger step tries to break the plan — because a model asked to review its own reasoning mostly confirms its own bias, and an independent call doesn't.
— Clamp the invariant once, at the gateway. Cost ceilings and limits live in one place every client passes through, enforced and unit-tested — not re-implemented, and re-forgotten, in each consumer.
— Enforce sovereignty structurally, not by discipline. The configuration simply has no route to a non-approved backend: the wrong path doesn't exist, so it can't be taken by accident.
— Keep memory as a versioned single source of truth, in the repo — so context is reproducible rather than ambient and locked to one tool.

Reasoning and execution are separate calls. An adversarial challenger tries to break the plan before any code is written; a model asked to review its own reasoning mostly confirms its own bias, so the check is an independent call, not a self-review. Post-code review then runs against the diff.

The cost the multiplier hides

Be honest about what the multiplier costs, because it does cost. The sharpest lesson I learned the hard way: green tests are not a verified result. On one stretch I had a couple of dozen commits, a couple of hundred passing tests, and a stack of stories marked done — and the rendered output was visibly broken. Half the charts were never embedded, the scales were unusable, the escaping was doubled. The tests were green the entire time. The human reviewer found every one of those bugs; the suite found none of them. The responsibility had not transferred to the tooling, however confident it looked.

A few more that surprised me. More money and a bigger token budget do not buy reasoning depth — I measured it: the limit wasn't even binding, the model simply stopped thinking past a certain point, and only a stronger model moved that line. A high benchmark score does not transfer to your setup — a model that tops a coding leaderboard stalled in my loop because it reached for a tool it wasn't allowed instead of the one it was. And two models checking each other is weaker than it sounds: two shallow judgements confirm each other shallowly. What actually catches errors is hard, external feedback — a test, a type checker, a human who looks at the rendered page — not the agents grading themselves.

All of these point the same way. The multiplier scales output, and the confidence that comes with it; it does not scale the judgement that tells you the output is wrong. That judgement stays with you, and the moment you forget it, the speed is working against you.

An operating model you can adopt

So the setup worth copying is mostly a set of decisions, not a stack. Build minimally, hold the full picture in your head, and give every part you defer an explicit trigger for when it earns its place. Tie how much you design up front to how reversible the decision is: a schema, an interface, a name is expensive to change later, so think it through now; almost everything else can be a thin vertical slice you grow. That is what lets you move at the multiplier's speed without it running away from you.

Concretely, the decisions that carry the weight:

— One gateway as the single endpoint; model choice is a one-line routing change, and sovereignty is enforced by which routes exist at all.
— A reasoning tier and an execution tier kept distinct, with an adversarial check between plan and code.
— Every invariant — cost, limits, allowed paths — clamped once at the gateway and unit-tested, never per client.
— Hard tool feedback — tests, linters, type checks, a real look at the output — as the source of truth, above any agent's self-assessment.
— Memory as a versioned, tool-neutral single source of truth, so the context that drives the work is reproducible.

The takeaway

AI is the largest lever I have ever had on my own work, and I use it on everything. But a lever moves the load; it does not decide where the load should go. The model can draft, refactor, explain, and argue with me at three in the morning — and none of that transfers the responsibility for whether the result is correct, sound, and allowed to exist. That stays mine.

That is the whole stance, and it is the difference between an engineer who uses AI and a vibe-coder who is used by it. Take the multiplier — all of it. Just keep your hand on the direction.