ai-governance, workflow-dsl, determinism, audit

Governance Framework for AI-Agent Delivery

A deterministic framework that makes AI-agent software delivery reproducible, auditable, and approval-gated — a no-LLM kernel, single-shot agents, enforced invariants, and a tamper-evident audit log.

A framework-level system (its own workflow DSL, 132 tests, CI gated on determinism) that proved the architecture, and that I then deliberately retired. The governance goals now live in a far smaller, configuration-based setup; building this system and then choosing to replace it is the over-engineering lesson behind my writing on building with AI.

~14k LOC

132 tests

AI agents in delivery pipelines are non-deterministic, hard to audit, and tend to approve their own work implicitly. This framework inverts that: a kernel with no LLM calls advances a workflow state machine exactly one transition per invocation, agents are single-shot, handoffs happen only through versioned artifacts, and every event is recorded in an append-only, tamper-evident log.

This page covers the architecture and the reasoning behind it. The measurement infrastructure is built in, but no production runs were measured, so this is a design and architecture achievement rather than a measured result.

A no-LLM kernel advances the workflow one transition per call through gates that each require an artifact and explicit approval; every event lands in an append-only, tamper-evident audit log.

Background

Letting AI agents drive software delivery raises a governance problem before it raises a capability one: the work must be reproducible, auditable, and explicitly approved at each step — none of which agents provide on their own. The framework treats those properties as architecture, enforced by the system rather than requested of the agents.

Design Decisions

The kernel contains no LLM calls. It advances a declarative workflow state machine exactly one transition per invocation, so progress is deterministic and inspectable. Agents do the open-ended work; the kernel decides what is allowed to happen next.
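The one-transition-per-invocation rule can be sketched as follows. This is a minimal illustration rather than the framework's actual kernel: the `WORKFLOW` table, the state names, `KernelState`, and `step` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical declarative workflow: each state maps to its single successor.
WORKFLOW: Mapping[str, str] = {
    "spec": "design",
    "design": "implement",
    "implement": "review",
    "review": "done",
}


@dataclass(frozen=True)
class KernelState:
    """Immutable snapshot of where the workflow currently stands."""
    current: str


def step(state: KernelState, workflow: Mapping[str, str] = WORKFLOW) -> KernelState:
    """Advance exactly one transition. No loops, no model calls."""
    if state.current not in workflow:
        raise ValueError(f"no transition defined from {state.current!r}")
    return KernelState(current=workflow[state.current])


state = KernelState("spec")
state = step(state)  # one invocation, one transition: now at "design"
```

Because `step` is a pure function of the current state and a static table, the same input always yields the same next state, which is what makes progress inspectable.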

Agents are single-shot, with artifact-only handoffs. An agent runs once and hands a versioned artifact to the next gate — no hidden loops, no implicit state. 'Why can't an agent just iterate?' is answered explicitly in an anti-FAQ: iteration is a decision that must be visible and approved.
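A single-shot agent with an artifact-only handoff might look like the sketch below. The `Artifact` fields, `SingleShotAgent`, and `AgentAlreadyRanError` are assumptions made for illustration; the source does not show the framework's real types.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    """Immutable, versioned unit of handoff between agents (hypothetical shape)."""
    name: str
    version: int
    content: str

    @property
    def digest(self) -> str:
        # The content hash identifies exactly what was handed over.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


class AgentAlreadyRanError(RuntimeError):
    """Raised when a single-shot agent is invoked a second time."""


class SingleShotAgent:
    def __init__(self, name: str, work: Callable[[Artifact], str]) -> None:
        self.name = name
        self._work = work
        self._ran = False

    def run(self, incoming: Artifact) -> Artifact:
        """Run once; the only output is a new artifact version."""
        if self._ran:
            raise AgentAlreadyRanError(f"agent {self.name!r} has already run")
        self._ran = True
        return Artifact(
            name=incoming.name,
            version=incoming.version + 1,
            content=self._work(incoming),
        )


draft = Artifact(name="design-doc", version=1, content="initial spec")
agent = SingleShotAgent("designer", lambda a: a.content + "\n+ design notes")
revised = agent.run(draft)  # a second agent.run(...) would raise
```

Iterating would mean creating and approving a new agent run explicitly, which keeps the decision to iterate visible.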

Architectural rules are runtime guards. Violations — an agent trying to loop, an implicit approval — raise typed exceptions rather than being left to convention. Each gate requires both an artifact and an explicit approval to advance.
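As a rough sketch of such a guard, a gate can refuse to advance unless it receives both an artifact and an approval that names that artifact. The exception hierarchy and the `pass_gate` function are hypothetical names, not the framework's API.

```python
from dataclasses import dataclass
from typing import Optional


class GovernanceViolation(Exception):
    """Base class for rule violations (hypothetical hierarchy)."""


class MissingArtifactError(GovernanceViolation):
    """A gate was asked to advance without an artifact."""


class MissingApprovalError(GovernanceViolation):
    """A gate was asked to advance without an explicit, matching approval."""


@dataclass(frozen=True)
class Approval:
    approver: str
    artifact_id: str  # the approval is bound to one specific artifact


def pass_gate(gate: str, artifact_id: Optional[str], approval: Optional[Approval]) -> str:
    """Return the gate name if both preconditions hold; raise a typed error otherwise."""
    if artifact_id is None:
        raise MissingArtifactError(f"gate {gate!r} requires an artifact")
    if approval is None or approval.artifact_id != artifact_id:
        raise MissingApprovalError(
            f"gate {gate!r} requires explicit approval of {artifact_id!r}"
        )
    return gate


passed = pass_gate("review", "design-doc@v2", Approval("alice", "design-doc@v2"))
```

Binding the approval to an artifact identifier means an approval given for one version cannot silently carry over to another.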

Every event is recorded in an append-only, tamper-evident audit log (content hashing plus a monotonic counter), so a delivery can be reconstructed and verified after the fact.
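One way to combine content hashing with a monotonic counter is sketched below. Chaining each entry to the previous entry's hash is an assumption added for this example, since the source only names hashing and a counter; `AuditLog` and its methods are likewise hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(seq: int, prev: str, body: str) -> str:
    return hashlib.sha256(f"{seq}|{prev}|{body}".encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log; each entry carries a counter and a chained content hash."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, event: Dict[str, Any]) -> Dict[str, Any]:
        seq = len(self.entries)  # monotonic counter
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = json.dumps(event, sort_keys=True)  # stable serialisation
        entry = {"seq": seq, "prev": prev, "event": body, "hash": _digest(seq, prev, body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edit, reorder, or deletion breaks the chain."""
        prev = GENESIS
        for i, entry in enumerate(self.entries):
            expected = _digest(i, prev, entry["event"])
            if entry["seq"] != i or entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append({"type": "transition", "from": "spec", "to": "design"})
log.append({"type": "approval", "gate": "design", "approver": "alice"})
intact = log.verify()
```

Verification only detects tampering within the log itself; protecting against wholesale replacement would also need the latest hash stored somewhere the writer cannot alter.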

Operational Considerations

The system is framework-level: roughly 14k lines of Python with 132 tests across 24 test files, and CI gated on determinism (a fixed hash seed). The discipline is the point: the system exists to make agent work trustworthy, not faster.
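A determinism gate of this kind could be approximated as below, assuming that "fixed hash seed" refers to Python's `PYTHONHASHSEED` environment variable. The snippet under test and the function names are hypothetical; the real CI check is not shown in the source.

```python
import os
import subprocess
import sys

# Set iteration order for strings depends on the hash seed, so this output
# can differ between runs unless PYTHONHASHSEED is pinned.
SNIPPET = "print(list({'spec', 'design', 'implement', 'review'}))"


def run_with_seed(seed: str) -> str:
    """Run the snippet in a fresh interpreter with a pinned hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def is_deterministic(seed: str = "0", runs: int = 3) -> bool:
    """True if every run with the same seed produced identical output."""
    outputs = {run_with_seed(seed) for _ in range(runs)}
    return len(outputs) == 1


deterministic = is_deterministic("0")
```

A CI job would fail the build when `is_deterministic` returns `False`, treating nondeterminism as a defect rather than a flake.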

The measurement infrastructure was built in from the start, but there are no production-run results: I retired the system before relying on it in anger, having made the deliberate call that integrate-don't-build had already won. This is an architecture-and-judgement artefact, not an impact number.
