ai-governance · workflow-dsl · determinism · audit

Governance Framework for AI-Agent Delivery

A deterministic framework that makes AI-agent software delivery reproducible, auditable, and approval-gated — a no-LLM kernel, single-shot agents, enforced invariants, and a tamper-evident audit log.

A framework-level system (its own workflow DSL, 41 test files, CI gated on determinism) that demonstrates the architecture end to end. Measurement is built in, ahead of production runs.

~14k LOC
41 test files

AI agents in delivery pipelines are non-deterministic, hard to audit, and tend to approve their own work implicitly. This framework inverts that: a kernel with no LLM calls advances a workflow state machine exactly one transition per invocation, agents are single-shot, handoffs happen only through versioned artifacts, and every event is recorded in an append-only, tamper-evident log.

This page covers the architecture and the reasoning behind it. The measurement infrastructure is built in; this is a design and architecture achievement, ahead of measured production runs.

[Figure: a deterministic kernel (no LLM calls, one transition per invocation) advances Draft → Review → Done through gates; each gate requires an artifact and explicit approval; every event goes to an append-only, tamper-evident audit log (SHA-256, monotonic counter).]

A no-LLM kernel advances the workflow one transition per call through gates that each require an artifact and explicit approval; every event lands in an append-only, tamper-evident audit log.

Background

Letting AI agents drive software delivery raises a governance problem before it raises a capability one: the work must be reproducible, auditable, and explicitly approved at each step — none of which agents provide on their own. The framework treats those properties as architecture, enforced by the system rather than requested of the agents.

Design Decisions

The kernel contains no LLM calls. It advances a declarative workflow state machine exactly one transition per invocation, so progress is deterministic and inspectable. Agents do the open-ended work; the kernel decides what is allowed to happen next.
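A minimal sketch of what "one transition per invocation" means, assuming the workflow is reduced to a plain dict that maps each state to its single successor. The framework's own DSL is not shown on this page, so `WORKFLOW`, `Run`, and `step` are hypothetical names, and gate checks are left out. The point is that `step` is a pure function with no model calls, so the same input always yields the same next state.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the workflow DSL: each state maps to its single successor.
WORKFLOW = {
    "Draft": "Review",
    "Review": "Done",
}


@dataclass(frozen=True)
class Run:
    """Immutable snapshot of one workflow run."""
    state: str = "Draft"
    history: tuple = ()  # (from_state, to_state) pairs, oldest first


def step(run: Run) -> Run:
    """Advance exactly one transition. Pure and deterministic: no LLM calls, no I/O."""
    if run.state not in WORKFLOW:
        raise ValueError(f"{run.state!r} is terminal; nothing to advance")
    nxt = WORKFLOW[run.state]
    return Run(state=nxt, history=run.history + ((run.state, nxt),))


run = step(Run())  # Draft -> Review
run = step(run)    # Review -> Done
print(run.state, run.history)
# prints: Done (('Draft', 'Review'), ('Review', 'Done'))
```

Because each call returns a new snapshot instead of mutating shared state, every intermediate state can be inspected, logged, or replayed.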

Agents are single-shot, with artifact-only handoffs. An agent runs once and hands a versioned artifact to the next gate — no hidden loops, no implicit state. 'Why can't an agent just iterate?' is answered explicitly in an anti-FAQ: iteration is a decision that must be visible and approved.
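A minimal sketch of a single-shot agent with an artifact-only handoff. `Artifact`, `SingleShotAgent`, and `drafter` are hypothetical names, and `drafter` stands in for an LLM-backed agent. The wrapper refuses a second run and rejects any output that is not a versioned artifact, so another attempt has to appear as a new, visible step rather than a loop hidden inside the agent.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    """Hypothetical versioned artifact: the only thing allowed to cross a handoff."""
    name: str
    version: int
    content: str

    @property
    def digest(self) -> str:
        # Content hash, so a later gate can approve this exact version.
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


class SingleShotAgent:
    """Wraps an agent function so it runs at most once."""

    def __init__(self, fn: Callable[[Artifact], Artifact]) -> None:
        self._fn = fn
        self._used = False

    def run(self, inp: Artifact) -> Artifact:
        if self._used:
            raise RuntimeError("agent already ran; another attempt must be a new, approved step")
        self._used = True
        out = self._fn(inp)
        if not isinstance(out, Artifact):
            raise TypeError("agents may only hand off Artifact instances")
        return out


def drafter(spec: Artifact) -> Artifact:
    # Stand-in for an LLM-backed agent; it sees only its input artifact.
    return Artifact(name="design", version=1, content=f"Design for: {spec.content}")


agent = SingleShotAgent(drafter)
design = agent.run(Artifact(name="spec", version=1, content="audit log"))
print(design.name, design.version, design.content)
# prints: design 1 Design for: audit log
```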

Architectural rules are runtime guards. Violations — an agent trying to loop, an implicit approval — raise typed exceptions rather than being left to convention. Each gate requires both an artifact and an explicit approval to advance.
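A sketch of rules as runtime guards, assuming one exception type per violation under a common base class. The class names and the `check_gate` and `check_single_shot` helpers are hypothetical, as are two specific rules: an approval must name the exact artifact digest, and it must come from someone other than the producer. The page itself states only that a gate needs an artifact plus an explicit approval, and that loops and implicit approvals raise typed exceptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class GovernanceViolation(Exception):
    """Base class for every broken architectural rule (hypothetical hierarchy)."""


class MissingArtifactError(GovernanceViolation):
    """A gate was asked to advance without an artifact."""


class ImplicitApprovalError(GovernanceViolation):
    """A gate was asked to advance without a matching, explicit approval."""


class AgentLoopError(GovernanceViolation):
    """An agent tried to run a second time within the same step."""


@dataclass(frozen=True)
class Approval:
    approver: str
    artifact_digest: str  # an approval is bound to one specific artifact version


def check_gate(artifact_digest: Optional[str], approval: Optional[Approval], producer: str) -> None:
    """Return None if the gate may advance; otherwise raise a typed violation."""
    if artifact_digest is None:
        raise MissingArtifactError("gate requires an artifact")
    if approval is None:
        raise ImplicitApprovalError("gate requires an explicit approval")
    if approval.artifact_digest != artifact_digest:
        raise ImplicitApprovalError("approval refers to a different artifact")
    if approval.approver == producer:
        raise ImplicitApprovalError("the producer cannot approve its own artifact")


def check_single_shot(agent_id: str, runs_so_far: Dict[str, int]) -> None:
    """Raise if this agent has already run in the current step."""
    if runs_so_far.get(agent_id, 0) >= 1:
        raise AgentLoopError(f"{agent_id!r} already ran in this step")


check_gate("ab12", Approval(approver="reviewer", artifact_digest="ab12"), producer="drafter")
try:
    check_gate("ab12", None, producer="drafter")
except GovernanceViolation as err:
    print(type(err).__name__)  # prints: ImplicitApprovalError
```

A shared base class lets the kernel catch every rule violation in one place while still recording which rule was broken.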

Every event is recorded in an append-only, tamper-evident audit log (content hashing plus a monotonic counter), so a delivery can be reconstructed and verified after the fact.
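A minimal sketch of an append-only, tamper-evident log. SHA-256 and the monotonic counter come from the diagram. Chaining each entry's hash to its predecessor is an assumption about how those pieces fit together, and `AuditLog`, `Entry`, and `verify` are hypothetical names. Editing, reordering, or dropping any entry changes a hash or breaks the counter, so `verify` fails.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable, List, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class Entry:
    seq: int        # monotonic counter: 0, 1, 2, ...
    event: dict
    prev_hash: str
    digest: str


def _digest(seq: int, event: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal content always hashes equally.
    payload = json.dumps({"seq": seq, "event": event, "prev": prev_hash},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only: there is no method to edit or delete entries."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, event: dict) -> Entry:
        seq = len(self._entries)
        prev = self._entries[-1].digest if self._entries else GENESIS
        entry = Entry(seq, dict(event), prev, _digest(seq, event, prev))
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[Entry, ...]:
        return tuple(self._entries)


def verify(entries: Iterable[Entry]) -> bool:
    """Recompute the chain; any edit, gap, or reordering makes this return False."""
    prev = GENESIS
    for expected_seq, e in enumerate(entries):
        if e.seq != expected_seq or e.prev_hash != prev:
            return False
        if e.digest != _digest(e.seq, e.event, e.prev_hash):
            return False
        prev = e.digest
    return True


log = AuditLog()
log.append({"type": "transition", "from": "Draft", "to": "Review"})
log.append({"type": "approval", "by": "reviewer"})
print(verify(log.entries()))  # prints: True
```

Tamper-evident is not tamper-proof: anyone able to rewrite the entire log could recompute every hash, so the latest hash would also need to be recorded somewhere the log's writer cannot change.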

Operational Considerations

Framework-level: ~14k lines of Python and 41 test files, with CI gated on determinism (a fixed hash seed). The discipline is the point: the system exists to make agent work trustworthy, not faster.
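A sketch of a determinism check in the spirit of that CI gate, assuming the gate runs the same code several times under a fixed `PYTHONHASHSEED` and fails if the outputs differ. `PROBE`, `run_once`, and `is_deterministic` are hypothetical; `PROBE` stands in for the real workflow entry point, which is not shown here. It prints a set of strings, whose iteration order depends on string hashing and can therefore change between processes unless the seed is fixed.

```python
import os
import subprocess
import sys

# Stand-in for "run the workflow and emit order-sensitive output".
PROBE = "print(list({'draft', 'review', 'done', 'gate', 'audit'}))"


def run_once(seed: str) -> str:
    """Run PROBE in a fresh interpreter with a fixed hash seed and return its stdout."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run([sys.executable, "-c", PROBE], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout


def is_deterministic(seed: str = "0", runs: int = 3) -> bool:
    """True if every run produced byte-identical output."""
    outputs = {run_once(seed) for _ in range(runs)}
    return len(outputs) == 1


print(is_deterministic())  # prints: True
```

Fixing the seed in separate processes matters because hash randomization is chosen once at interpreter start-up, so a check inside a single process cannot detect it.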

The measurement infrastructure exists; there are no production-run results to cite yet. This is an architecture and design achievement, not an impact number.
