llm-ops · multi-agent · litellm · eu-sovereign

EU-Sovereign Agentic Coding Environment

A controllable, EU-sovereign environment for agentic coding: LiteLLM tier-routing across on-device and EU-managed providers (any OpenAI-compatible backend swappable in), an adversarial reasoning/execution gate, and a single gateway guardrail.

A working agentic environment that is LLM- and cloud-agnostic. The default routing is fully EU-sovereign (on-device plus EU-managed providers, with no US service in the path), and any backend can be swapped in behind one gateway; I adapted it to AWS for one deployment. Routing is deterministic and cost is capped by one central guardrail, all composed from market components instead of bespoke pipelines.

5 model tiers

1 gateway guardrail

Agentic coding tools that route code and context through US-controlled clouds do not fit sovereign or public-sector work. This environment keeps inference in the EU, separates reasoning from execution with an adversarial gate, and clamps cost once at the gateway instead of in every client — composed from market components rather than a bespoke framework.

This page covers the architecture and the deliberate trade-offs behind a working setup at mid-to-high maturity, built on integration over invention.

Reasoning and execution are separate calls, gated by an adversarial challenger. Both route through one LiteLLM gateway, with a single cost guardrail, to on-device and EU-managed providers; any OpenAI-compatible backend can be swapped in.

Background

Sovereign and public-sector work rules out agentic tooling that sends code and context through US-controlled clouds. The environment was built to keep inference in the EU while staying controllable — and to avoid the common failure of letting an agent grade its own work.

Design Decisions

Tier-routing across five model tiers via a single LiteLLM gateway — on-device (Ollama) and EU-managed (Mistral, Scaleway) by default, any OpenAI-compatible backend swappable in — so model choice is a one-line routing decision and inference stays in-region or on-device.
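The page does not publish its routing table. As a minimal sketch, assuming hypothetical tier names and model IDs, the five tiers could be written in the `model_list` shape that LiteLLM's `Router` accepts (a public `model_name` mapped to provider `litellm_params`). The `ollama/`, `mistral/` and `openai/` prefixes are LiteLLM provider routes; the Scaleway endpoint is a placeholder, and `TIER_REGION`, `assert_sovereign` and `resolve` are illustrative helpers, not part of LiteLLM.

```python
"""Hypothetical five-tier routing table, shaped like a LiteLLM Router model_list."""

# Tier names, model IDs and the Scaleway endpoint are illustrative placeholders,
# not the author's configuration. Each entry maps a public "model_name" (the tier)
# to provider-specific "litellm_params".
LOCAL = "http://localhost:11434"  # Ollama's default local address

MODEL_LIST = [
    {"model_name": "tier-1-local-fast",
     "litellm_params": {"model": "ollama/llama3.2", "api_base": LOCAL}},
    {"model_name": "tier-2-local-code",
     "litellm_params": {"model": "ollama/qwen2.5-coder", "api_base": LOCAL}},
    {"model_name": "tier-3-eu-small",
     "litellm_params": {"model": "mistral/mistral-small-latest"}},
    {"model_name": "tier-4-eu-large",
     "litellm_params": {"model": "mistral/mistral-large-latest"}},
    # Generic OpenAI-compatible backend: "openai/" prefix plus a custom api_base.
    {"model_name": "tier-5-eu-compatible",
     "litellm_params": {"model": "openai/<model-id>",
                        "api_base": "https://<scaleway-endpoint>/v1"}},
]

# Hypothetical annotation kept outside model_list: where each tier runs.
TIER_REGION = {
    "tier-1-local-fast": "on-device",
    "tier-2-local-code": "on-device",
    "tier-3-eu-small": "eu",
    "tier-4-eu-large": "eu",
    "tier-5-eu-compatible": "eu",
}
ALLOWED_REGIONS = {"on-device", "eu"}


def assert_sovereign(model_list, regions):
    """Raise if any tier in the routing table is not tagged on-device or EU."""
    for entry in model_list:
        tier = entry["model_name"]
        if regions.get(tier) not in ALLOWED_REGIONS:
            raise ValueError(f"{tier} is not tagged as on-device or EU-hosted")


def resolve(tier, model_list):
    """Deterministic lookup: a given tier always maps to the same provider model."""
    for entry in model_list:
        if entry["model_name"] == tier:
            return entry["litellm_params"]["model"]
    raise KeyError(tier)


if __name__ == "__main__":
    assert_sovereign(MODEL_LIST, TIER_REGION)
    print(resolve("tier-2-local-code", MODEL_LIST))  # ollama/qwen2.5-coder
```

Handing the list to `litellm.Router(model_list=MODEL_LIST)` is where LiteLLM takes over. Swapping a backend then means editing one `litellm_params` entry, while clients keep requesting the same tier name.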

Reasoning is separated from execution by an adversarial 'challenger' gate — a separate call whose job is to push back before code is written, rather than relying on a model to self-reflect. The boundary is explicit: this raises rigor, not model capability.
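The page does not show the challenger's prompts or protocol. A minimal sketch, assuming each role is a plain `prompt -> text` callable (in the environment described here, a separate call through the gateway) and a hypothetical convention in which the challenger replies `APPROVE` or lists objections. The plan only reaches the executor once the challenger stops objecting, and the round cap (also an assumption) makes a stubborn challenger block execution rather than loop forever. `challenge_gate`, `gated_execute` and `GateResult` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical: each role is a prompt -> reply callable, i.e. a separate model call.
LLM = Callable[[str], str]


@dataclass
class GateResult:
    approved: bool
    plan: str
    objections: str
    rounds: int


def challenge_gate(task: str, reasoner: LLM, challenger: LLM, max_rounds: int = 3) -> GateResult:
    """Let an adversarial challenger push back on a plan before any code is written."""
    plan = reasoner(f"Plan how to implement:\n{task}")
    objections = ""
    for round_no in range(1, max_rounds + 1):
        verdict = challenger(
            "Act as an adversarial reviewer. Reply APPROVE, or list concrete objections.\n"
            f"Task:\n{task}\n\nPlan:\n{plan}"
        )
        if verdict.strip().upper().startswith("APPROVE"):
            return GateResult(True, plan, "", round_no)
        objections = verdict
        if round_no < max_rounds:  # revise only if the revision will be reviewed
            plan = reasoner(
                f"Revise the plan for:\n{task}\n\n"
                f"Objections:\n{objections}\n\nPrevious plan:\n{plan}"
            )
    return GateResult(False, plan, objections, max_rounds)


def gated_execute(task: str, reasoner: LLM, challenger: LLM, executor: LLM) -> Optional[str]:
    """Execution is its own call and only runs on a plan the challenger approved."""
    gate = challenge_gate(task, reasoner, challenger)
    if not gate.approved:
        return None  # blocked: nothing is written until the challenger approves
    return executor(gate.plan)


if __name__ == "__main__":
    def reasoner(prompt: str) -> str:
        return "plan v2" if "Objections" in prompt else "plan v1"

    def challenger(prompt: str) -> str:
        return "APPROVE" if "plan v2" in prompt else "No retry limit."

    print(gated_execute("add retries", reasoner, challenger, lambda plan: f"code for {plan}"))
```

The stubs stand in for model calls: the first plan is rejected, the revision is approved, and only then does the executor run. The design point from the page is that the critique comes from a separate call with its own instruction, not from the reasoner reflecting on its own output.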

Cost is clamped once, at the gateway, for every client — one guardrail instead of fixing each consumer. Leverage over repetition.
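A minimal sketch of the clamp-once idea, assuming a single in-process gateway and hypothetical per-tier prices: every client's request passes through one `GatewayBudget.charge` before it is forwarded, so the ceiling lives in exactly one place. LiteLLM's proxy ships its own budget and spend-tracking features; this class only illustrates where the check sits and is not that API. `GatewayBudget`, `BudgetExceeded`, `estimate_usd` and the prices are illustrative.

```python
import threading


class BudgetExceeded(Exception):
    """Raised when a request would push total spend past the gateway ceiling."""


# Hypothetical USD prices per 1k tokens; placeholders, not real provider pricing.
PRICE_PER_1K = {"tier-1-local-fast": 0.0, "tier-4-eu-large": 0.008}


def estimate_usd(tier: str, tokens: int) -> float:
    """Estimated cost of a request on a given tier."""
    return PRICE_PER_1K[tier] * tokens / 1000


class GatewayBudget:
    """One spend ceiling shared by every client that routes through the gateway."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0
        self._lock = threading.Lock()  # clients may call concurrently

    def charge(self, client: str, tier: str, tokens: int) -> float:
        """Reserve the estimated cost or refuse the request; returns remaining budget."""
        cost = estimate_usd(tier, tokens)
        with self._lock:
            if self.spent_usd + cost > self.max_usd:
                raise BudgetExceeded(
                    f"{client}: {tier} request would exceed the ${self.max_usd:.2f} ceiling"
                )
            self.spent_usd += cost
            return self.max_usd - self.spent_usd


if __name__ == "__main__":
    budget = GatewayBudget(max_usd=0.02)
    budget.charge("ide-agent", "tier-4-eu-large", 1000)
    budget.charge("ci-bot", "tier-4-eu-large", 1000)
    budget.charge("ci-bot", "tier-1-local-fast", 50_000)  # priced at zero in this sketch
    try:
        budget.charge("ide-agent", "tier-4-eu-large", 1000)
    except BudgetExceeded as exc:
        print(exc)
```

Charging before forwarding means a refused request never reaches a provider, and no client has to implement its own limit. A real gateway would also reconcile the estimate against actual token usage once the response arrives.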

Memory is a versioned single source of truth, so context is reproducible rather than ambient.
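The page states that memory is versioned but not how. One possible shape, sketched under that assumption: an in-process store where every write produces a content hash, so a run can record the version it used and later rebuild exactly the same context. A git repository of memory files would give the same property; `VersionedMemory`, `write` and `render_context` are hypothetical names.

```python
import hashlib
import json
from typing import Dict, Optional


class VersionedMemory:
    """Content-addressed memory: each write yields a version id that pins the full state."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[str, str]] = {}
        self.head: Optional[str] = None

    def write(self, key: str, value: str) -> str:
        # Copy the current state so earlier versions are never mutated.
        state = dict(self._versions[self.head]) if self.head else {}
        state[key] = value
        # Canonical JSON so identical content always hashes to the same id.
        blob = json.dumps(state, sort_keys=True).encode("utf-8")
        version = hashlib.sha256(blob).hexdigest()[:12]
        self._versions[version] = state
        self.head = version
        return version

    def render_context(self, version: str) -> str:
        """Rebuild the exact context text that a given version represents."""
        state = self._versions[version]
        return "\n".join(f"{k}: {v}" for k, v in sorted(state.items()))


if __name__ == "__main__":
    mem = VersionedMemory()
    v1 = mem.write("conventions", "format with ruff")
    v2 = mem.write("architecture", "all inference via the gateway")
    print(mem.render_context(v1))  # conventions: format with ruff
```

Because the version id is derived from content alone, two runs that write the same memory get the same id, which is what makes context reproducible rather than ambient.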

Operational Considerations

Composed from market components (LiteLLM, existing providers) instead of a bespoke orchestration framework — a deliberate integrate-don't-build choice, with an explicit boundary against re-implementing what already exists.

A working environment at mid-to-high maturity — the value is the architecture (sovereign routing, reasoning/execution separation, central guardrail), not a production-scale track record.

Want the full picture behind this system? Get in touch — or see the engineering principles that run through all of them.
