sap-hana · code-generation · compiler · testing

Bidirectional SAP-HANA Warehouse Engine

A spec-driven engine that generates SAP-HANA warehouse objects from YAML and parses existing ones back — bound by a shared intermediate representation and byte-stable roundtrip tests.

A ~24k-LOC engine (439 tests) where spec → generate → parse round-trips byte-for-byte. The staging and core layers are built and tested; the data-mart layer is next on the roadmap.

~24k LOC
439 tests
byte-stable roundtrips

Hand-authoring SAP-HANA data-warehouse objects is repetitive, and over time the spec people reason about drifts away from the code that is actually deployed. This engine makes the spec and the code two views of one representation — generate one way, parse the other — so they cannot drift. It is deterministic by construction: the same spec always produces the same bytes.

This page covers the architecture and the engineering decisions, not the implementation.

[Figure: YAML spec (authored once) ↔ Shared IR (one representation; read and write bound) ↔ HANA objects (CDS, procedures, views). Generate: spec → objects. Parse: objects → spec. spec → generate → parse → byte-identical.]

Read-write symmetry: a YAML spec and the generated HANA objects both bind to one shared intermediate representation. Generate goes spec → objects; parse goes objects → spec; roundtrips are byte-identical.

Background

SAP-HANA warehouse development is largely manual — CDS entities, SQLScript procedures, and calculation views authored by hand across staging, core, and data-mart layers. Two failure modes compound over time: the work is repetitive and error-prone, and the spec drifts away from the deployed code. The goal was an engine that treats the spec as the single source of truth and keeps generation and parsing structurally in sync.

Design Decisions

Read and write share one intermediate representation. Parsers (read) and code generators (write) operate on the same typed IR, so a new spec field cannot be added without touching its type, its parser, its generator, and a roundtrip test together. The symmetry is an enforced invariant, not a convention.
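A minimal sketch of this read-write symmetry, under stated assumptions: the Column and Entity types, the generate and parse functions, and the simplified CDS-like text format are all hypothetical illustrations, not the engine's actual IR or real HANA syntax. The point is only that both directions are written against one typed structure, so a new field has to appear in the type, the generator, and the parser before the roundtrip check passes again.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Column:
    # Hypothetical IR node: one column of a warehouse entity.
    name: str
    type: str
    key: bool = False


@dataclass(frozen=True)
class Entity:
    # Hypothetical IR node shared by the read and the write path.
    name: str
    columns: Tuple[Column, ...]


def generate(entity: Entity) -> str:
    """Write path: IR -> text (simplified CDS-like format, assumed for illustration)."""
    lines = [f"entity {entity.name} {{"]
    for col in entity.columns:
        prefix = "key " if col.key else ""
        lines.append(f"  {prefix}{col.name} : {col.type};")
    lines.append("};")
    return "\n".join(lines) + "\n"


_HEADER = re.compile(r"entity (\w+) \{")
_COLUMN = re.compile(r"  (key )?(\w+) : ([\w()]+);")


def parse(text: str) -> Entity:
    """Read path: text -> the same IR type the generator consumes."""
    lines = text.splitlines()
    header = _HEADER.fullmatch(lines[0]) if lines else None
    if header is None or lines[-1] != "};":
        raise ValueError("not a recognised entity block")
    columns = []
    for line in lines[1:-1]:
        match = _COLUMN.fullmatch(line)
        if match is None:
            raise ValueError(f"unparseable column line: {line!r}")
        columns.append(
            Column(name=match.group(2), type=match.group(3), key=match.group(1) is not None)
        )
    return Entity(name=header.group(1), columns=tuple(columns))


# Roundtrip in both directions: IR -> text -> IR and text -> IR -> text.
customer = Entity("Customer", (Column("id", "Integer", key=True), Column("name", "String(100)")))
text = generate(customer)
assert parse(text) == customer
assert generate(parse(text)) == text
```

In this sketch, adding a field such as a nullable flag to Column without updating both generate and parse makes one of the two roundtrip assertions fail, which is the kind of coupling the paragraph above describes.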

Determinism is verified, not assumed. Output ordering is forced (sorted and independent of the hash seed). Golden-master and byte-stable roundtrip tests then assert that the same spec produces identical bytes on every run. That property is what makes the engine safe to trust in a change-controlled environment.
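The idea can be sketched as follows, with loud assumptions: render, assert_golden, the spec fields, and the JSON output format are hypothetical stand-ins for the engine's real generators and golden files. The sketch shows the two ingredients named above: unordered collections are sorted before emission so the result does not depend on set iteration order, and the rendered bytes are compared against a stored golden master.

```python
import json


def render(spec: dict) -> bytes:
    """Serialise a spec deterministically (JSON is a stand-in for generated objects)."""
    normalised = {
        "name": spec["name"],
        # Sets of strings iterate in hash order, which can change between runs
        # (PYTHONHASHSEED), so unordered collections are sorted before emission.
        "roles": sorted(spec["roles"]),
        # Column order is meaningful, so the authored order is preserved.
        "columns": list(spec["columns"]),
    }
    text = json.dumps(normalised, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("utf-8") + b"\n"


def assert_golden(actual: bytes, golden: bytes) -> None:
    """Golden-master check: fail loudly on any byte-level difference."""
    if actual != golden:
        raise AssertionError(f"output drifted from golden master:\n{actual!r}\n!=\n{golden!r}")


# The same logical spec, authored with different key and set ordering.
spec_a = {"name": "stg_customer", "roles": {"READER", "LOADER"}, "columns": ["id", "name"]}
spec_b = {"columns": ["id", "name"], "roles": {"LOADER", "READER"}, "name": "stg_customer"}

# In a real suite the golden bytes would live in a checked-in file.
GOLDEN = b'{"columns":["id","name"],"name":"stg_customer","roles":["LOADER","READER"]}\n'

assert_golden(render(spec_a), GOLDEN)
assert_golden(render(spec_b), GOLDEN)
```

Comparing bytes rather than parsed structures is deliberate in this sketch: it also catches whitespace, encoding, and ordering drift that a structural comparison would hide.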

The engine is scoped deliberately. It generates and parses warehouse objects; it does not guess business logic, schedule jobs, write documentation, or touch production. Defining the system by what it will not do keeps it predictable.

Operational Considerations

The engine is roughly 24k lines of Python with 439 tests. The staging and core layers are built and tested; the data-mart layer is on the roadmap. The value lies in the architecture and the enforced invariants, demonstrated on the layers that are done.

The parser core was reused, not rebuilt: it grew out of a separate platform-wide lineage-extraction tool that parses many existing repositories end to end. Reusing a proven component across repository boundaries is part of the design, not an afterthought.
