Golden Fixtures
The Problem: How Do You Trust AI-Generated Rules?
Invariants rule out certain bad states for the models your evaluator admits. But they don’t show your system does the right thing, only that it avoids the wrong thing you declared. You still need evidence that a specific sequence of observations produces the specific facts, intents, and effects your domain requires.
Golden fixtures are that evidence. A golden fixture is a deterministic input timeline paired with an expected world state. You define the exact observations that enter the system and the exact derived state that should result. If the evaluator output matches byte-for-byte, you have cryptographic evidence that this evaluator produces the expected behavior for that scenario.
Deterministic Input Timelines
A fixture is a JSONL file. Each line is either an observation (input) or an expectation (output). Together they define a complete scenario: a deterministic input timeline.
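As a rough illustration of this file layout, the Python sketch below splits fixture text into observations and expectations. It assumes that observations are the lines carrying a kind key and that expectations are the lines carrying one of the expect_* keys used in the examples on this page. The split_fixture helper is hypothetical and not part of jacqos.

```python
import json

# Keys that mark a line as an expectation (taken from the examples on this page).
EXPECTATION_KEYS = ("expect_fact", "expect_no_fact", "expect_intent")


def split_fixture(text):
    """Split JSONL fixture text into (observations, expectations)."""
    observations, expectations = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if "kind" in record:
            observations.append(record)
        elif any(key in record for key in EXPECTATION_KEYS):
            expectations.append(record)
        else:
            raise ValueError(f"line {line_no}: neither observation nor expectation")
    return observations, expectations


FIXTURE = "\n".join([
    '{"kind": "booking.request", "payload": {"email": "pat@example.com", "slot_id": "slot-42"}}',
    '{"kind": "slot.status", "payload": {"slot_id": "slot-42", "is_available": true}}',
    '{"expect_fact": "slot_reserved", "args": ["slot-42"]}',
])

observations, expectations = split_fixture(FIXTURE)
print(len(observations), len(expectations))  # 2 1
```

Rejecting lines that match neither shape is a design choice in this sketch, so that a typo in a key name fails loudly instead of being silently dropped.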
Happy Paths
Happy paths show the system does what it should when everything goes right:

    {"kind": "booking.request", "payload": {"email": "pat@example.com", "slot_id": "slot-42", "patient_name": "Pat"}}
    {"kind": "slot.status", "payload": {"slot_id": "slot-42", "is_available": true}}
    {"kind": "reserve.result", "payload": {"request_id": "req-1", "slot_id": "slot-42", "succeeded": true}}

Error Paths
Error paths show the system handles failures and conflicts correctly. These are not optional: every flagship app ships contradiction-path fixtures.

    {"kind": "booking.request", "payload": {"email": "pat@example.com", "slot_id": "slot-42"}}
    {"kind": "booking.request", "payload": {"email": "sam@example.com", "slot_id": "slot-42"}}
    {"kind": "slot.status", "payload": {"slot_id": "slot-42", "is_available": true}}

The expected output should show that only one booking succeeds and the no_double_booking invariant holds. Contradiction paths catch interaction bugs that happy paths miss.
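To make the expected outcome concrete, here is a minimal Python sketch of what a no_double_booking check could look like over a set of derived facts. The tuple representation of facts and the double_booked_slots helper are assumptions for illustration. The real invariant is declared alongside the rules and checked by the evaluator.

```python
from collections import defaultdict


def double_booked_slots(facts):
    """Return {slot_id: [request_ids]} for slots with more than one confirmed booking."""
    requests_by_slot = defaultdict(set)
    for name, args in facts:
        if name == "booking_confirmed":
            request_id, slot_id = args  # argument order follows the examples on this page
            requests_by_slot[slot_id].add(request_id)
    return {
        slot: sorted(requests)
        for slot, requests in requests_by_slot.items()
        if len(requests) > 1
    }


# Hypothetical derived facts: one passing state and one violating state.
good = [("booking_confirmed", ("req-1", "slot-42"))]
bad = good + [("booking_confirmed", ("req-2", "slot-42"))]

print(double_booked_slots(good))  # {}
print(double_booked_slots(bad))   # {'slot-42': ['req-1', 'req-2']}
```

An empty result means the invariant holds for that state. A non-empty result names the contested slot, which is the kind of detail a contradiction-path fixture is meant to surface.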
A typical app fixture directory:

    fixtures/
      happy-path.jsonl          # Normal booking flow
      contradiction-path.jsonl  # Conflicting observations
      cancellation-path.jsonl   # Mid-flow cancellation
      llm-extraction.jsonl      # LLM-assisted intake

Expected World State
Each fixture also declares what the evaluator should produce, namely the expected world state after all observations have been replayed and the evaluator reaches its fixed point:

    {"expect_fact": "booking_confirmed", "args": ["req-1", "slot-42"]}
    {"expect_fact": "slot_reserved", "args": ["slot-42"]}
    {"expect_no_fact": "slot_available", "args": ["slot-42"]}
    {"expect_intent": "intent.send_confirmation", "args": ["req-1", "pat@example.com"]}

Expectations can assert:
- Facts that must exist: expect_fact
- Facts that must not exist: expect_no_fact
- Intents that must be derived: expect_intent
The expected world state is the specification. The .dh rules are the implementation. If the evaluator output matches the expected state, the implementation satisfies the specification for that scenario.
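The relationship between expectations and derived state can be sketched in a few lines of Python. This is a simplified illustration, not how jacqos performs the comparison: it assumes derived facts and intents are available as sets of (name, args) tuples, and it only checks membership rather than the byte-identical match described on this page. The check_expectations helper is hypothetical.

```python
def check_expectations(expectations, facts, intents):
    """Return a list of failure messages; an empty list means every expectation held."""
    failures = []
    for exp in expectations:
        args = tuple(exp["args"])
        if "expect_fact" in exp:
            name = exp["expect_fact"]
            if (name, args) not in facts:
                failures.append(f"missing fact: {name}{args}")
        elif "expect_no_fact" in exp:
            name = exp["expect_no_fact"]
            if (name, args) in facts:
                failures.append(f"unexpected fact: {name}{args}")
        elif "expect_intent" in exp:
            name = exp["expect_intent"]
            if (name, args) not in intents:
                failures.append(f"missing intent: {name}{args}")
        else:
            raise ValueError(f"unknown expectation: {exp}")
    return failures


# Derived state as sets of (name, args) tuples; this representation is an assumption.
facts = {("booking_confirmed", ("req-1", "slot-42")), ("slot_reserved", ("slot-42",))}
intents = {("intent.send_confirmation", ("req-1", "pat@example.com"))}

expectations = [
    {"expect_fact": "booking_confirmed", "args": ["req-1", "slot-42"]},
    {"expect_fact": "slot_reserved", "args": ["slot-42"]},
    {"expect_no_fact": "slot_available", "args": ["slot-42"]},
    {"expect_intent": "intent.send_confirmation", "args": ["req-1", "pat@example.com"]},
]

print(check_expectations(expectations, facts, intents))  # []
for failure in check_expectations(expectations, set(), intents):
    print(failure)
```

With an empty fact set, the second call reports the two missing facts while the expect_no_fact and expect_intent lines still pass.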
The AI Feedback Loop
Fixtures create a tight, automated feedback loop for AI agents:
- Human defines fixtures: observation sequences and expected outputs
- AI generates .dh rules: ontology derivations, intents, helpers
- jacqos replay runs the fixture: the evaluator processes the observations
- Output compared to expectations: byte-identical match required
- AI iterates if mismatch: adjusts rules based on the diff
- When all fixtures pass and all invariants hold: the rules satisfy the fixture corpus and declared invariants for this evaluator
The human never needs to read the generated rules. The fixtures are the specification; the rules are the implementation detail. The AI keeps iterating until the output matches exactly.
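The shape of that loop can be sketched in Python. Everything here is a stand-in: generate_rules represents whatever AI agent proposes rules, run_fixtures represents a replay that returns a list of mismatches, and the two fake_* functions exist only to make the sketch runnable. None of these names come from jacqos.

```python
def iterate_until_pass(generate_rules, run_fixtures, max_rounds=5):
    """Regenerate rules until run_fixtures reports no mismatches.

    generate_rules(feedback) -> candidate rules, given the previous mismatches
    run_fixtures(rules)      -> list of mismatch messages (empty means all pass)
    """
    feedback = []
    for round_no in range(1, max_rounds + 1):
        rules = generate_rules(feedback)
        feedback = run_fixtures(rules)
        if not feedback:
            return rules, round_no
    raise RuntimeError(f"fixtures still failing after {max_rounds} rounds: {feedback}")


# Stand-ins for the agent and the replay step (hypothetical, for demonstration only).
def fake_generate(feedback):
    return "rules-v2" if feedback else "rules-v1"


def fake_run(rules):
    return [] if rules == "rules-v2" else ["missing fact: booking_confirmed"]


rules, rounds = iterate_until_pass(fake_generate, fake_run)
print(rules, rounds)  # rules-v2 2
```

Capping the number of rounds is a choice made in this sketch so that a fixture the agent cannot satisfy ends in an explicit error rather than an endless loop.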
How jacqos verify Checks Fixture Conformance
jacqos verify replays every fixture from scratch on a clean database, checks the evaluator output against expectations, and verifies all invariants at every fixed point:

    $ jacqos verify
    Replaying fixtures...
      happy-path.jsonl          PASS  (3 observations, 2 facts matched)
      contradiction-path.jsonl  PASS  (3 observations, 1 fact matched)
      cancellation-path.jsonl   PASS  (4 observations, 3 facts matched)
      llm-extraction.jsonl      PASS  (5 observations, 4 facts matched)

    Checking invariants...
      no_double_booking         PASS  (427 slots evaluated)
      confirmed_has_email       PASS  (89 bookings evaluated)
      no_cancelled_intents      PASS  (12 intents evaluated)

    All checks passed. Digest: sha256:a1b2c3d4e5f6...

Each replay is deterministic. The same observations, the same evaluator, and the same rules produce the same facts every time. If anything changes (a rule, a mapper, a helper), the digest changes.
When a fixture fails, the output shows exactly what diverged:

    $ jacqos verify
    Replaying fixtures...
      happy-path.jsonl  FAIL

      Expected: booking_confirmed("req-1", "slot-42")
      Got:      (not derived)

      Missing facts: 1
      Unexpected facts: 0

      Hint: rule rules.dh:23 did not fire.
      Provenance: no atom matched booking_request(_, "slot-42", _)

Digest-Backed Evidence
When jacqos verify passes, it produces a verification digest, a cryptographic hash that attests to exact behavior.
The digest covers:
- Evaluator identity: hash of ontology rules, mapper semantics, and helper digests
- Fixture corpus: hash of every .jsonl fixture file
- Derived state: byte-identical facts, intents, and provenance for each fixture

    Verification digest: sha256:a1b2c3d4e5f6...
      evaluator_digest: sha256:7890ab...
      fixture_corpus:   sha256:cdef01...
      derived_state:    sha256:234567...

This digest is portable. It travels with your evaluation package and can be independently verified. Anyone with the same evaluator and fixture corpus can reproduce the exact same digest. If they can’t, something changed.
This is not just a test report. It is cryptographic evidence that a specific evaluator, given specific inputs, produced specific outputs. The evidence is only as strong as the fixtures and expectations you defined — but for those fixtures, it is exact.
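The idea of a digest built from three component hashes can be illustrated with the Python standard library. This page does not specify how jacqos actually serializes or combines its inputs, so the canonical JSON encoding, the combination scheme, and the verification_digest function below are all assumptions made for the sketch. The rule text is a placeholder string, not real .dh syntax.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hash bytes and format the result like the digests shown above."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    """Serialize with sorted keys so equal data always yields equal bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verification_digest(evaluator_source, fixtures, derived_state):
    """Combine three component digests into one top-level digest (assumed scheme)."""
    parts = {
        "evaluator_digest": sha256_hex(evaluator_source.encode("utf-8")),
        "fixture_corpus": sha256_hex(canonical(fixtures)),
        "derived_state": sha256_hex(canonical(derived_state)),
    }
    return sha256_hex(canonical(parts)), parts


fixtures = {"happy-path.jsonl": '{"expect_fact": "slot_reserved", "args": ["slot-42"]}'}
derived = {"happy-path.jsonl": [["slot_reserved", ["slot-42"]]]}

first, parts = verification_digest("placeholder rules, version 1", fixtures, derived)
again, _ = verification_digest("placeholder rules, version 1", fixtures, derived)
changed, changed_parts = verification_digest("placeholder rules, version 2", fixtures, derived)

print(first == again)    # True: same inputs reproduce the same digest
print(first == changed)  # False: a rule change alters the digest
```

Keeping the component digests alongside the top-level one makes it possible to see which input changed. In the example, only evaluator_digest differs between parts and changed_parts.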
Limitations
Golden fixtures provide evidence for defined inputs, not blanket evidence for all possible inputs.
What fixtures show:
- For the exact observation sequences in your fixture corpus, the evaluator produces the exact expected world state
- The evidence is reproducible and cryptographically verifiable
- Any change to rules, mappers, or helpers that affects fixture outcomes will be detected
What fixtures do not show:
- That the system behaves correctly for observation sequences not in the corpus
- That the fixture corpus covers all important scenarios
- That the expected world state itself is correct (a fixture with wrong expectations will still pass)
Fixtures are scenario-level contracts. They answer: “given these specific observations, does the system produce this specific result?” They do not answer: “does the system behave correctly for all valid observations?”
For universal properties, use invariants. Invariants hold across all evaluation states produced by the fixed evaluator, not just fixture scenarios. The combination of golden fixtures (specific scenario evidence) and invariant review (universal constraints over the evaluated model) gives you both targeted evidence and broad safety boundaries.
| Property | Golden Fixture | Invariant |
|---|---|---|
| Scope | One specific scenario | All evaluation states |
| Shows | Exact expected output | Universal constraint holds |
| Catches unknown scenarios | No | Yes (via property testing) |
| Cryptographic digest | Yes | Yes (within verify) |
| Survives rule changes | May need updating | Yes |
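The "via property testing" entry in the table can be illustrated with a small Python sketch that replays many random timelines and checks an invariant after each one. Both evaluators below are toy stand-ins written for this example (one confirms only the first request per slot, one confirms every request). They are not the jacqos evaluator, and the request ids are synthesized from list positions as an assumption.

```python
import random


def first_wins_evaluate(observations):
    """Toy evaluator: only the first booking request for each slot is confirmed."""
    confirmed = {}
    for index, obs in enumerate(observations, start=1):
        if obs["kind"] == "booking.request":
            confirmed.setdefault(obs["payload"]["slot_id"], f"req-{index}")
    return {("booking_confirmed", (req, slot)) for slot, req in confirmed.items()}


def buggy_evaluate(observations):
    """Toy evaluator with a deliberate bug: every booking request is confirmed."""
    return {
        ("booking_confirmed", (f"req-{index}", obs["payload"]["slot_id"]))
        for index, obs in enumerate(observations, start=1)
        if obs["kind"] == "booking.request"
    }


def no_double_booking(facts):
    """Invariant: no slot appears in more than one booking_confirmed fact."""
    slots = [args[1] for name, args in facts if name == "booking_confirmed"]
    return len(slots) == len(set(slots))


def count_violations(evaluate, rounds=200, seed=0):
    """Replay random timelines and count how many break the invariant."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(rounds):
        # At least 4 requests over 3 slots, so every timeline contains a conflict.
        timeline = [
            {"kind": "booking.request",
             "payload": {"email": f"user{rng.randrange(5)}@example.com",
                         "slot_id": f"slot-{rng.randrange(3)}"}}
            for _ in range(rng.randrange(4, 10))
        ]
        if not no_double_booking(evaluate(timeline)):
            violations += 1
    return violations


print(count_violations(first_wins_evaluate))  # 0
print(count_violations(buggy_evaluate))       # 200
```

None of these random timelines was written down as a fixture, yet the invariant still flags the buggy evaluator on every one of them. That is the sense in which invariants cover scenarios the fixture corpus does not.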
Next Steps
- Invariant Review: universal constraints that hold across all states
- Visual Provenance: tracing facts back to evidence when fixtures fail
- Fixtures and Invariants Guide: practical guide to writing fixtures
- CLI Reference: jacqos verify and jacqos replay commands
- Getting Started: try it yourself