
Invariant Review

The Problem: AI-Generated Code You Can’t Review


AI agents are excellent at writing dense Datalog rules. But that creates a new problem: humans are bad at reviewing them.

A booking system might have dozens of derivation rules handling slot availability, cancellation windows, waitlist priority, and conflict resolution. Each rule is correct in isolation. The interactions between them are where bugs hide. Line-by-line review of AI-generated syntax doesn’t catch these — it’s like proofreading a legal contract by checking the grammar.

Traditional code review assumes humans wrote the code and understand the design intent behind each line. When AI generates the rules, that assumption breaks. You didn’t write it. You may not fully understand it. And the AI will happily regenerate the entire rule set if you ask for a change, making your previous review worthless.

JacqOS replaces code review with invariant review. Instead of reading the rules an AI wrote, you declare the constraints those rules must satisfy. The evaluator proves whether they hold.

-- No slot should ever be double-booked
invariant no_double_booking(slot) :-
  count booking_confirmed(_, slot) <= 1.

-- Every confirmed booking must have a valid email
invariant confirmed_has_email(req) :-
  booking_confirmed(req, _),
  booking_request(req, email, _),
  email != "".

-- No intent should fire for a cancelled request
invariant no_cancelled_intents(req) :-
  intent.reserve_slot(req, _),
  not request_cancelled(req).

These are short, declarative, and express intent rather than mechanism. A domain expert can read an invariant and judge whether it captures the right business rule — without understanding a single derivation rule.

An invariant is an integrity constraint declared in .dh that must hold after every evaluation fixed point. If the evaluator reaches a state where an invariant is violated, that evaluation fails.

Invariants use the same syntax as derivation rules. They can reference any relation — atoms, facts, intents — and use negation, aggregates, and helper calls. The difference is semantic: a derivation rule says “derive this when these conditions hold.” An invariant says these conditions must always hold — for every binding of the invariant’s free variables that appears in the current state, the body must succeed. If the body fails for any binding in that parameter domain, the invariant is violated.

-- Structural: every patient has at most one primary contact
invariant single_primary_contact(patient) :-
  count primary_contact(patient, _) <= 1.

-- Cross-cutting: every LLM-derived fact must pass through candidate acceptance
invariant llm_facts_accepted(fact_id) :-
  derived_from_llm(fact_id),
  candidate.accepted(fact_id).

-- Containment: every confirmation intent corresponds to a confirmed booking.
-- Read as: for every (req) with intent.send_confirmation, booking_confirmed(req, _) must hold.
invariant confirmation_intent_has_booking(req) :-
  intent.send_confirmation(req, _),
  booking_confirmed(req, _).

Invariants are checked after the evaluator reaches its fixed point — after all rules have fired and all fact deltas have been materialized. This means invariants verify the final derived state, not intermediate computation steps.
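As a mental model only, this check can be sketched in a few lines of Python. Nothing here is the JacqOS evaluator: the derived state is a plain set of fact tuples, the parameter domain is enumerated from that state, and the invariant body runs as a query for every binding (mirroring the single_primary_contact example above).

```python
# Illustrative mental model only -- not the JacqOS evaluator. Derived state
# is a set of (relation, args...) tuples; an invariant is its body run as a
# query for every binding in its parameter domain.

def check_invariant(facts, domain, body):
    """Return the bindings for which the body fails (empty list = holds)."""
    return [binding for binding in domain if not body(facts, binding)]

# Mirrors single_primary_contact above: at most one primary contact each.
facts = {
    ("primary_contact", "alice", "c1"),
    ("primary_contact", "bob", "c2"),
    ("primary_contact", "bob", "c3"),  # bob has two: should be a violation
}

# Parameter domain: every patient that appears in the current state.
patients = {args[0] for rel, *args in facts if rel == "primary_contact"}

def at_most_one_primary(facts, patient):
    # count primary_contact(patient, _) <= 1
    return sum(1 for rel, *args in facts
               if rel == "primary_contact" and args[0] == patient) <= 1

violations = check_invariant(facts, patients, at_most_one_primary)
print(violations)  # -> ['bob']
```

The key property this sketch preserves: the invariant is universally quantified over its parameter domain, and checking happens against one settled state, not while rules are still firing.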

“Must always hold,” not “must never hold”


The same constraint can be expressed two ways. JacqOS uses the first; some other Datalog dialects use the second. Mixing them is a common bug.

-- Correct under JacqOS semantics: the body MUST ALWAYS hold for every slot.
-- For every slot in the current state, the count is bounded at 1.
invariant no_double_booking(slot) :-
  count booking_confirmed(_, slot) <= 1.

-- Wrong framing: this body reads as "for every (slot, r1, r2), there is
-- a double-booking." It is satisfied vacuously when no such triple exists,
-- but the moment a genuine double-booking pair appears in the domain,
-- the body's positive clauses succeed and the invariant succeeds
-- too — the opposite of the intent. Do not write invariants this way.
-- invariant no_double_booking_violation(slot, r1, r2) :-
--   booking_confirmed(r1, slot),
--   booking_confirmed(r2, slot),
--   r1 != r2.

When in doubt, read the body as a query over the parameter domain and ask: “for every binding I care about, does this query succeed?” If the answer should be yes in every healthy state, the body is correctly framed.
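To make the difference concrete, here is a small Python analogue of the two framings run over a deliberately bad state (one slot booked by two requests). The state and helper names are illustrative, not JacqOS API.

```python
# Illustrative Python analogue of the two framings above (not JacqOS itself).
# Bad state: slot "s1" is booked by two different requests.
bookings = {("r1", "s1"), ("r2", "s1")}  # (request, slot) pairs

slots = {slot for _, slot in bookings}

# Correct framing: for every slot, the booking count must be <= 1.
def no_double_booking(slot):
    return sum(1 for _, s in bookings if s == slot) <= 1

# Wrong framing: "there exists a double-booking pair". It succeeds exactly
# when the bad state is present, the opposite of the intent.
def double_booking_exists(slot):
    return any(r1 != r2
               for r1, s1 in bookings if s1 == slot
               for r2, s2 in bookings if s2 == slot)

print(all(no_double_booking(s) for s in slots))      # -> False (violation caught)
print(all(double_booking_exists(s) for s in slots))  # -> True (bug slips through)
```

The correctly framed check fails loudly on the bad state; the wrongly framed one reports success precisely when the constraint is broken.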

jacqos verify checks invariants in two ways:

1. Fixture Replay with Golden Fixtures

Your app ships with golden fixtures — .jsonl files containing observation sequences with expected outcomes. jacqos verify replays each fixture from scratch and checks every invariant at every fixed point along the way.

Terminal window
$ jacqos verify
Replaying fixtures...
happy-path.jsonl PASS
contradiction-path.jsonl PASS
Checking invariants...
no_double_booking PASS (427 slots evaluated)
confirmed_has_email PASS (89 bookings evaluated)
no_cancelled_intents PASS (12 intents evaluated)
All checks passed. Digest: sha256:a1b2c3...
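For intuition, the replay loop can be sketched in a few lines of Python. Everything here is a stand-in: the fixture contents, the toy evaluate step, and the no_double_booking check are hypothetical, and the real fixture schema and evaluator belong to JacqOS.

```python
import io
import json

# Illustrative replay loop (not the jacqos CLI): read a .jsonl fixture of
# observations, feed each through a toy evaluator step, and check the
# invariant at every resulting state along the way.
fixture = io.StringIO(
    '{"kind": "booking.request", "request_id": "r1", "slot_id": "s1"}\n'
    '{"kind": "booking.request", "request_id": "r2", "slot_id": "s2"}\n'
)

state = {"confirmed": set()}  # toy derived state

def evaluate(state, obs):
    # Toy "fixed point": confirm every request as it arrives.
    state["confirmed"].add((obs["request_id"], obs["slot_id"]))

def no_double_booking(state):
    slots = [slot for _, slot in state["confirmed"]]
    return all(slots.count(s) <= 1 for s in slots)

for line in fixture:
    obs = json.loads(line)
    evaluate(state, obs)
    assert no_double_booking(state), f"invariant violated after {obs}"

print("PASS", len(state["confirmed"]), "bookings")  # -> PASS 2 bookings
```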

Fixture replay catches bugs in known scenarios. But known scenarios are the easy part.

2. Property Testing with Generated Observations


The harder question: do your invariants hold for observation sequences you haven’t thought of?

jacqos verify generates observation sequences drawn from your declared schemas and bounded scenario generators. It explores combinations of valid inputs — different orderings, concurrent requests, edge-case payloads — and checks invariants against each generated sequence.

This is property testing applied to your ontology. The system doesn’t just verify that your happy path works. It searches for any sequence of valid observations that breaks an invariant.
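The idea can be sketched in plain Python without assuming anything about the real generator: random observation sequences drawn from a tiny hand-written schema, a toy evaluator with a deliberately planted bug, and the no_cancelled_intents property checked against each final state.

```python
import random

# Illustrative property-test loop (a sketch of the idea, not jacqos verify).
random.seed(0)  # deterministic for the example

def gen_sequence(n):
    """Draw n observations from a tiny, hand-written schema."""
    return [random.choice([
        {"kind": "booking.request", "request_id": "req_1", "slot_id": "s1"},
        {"kind": "booking.cancel", "request_id": "req_1"},
        {"kind": "slot.released", "slot_id": "s1"},
    ]) for _ in range(n)]

def evaluate(seq):
    """Toy evaluator with a planted bug: reserve intents ignore cancellation."""
    state = {"requests": set(), "cancelled": set(), "intents": set()}
    for obs in seq:
        if obs["kind"] == "booking.request":
            state["requests"].add((obs["request_id"], obs["slot_id"]))
        elif obs["kind"] == "booking.cancel":
            state["cancelled"].add(obs["request_id"])
        elif obs["kind"] == "slot.released":
            # BUG: derives a reserve intent without checking cancellation.
            for req, slot in state["requests"]:
                if slot == obs["slot_id"]:
                    state["intents"].add((req, slot))
    return state

def no_cancelled_intents(state):
    return all(req not in state["cancelled"] for req, _ in state["intents"])

counterexample = None
for _ in range(500):
    seq = gen_sequence(8)
    if not no_cancelled_intents(evaluate(seq)):
        counterexample = seq
        break

print("found violation:", counterexample is not None)
```

The search is mechanical: no sequence in the fixtures exercises the bug, but a few hundred random draws from the schema find one quickly.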

Terminal window
$ jacqos verify
Replaying fixtures...
happy-path.jsonl PASS
contradiction-path.jsonl PASS
Property testing invariants...
no_double_booking PASS (2,500 generated sequences)
confirmed_has_email PASS (2,500 generated sequences)
no_cancelled_intents FAIL
Counterexample found for no_cancelled_intents:
Shrunk to 3 observations (from 47):
1. booking.request {email: "a@b.co", slot_id: "s1"}
2. booking.cancel {request_id: "req_1"}
3. slot.released {slot_id: "s1"}
Violation: intent.reserve_slot("req_1", "s1") derived
but request_cancelled("req_1") is true.
Provenance: rule intents.dh:14 fired on atoms from obs #3,
but did not check cancellation status.

When a generated sequence breaks an invariant, the verifier doesn’t just report the failure. It shrinks the failing sequence to the smallest set of observations that still reproduces the violation.

A generated sequence might contain 47 observations. The shrunk counterexample might be 3. Those 3 observations become a reproducible fixture you can replay, inspect in Studio, and use to drive the AI’s next iteration.
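Shrinking itself is a simple idea. Here is a minimal greedy version in Python, an illustration only; real shrinkers (presumably including the one behind jacqos verify) are smarter about removing whole chunks at once.

```python
# Minimal greedy shrinker (an illustration of the idea, not the jacqos
# algorithm): repeatedly try dropping one observation at a time, and keep
# the drop whenever the violation still reproduces. `fails` is any
# predicate that replays a sequence and reports whether it violates.

def shrink(seq, fails):
    assert fails(seq), "can only shrink a failing sequence"
    changed = True
    while changed:
        changed = False
        for i in range(len(seq)):
            candidate = seq[:i] + seq[i + 1:]
            if fails(candidate):
                seq = candidate
                changed = True
                break
    return seq

# Toy failure condition: the bug reproduces whenever a request, a cancel,
# and a release all appear, regardless of surrounding noise.
def fails(seq):
    return {"request", "cancel", "release"} <= set(seq)

noisy = (["noise"] * 20 + ["request"] + ["noise"] * 10
         + ["cancel", "noise", "release"] + ["noise"] * 13)  # 47 observations
minimal = shrink(noisy, fails)
print(minimal)  # -> ['request', 'cancel', 'release']
```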

The shrunk counterexample includes:

  • The invariant that was violated
  • The minimal observation sequence that triggers it
  • The provenance chain showing exactly which rule and which observation caused the violation

This is your debugging surface. You don’t read the AI’s rules. You read the counterexample, understand why the invariant should hold, and tell the AI to fix it.

Instead of reading every rule the AI generated, you review:

| Surface | What you’re checking | Effort |
| --- | --- | --- |
| Invariant declarations | Are these the right constraints? Do they capture what must always be true? | Low — invariants are short and declarative |
| Fixture results | Does the system produce the expected output for known inputs? | Low — pass/fail with diffs |
| Counterexamples | Does this generated failure represent a real problem? | Medium — requires domain judgment |
| Provenance traces | When something looks wrong, trace it back to the evidence | On-demand — only when debugging |

Compare this to reviewing 200 lines of generated Datalog rules. Invariants are the same length whether the AI wrote 10 rules or 100. Your review effort scales with the complexity of your requirements, not the complexity of the implementation.

Invariants and golden fixtures are complementary, not interchangeable:

| Property | Invariant | Golden Fixture |
| --- | --- | --- |
| Scope | All evaluation states | One specific scenario |
| Written by | Humans (domain experts) | Humans or AI |
| Checks | Universal truth | Specific expected output |
| Survives rule changes | Yes | May need updating |
| Catches unknown scenarios | Yes (via property testing) | No |

Golden fixtures verify specific scenarios: “given these observations, produce these exact facts.” Invariants verify universal properties: “no matter what observations arrive, this constraint holds.”

Use both. Fixtures prove the system does what you expect in known cases. Invariants prove it doesn’t do what you forbid in any case.

Line-by-line code review of AI-generated rules fails for three reasons:

  1. The AI will regenerate. You review 200 lines today. Tomorrow the AI rewrites the rules based on a new fixture. Your review is invalidated. Invariants survive rule changes because they constrain outcomes, not implementation.

  2. Interaction bugs hide in combinations. Individual rules look correct. The bug is in how rule A interacts with rule B under condition C. Humans are bad at simulating Datalog fixed points in their heads. Property testing explores combinations mechanically.

  3. Review effort scales with the wrong axis. Code review effort scales with implementation complexity. Invariant review effort scales with requirement complexity — which is what you actually need to understand and control.

Invariant review doesn’t ask “is this code correct?” It asks “does this code produce results that satisfy my constraints?” The first question requires understanding the implementation. The second requires understanding the domain.