Replay and Verification

Replay is how you re-derive the world from recorded observations. Verification is how you prove the derived world is correct. Together they give you a reproducible, auditable proof that your agent logic does what you intend — without reading the generated rules.

This guide covers the mechanics of both and shows how to integrate them into a CI pipeline. It builds on the Fixtures and Invariants guide, which covers writing fixtures and declaring invariants.

Replay feeds observations through the mapper and evaluator in strict order, producing the full derived state from scratch.

jacqos replay fixtures/happy-path.jsonl

The replay pipeline processes each observation sequentially:

Observation → Mapper → Atoms → Evaluator → Fixed point → Next observation

For each observation:

  1. The mapper extracts atoms — your Rhai mapper transforms the raw observation payload into semantic atoms
  2. The evaluator runs to a fixed point — all .dh rules fire until no new facts, retractions, or intents can be derived
  3. Invariants are checked — every invariant must hold after each fixed point
  4. The next observation is processed — and the cycle repeats

After all observations are processed, the resulting world state contains every derived fact, every fired intent, and the full provenance graph linking them back to their source observations.
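For intuition, the replay loop can be sketched in a few lines of Python. Everything here (the toy mapper, rules, and invariant) is a hypothetical stand-in, not a JacqOS API:

```python
# Minimal sketch of the replay loop, assuming facts are plain tuples.
# The mapper, rules, and invariants below are illustrative, not JacqOS APIs.

def evaluate_to_fixed_point(facts, rules):
    """Fire rules until no new facts can be derived."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact in rule(facts):
                if fact not in facts:
                    facts.add(fact)
                    changed = True

def replay(observations, mapper, rules, invariants):
    facts = set()
    for obs in observations:                   # strict order
        facts |= set(mapper(obs))              # 1. mapper extracts atoms
        evaluate_to_fixed_point(facts, rules)  # 2. rules fire to a fixed point
        for inv in invariants:                 # 3. invariants checked per fixed point
            assert inv(facts), f"invariant violated after {obs}"
    return facts                               # full derived state

# Toy domain: every request derives a booking.
world = replay(
    observations=[{"who": "alice"}, {"who": "bob"}],
    mapper=lambda obs: [("requested", obs["who"])],
    rules=[lambda fs: {("booked", who) for (rel, who) in fs if rel == "requested"}],
    invariants=[lambda fs: all(("requested", w) in fs
                               for (r, w) in fs if r == "booked")],
)
```

The real evaluator is stratified and tracks retractions and provenance; the sketch only captures the sequential observation-then-fixed-point shape.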

Live effects do not execute during ordinary fixture replay. When the evaluator derives an intent that would normally trigger an effect (an HTTP call, an LLM completion), replay uses recorded provider captures and observations instead. This is what makes replay deterministic — the same fixture always produces the same world state, regardless of external service availability.

If a fixture contains effect-producing intents but no recorded captures or outcome observations, replay reports the missing effect observations as warnings.

For large observation histories, replaying from the beginning can be slow. JacqOS supports checkpoint-based replay:

# Full replay from scratch
jacqos replay fixtures/happy-path.jsonl
# The evaluator can checkpoint intermediate state
# and resume from the last stable checkpoint

Checkpoints store the evaluator’s intermediate state (facts, provenance edges, stratum progress) at a specific observation boundary. Resuming from a checkpoint skips the observations that were already processed. The final world state is identical whether you replay from scratch or from a checkpoint — this is verified by the determinism check in jacqos verify.
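The equivalence guarantee can be illustrated with a toy fold, where `step` stands in for one observation's mapper-plus-evaluator pass (hypothetical, not a JacqOS API):

```python
# Sketch of checkpoint-equivalent replay: resuming from a snapshot taken at an
# observation boundary must yield the same state as replaying from scratch.
# `step` is a hypothetical stand-in for one mapper + fixed-point pass.

def step(state, obs):
    return state | {("seen", obs)}

def replay(observations, state=frozenset()):
    for obs in observations:
        state = step(state, obs)
    return state

observations = ["o1", "o2", "o3", "o4"]

full = replay(observations)

checkpoint = replay(observations[:2])           # snapshot after the second observation
resumed = replay(observations[2:], checkpoint)  # skip already-processed observations

assert full == resumed                          # identical final world state
```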

Replay determinism is a hard guarantee, not an aspiration. The same observations, evaluated by the same evaluator, always produce byte-identical derived state.

What this means concretely:

  • Same observations + same evaluator digest → byte-identical facts, intents, and provenance
  • Same observations + different evaluator digest → different derived state (by design — the rules changed)
  • Same evaluator digest + different observations → different derived state (by design — the evidence changed)

jacqos verify runs every fixture replay twice — once as part of the normal verification pipeline, and once from a clean database. The two runs must produce identical world digests:

Run 1: normal replay → world_digest_a
Run 2: clean-db replay → world_digest_b
assert world_digest_a == world_digest_b

If these diverge, something in the pipeline is non-deterministic — a mapper is using ambient state, a helper has side effects, or the evaluator has a bug. The determinism check catches all of these.

The world digest is a cryptographic hash covering:

  • Every derived fact (relation name, arguments, assertion/retraction status)
  • Every derived intent
  • The evaluator digest that produced them
  • The observation sequence that was replayed

Two world digests match if and only if the derived state is byte-identical.
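A canonical digest over those inputs can be sketched as follows. The field layout and sorting scheme are illustrative assumptions; the actual digest format is defined by JacqOS:

```python
# Sketch of a world digest: a canonical hash over the derived state.
# The field names and canonicalization here are assumptions for illustration.
import hashlib
import json

def world_digest(facts, intents, evaluator_digest, observation_digest):
    canonical = json.dumps(
        {
            "facts": sorted(facts),      # sort so the hash is order-independent
            "intents": sorted(intents),
            "evaluator_digest": evaluator_digest,
            "observations": observation_digest,
        },
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()

a = world_digest([["booked", "alice", "assert"]], [["notify"]], "sha256:e1", "sha256:o1")
b = world_digest([["booked", "alice", "assert"]], [["notify"]], "sha256:e1", "sha256:o1")
assert a == b    # identical state, identical digest
```

Because the evaluator digest is part of the input, a rule change alone is enough to produce a different world digest even when the derived facts happen to coincide.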

jacqos verify runs ten check families across the selected fixture corpus:

  • Fixture replay: Every fixture replays without errors
  • Golden fixtures: Derived state matches expected facts and intents
  • Invariants: All invariants hold after every fixed point
  • Candidate-authority lints: Acceptance-gated evidence never skips the required candidate.* acceptance boundary
  • Provenance bundle: Provenance graph is exported for each fixture
  • Replay determinism: Clean-database replay produces identical world digest
  • Generated scenarios: Property-tested observation sequences find no violations
  • Shadow reference: Shadow evaluator agrees with the product evaluator
  • Secret redaction: No secret material appears in verification artifacts
  • Composition: Multi-agent namespace composition passes; skipped when the app has only one agent-owned namespace

A fixture passes only if every applicable check passes. The overall verification passes only if every fixture passes and every non-skipped global gate stays green.

The shadow reference evaluator is a second, independent evaluator that processes the same observations and must produce the same derived state. This catches implementation bugs in the primary evaluator — if the shadow disagrees, something is wrong.

The shadow evaluator comparison runs automatically during jacqos verify. You don’t need to configure it.

Beyond replaying your defined fixtures, jacqos verify generates random observation sequences and checks that all invariants hold. When a generated sequence violates an invariant, the verifier shrinks it to a minimal counterexample.
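Shrinking can be sketched as a greedy deletion loop, where `violates` stands in for "replay the sequence and check invariants" (hypothetical, not the JacqOS implementation):

```python
# Sketch of counterexample shrinking: when a generated observation sequence
# violates an invariant, drop observations while the violation persists.
# `violates` is a hypothetical stand-in for replay-and-check-invariants.

def shrink(sequence, violates):
    """Greedily remove elements while the sequence still fails."""
    i = 0
    while i < len(sequence):
        candidate = sequence[:i] + sequence[i + 1:]
        if violates(candidate):
            sequence = candidate   # still fails without this observation
        else:
            i += 1                 # this observation is needed to reproduce
    return sequence

# Toy invariant: a "cancel" must never arrive before its "book".
def violates(seq):
    return "cancel" in seq and (
        "book" not in seq or seq.index("cancel") < seq.index("book")
    )

minimal = shrink(["noise", "cancel", "noise", "book", "noise"], violates)
```

The minimal counterexample is what the verifier reports, so you debug the shortest observation sequence that reproduces the violation rather than the full random run.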

Every jacqos verify run produces a verification bundle — a JSON artifact containing the complete proof of the verification run. Bundles are written to generated/verification/.

jacqos verify
# => Wrote verification bundle to generated/verification/<app-id>.json

The bundle filename is <app_id>.json, where app_id is the value declared in jacqos.toml. For an app with app_id = "jacqos-appointment-booking", the bundle lands at generated/verification/jacqos-appointment-booking.json.

{
  "version": "jacqos_verify_v1",
  "app_id": "my-booking-app",
  "evaluator_digest": "sha256:a1b2c3...",
  "prompt_bundle_digest": "sha256:d4e5f6...",
  "llm_complete_active": false,
  "status": "passed",
  "composition_analysis_path": "generated/verification/composition-analysis-sha256-<digest>.json",
  "composition_analysis": { ... },
  "summary": { ... },
  "checks": [ ... ],
  "redaction_findings": [],
  "fixtures": [ ... ]
}
  • version: Bundle format version (jacqos_verify_v1)
  • app_id: Application identifier from jacqos.toml
  • evaluator_digest: Hash of the ontology IR, mapper semantics, and helper digests
  • prompt_bundle_digest: Hash of prompt files (present only if prompts exist)
  • llm_complete_active: Whether the llm.complete capability is declared
  • status: passed, failed, or skipped
  • composition_analysis_path: Relative path to the companion composition-analysis report when the composition check passed
  • composition_analysis: Embedded composition-analysis artifact when the composition check passed
  • summary: Aggregate counts and fixture-level summaries
  • checks: The verification checks with passed/failed/skipped status and detail text
  • redaction_findings: Any secret material detected in artifacts
  • fixtures: Per-fixture verification artifacts

The persisted bundle intentionally omits wall-clock timestamps and durations so re-running verification does not create noisy diffs in checked-in proof artifacts. Use jacqos export benchmark-report when you need runtime timings.
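A CI gate over the bundle can be as small as parsing the JSON and failing on any non-passed status. The field names follow the bundle layout above; the gating logic itself is an illustrative sketch:

```python
# Sketch of a CI gate over the verification bundle. Field names follow the
# documented bundle layout; the gating policy is illustrative.
import json

def gate(bundle_json):
    bundle = json.loads(bundle_json)
    failed = [f["fixture"] for f in bundle["fixtures"] if f["status"] != "passed"]
    if bundle["status"] != "passed" or failed:
        raise SystemExit(f"verification failed: {failed or bundle['status']}")
    return bundle["evaluator_digest"]

digest = gate(json.dumps({
    "version": "jacqos_verify_v1",
    "status": "passed",
    "evaluator_digest": "sha256:a1b2c3",
    "fixtures": [{"fixture": "fixtures/happy-path.jsonl", "status": "passed"}],
}))
```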

When the composition gate passes, jacqos verify also writes generated/verification/composition-analysis-sha256-<evaluator_digest>.json and embeds the same portable report into the bundle. That report is static with respect to store history: it depends on the ontology and fixture corpus, not on the current SQLite state.

Each entry in the fixtures array contains the complete verification evidence for one fixture:

  • fixture: Relative path to the fixture file
  • status: passed or failed
  • observation_digest: Hash of the observation sequence
  • world_digest: Hash of the derived world state
  • replay: Replay summary (observation, atom, fact, intent counts)
  • golden: Golden fixture comparison (expected vs. actual)
  • determinism: Determinism check result
  • shadow: Shadow evaluator conformance result
  • generated_scenarios: Property testing results
  • invariant_failures: Detailed invariant violation reports
  • provenance_graph: Full provenance graph export

The provenance graph in each fixture artifact contains every derivation edge — from observations to atoms to facts to intents. This is the same data Studio surfaces in the drill inspector and timeline. Visual graph rendering ships in V1.1.

For multi-agent apps, the composition gate produces an auditable artifact you can pin in source control. Re-run jacqos verify with --composition-report to confirm the pinned report still matches the current ontology and fixture corpus:

jacqos verify --composition-report \
generated/verification/composition-analysis-sha256-<evaluator_digest>.json

The flag reuses the existing report as the expected baseline. The verify run regenerates the composition analysis from the current sources and fails if the regenerated artifact diverges from the pinned report. Use it when you want one command to confirm both that the app passes verification and that no agent-owned namespace, cross-namespace dependency, or invariant-coverage value has shifted since the report was checked in.
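Conceptually, the check is a field-by-field comparison between the pinned baseline and the regenerated report. This sketch mirrors that behavior; it is not the JacqOS implementation:

```python
# Sketch of the pinned-report comparison behind --composition-report.
# The report fields shown here are illustrative, not the exact schema.

def diff_reports(pinned, regenerated):
    """Return the keys whose values diverge between the two reports."""
    keys = set(pinned) | set(regenerated)
    return sorted(k for k in keys if pinned.get(k) != regenerated.get(k))

pinned = {"namespaces": ["billing", "booking"], "cycles": []}
regenerated = {"namespaces": ["audit", "billing", "booking"], "cycles": []}

divergent = diff_reports(pinned, regenerated)  # a new namespace shifts the surface
```

An empty divergence list means the composition surface is unchanged; anything else fails the verify run with the offending fields.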

The report is static with respect to store history. It records:

  • Namespace reduct partitions — which relations belong to which agent-owned namespace
  • Cross-namespace dependencies — every edge that crosses a namespace boundary, with monotonicity labels
  • Namespace-cycle severity — whether any cross-namespace cycle violates the composition contract
  • Invariant fixture coverage — which invariants are exercised by which fixtures

Pin the report whenever a multi-agent app reaches a known-good shape. Any future change that perturbs the composition surface will fail verification with a precise diff against the pinned baseline. Apps with zero or one agent-owned namespace skip this gate; for them, --composition-report is unnecessary.

Standalone generation is also available: jacqos composition check --report <path> writes the report without running the rest of jacqos verify, and jacqos composition verify-report <report> validates a pinned report on its own. See the debugging workflow for the full inspection loop.

Verification bundles are designed for CI pipelines. The jacqos verify exit code tells your CI whether the build passes:

  • 0: All checks passed
  • 1: Other error (missing fixtures, configuration issue)
  • 2: Verification failures (fixture, invariant, or determinism)
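A wrapper script can translate those codes into readable CI log output. The mapping follows the table above; invoking jacqos via subprocess is an assumption about your environment, not a JacqOS API:

```python
# Sketch of a CI wrapper that interprets the documented exit codes.
# Invoking the jacqos binary via subprocess is an environmental assumption.
import subprocess

EXIT_MEANING = {
    0: "all checks passed",
    1: "other error (missing fixtures, configuration issue)",
    2: "verification failures (fixture, invariant, or determinism)",
}

def run_verify():
    """Run jacqos verify and print a human-readable result for the CI log."""
    result = subprocess.run(["jacqos", "verify"])
    meaning = EXIT_MEANING.get(result.returncode, "unknown exit code")
    print(f"jacqos verify: {meaning}")
    return result.returncode
```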
# GitHub Actions example
name: Verify
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install JacqOS
        run: |
          curl -fsSL https://www.jacqos.io/install.sh | sh
      - name: Verify
        run: jacqos verify
      - name: Upload verification bundle
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: verification-bundle
          path: generated/verification/

Store the verification bundle (generated/verification/*.json) as a build artifact. It contains everything needed to understand what was verified and whether it passed:

  • The evaluator digest — which version of the rules was tested
  • Per-fixture world digests — the exact derived state for each scenario
  • Invariant check results — which invariants were exercised and how many values were tested
  • Counterexamples — any generated scenarios that found violations
  • Redaction audit — proof that no secrets leaked into artifacts

Use the digests as a merge gate. Two PRs that produce the same evaluator digest and the same per-fixture world digests are semantically equivalent — they derive the same facts from the same observations.

- name: Verify and check digest
  run: |
    jacqos verify
    # The bundle includes the evaluator digest and per-fixture world digests
    # Your review process can compare these against the base branch

To understand what a code change does to derived state:

  1. Run jacqos verify on the base branch — save the verification bundle
  2. Run jacqos verify on the feature branch — save the verification bundle
  3. Compare the evaluator digests — if they match, the semantic behavior is identical
  4. If they differ, compare per-fixture world digests to see which scenarios changed
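Step 4 amounts to diffing the per-fixture world digests between the two bundles. This sketch follows the documented bundle layout; the comparison logic itself is illustrative:

```python
# Sketch of diffing per-fixture world digests between two verification bundles.
# Field names follow the documented bundle layout; the data here is made up.

def changed_fixtures(base_bundle, feature_bundle):
    base = {f["fixture"]: f["world_digest"] for f in base_bundle["fixtures"]}
    feat = {f["fixture"]: f["world_digest"] for f in feature_bundle["fixtures"]}
    return sorted(p for p in base.keys() & feat.keys() if base[p] != feat[p])

base = {"evaluator_digest": "sha256:aaa", "fixtures": [
    {"fixture": "fixtures/happy-path.jsonl", "world_digest": "sha256:111"},
    {"fixture": "fixtures/contradiction-path.jsonl", "world_digest": "sha256:222"},
]}
feature = {"evaluator_digest": "sha256:bbb", "fixtures": [
    {"fixture": "fixtures/happy-path.jsonl", "world_digest": "sha256:111"},
    {"fixture": "fixtures/contradiction-path.jsonl", "world_digest": "sha256:999"},
]}

changed = changed_fixtures(base, feature)  # only the contradiction path changed
```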

This is the CI equivalent of the Activity Compare lens coming in V1.1 — same concept, machine-readable format.

Live ingress should still end in a fixture proof. A serve run gives you operational handles such as run_id, SSE event_id, and adapter receipts, but those are not the semantic contract. The contract is the observation sequence and the derived model.

For a live path, keep the replay loop explicit:

jacqos observe --jsonl fixtures/shared-reality.jsonl --lineage live-demo --create-lineage --json
jacqos run --lineage live-demo --once --shadow --json
jacqos replay fixtures/shared-reality.jsonl
jacqos verify

When you convert a chat session, webhook delivery, or multi-agent subscriber scenario into a fixture, preserve the observations that crossed the mapper boundary. Do not assert on local run_id values or SSE event ids. Assert on facts, intents, invariant violations, contradictions, and effect receipts.

This is what makes live debugging and CI agree: Studio can inspect the live serve surfaces, while jacqos verify proves the same behavior from a clean observation history.

You can export the verification bundle separately from running verification:

# Run verification (always writes to generated/verification/)
jacqos verify
# Export as a standalone artifact
jacqos export verification-bundle

The exported bundle is the same JSON artifact that jacqos verify writes. The export subcommand is useful when you want to ship the bundle to a different location or system.

Worked Example: CI for Appointment Booking


Here’s a complete CI workflow for the appointment-booking app:

name: Appointment Booking Verification
on:
  push:
    paths:
      - 'ontology/**'
      - 'mappings/**'
      - 'helpers/**'
      - 'fixtures/**'
      - 'jacqos.toml'
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install JacqOS
        run: curl -fsSL https://www.jacqos.io/install.sh | sh
      - name: Replay happy path
        run: jacqos replay fixtures/happy-path.jsonl
      - name: Replay contradiction path
        run: jacqos replay fixtures/contradiction-path.jsonl
      - name: Full verification
        run: jacqos verify
      - name: Upload bundle
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: verification-bundle
          path: generated/verification/

The replay steps are optional — jacqos verify runs all fixtures automatically. Running them separately gives you per-fixture timing and output in the CI log.

When a verification run fails, your next step is the debugging loop. When it passes, your next step is locking the result in.

  • Debugging Workflow — the end-to-end loop for turning a failed jacqos verify into a fix, including provenance drill-downs and pinned composition-report inspection
  • Debugging with Provenance — tracing facts and intents back to the exact observations that produced them
  • Golden Fixtures — concept deep-dive on digest-backed behavior proof
  • Invariant Review — universal constraints that hold across all evaluation states
  • Fixtures and Invariants — practical guide to writing fixtures and declaring invariants
  • Evaluation Package — the portable contract boundary that ships with each verified app
  • CLI Reference — every flag on jacqos replay, jacqos verify, jacqos composition, and jacqos export