Incident Response Walkthrough
What You’ll Build
A flagship example that models cloud incident response on a service dependency graph. When a primary database degrades, triage derives the blast radius through recursive Datalog over the real topology, communications and remediation agents react through the shared derived model rather than a hidden orchestration graph, and catastrophic invariants stop unsafe plans before they become effects.
This walkthrough is the cleanest demonstration of the multi-namespace coordination pattern with recursive derivation:
```text
topology.update + telemetry.alert
  -> infra.transitively_depends (recursive closure)
  -> triage.blast_radius
  -> proposal.remediation_action
  -> remediation.plan
  -> intent.notify_stakeholder, intent.remediate
```

It covers the full JacqOS pipeline with a focus on stigmergic coordination through invariants:
- Observations arrive as JSON events (`topology.update`, `telemetry.alert`, `llm.remediation_decision_result`, `effect.receipt`)
- Mappers extract trusted topology atoms and `requires_relay` semantic atoms from the model output
- Rules derive recursive transitive dependencies, blast radius, root-cause severity, and proposal-relayed remediation plans
- Invariants enforce catastrophic guards — `no_kill_unsynced_primary`, `always_have_admin`, `no_isolate_healthy`
- Intents derive stakeholder notifications and bounded remediation calls
- Fixtures prove the happy path, an unsafe-plan contradiction, a deep cascade, and full coverage of the safety boundary
Project Structure
```text
jacqos-incident-response/
  jacqos.toml
  ontology/
    schema.dh               # 4 namespaces of relations
    rules.dh                # 3 strata + 3 catastrophic invariants
    intents.dh              # notify + remediate intent derivation
  mappings/
    inbound.rhai            # mapper contract + observation mapping
  prompts/
    remediation-system.md   # remediation prompt bundle
  schemas/
    remediation-action.json # structured-output schema
  fixtures/
    happy-path.jsonl
    happy-path.expected.json
    contradiction-path.jsonl
    contradiction-path.expected.json
    cascade-path.jsonl
    cascade-path.expected.json
    coverage-path.jsonl
    coverage-path.expected.json
  generated/
    ...                     # verification, graph, and export artifacts
```

Step 1: Configure The App
`jacqos.toml` declares the app identity, the notification API binding, the remediation model binding, and recorded replay for both:
```toml
app_id = "jacqos-incident-response"
app_version = "0.1.0"

[paths]
ontology = ["ontology/*.dh"]
mappings = ["mappings/*.rhai"]
prompts = ["prompts/*.md"]
schemas = ["schemas/*.json"]
fixtures = ["fixtures/*.jsonl"]
helpers = ["helpers/*.rhai"]

[capabilities]
http_clients = ["notify_api"]
models = ["remediation_model"]
timers = false
blob_store = true

[capabilities.intents]
"intent.notify_stakeholder" = { capability = "http.fetch", resource = "notify_api" }
"intent.remediate" = { capability = "llm.complete", resource = "remediation_model", result_kind = "llm.remediation_decision_result" }

[resources.http.notify_api]
base_url = "https://incident-notify.example.invalid"
credential_ref = "NOTIFY_API_TOKEN"
replay = "record"

[resources.model.remediation_model]
provider = "openai"
model = "gpt-4o-mini"
credential_ref = "OPENAI_API_KEY"
replay = "record"
```

The remediation model is wired through the same provider-capture path as the HTTP notifier. The bundled fixtures are fully deterministic — `jacqos verify` produces the same facts every run, no API key is needed when matching captures are present, and the same seam can be flipped between `record` and `replay` without any ontology change.
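The record/replay seam can be pictured as a thin tape around the provider call: record writes each response to a JSONL capture, replay serves it back without touching the network. The sketch below is illustrative Python, not the JacqOS implementation — the `JsonlTape` class, its file layout, and the key scheme are all assumptions.

```python
import json

class JsonlTape:
    """Illustrative record/replay seam: capture provider responses to a JSONL
    file on first run, then serve them back deterministically on replay."""

    def __init__(self, path, mode):
        self.path, self.mode = path, mode  # mode: "record" or "replay"
        self.cache = {}
        if mode == "replay":
            with open(path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.cache[entry["key"]] = entry["response"]

    def call(self, key, fetch):
        if self.mode == "replay":
            return self.cache[key]       # no network, no credential needed
        response = fetch()               # live provider call
        with open(self.path, "a") as f:  # append capture for later replay
            f.write(json.dumps({"key": key, "response": response}) + "\n")
        return response
```

The design point is that flipping `mode` changes nothing about the caller: the same seam serves both verification runs and live operation.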
Step 2: Declare Relations
The schema partitions relations across four namespaces — `infra.*` for topology and telemetry, `triage.*` for blast-radius reasoning, `proposal.*` and `remediation.*` for the model-relayed action surface, and `intent.*` for declared external effects:
```
relation infra.service(service_id: text)
relation infra.depends_on(service_id: text, dependency_id: text)
relation infra.transitively_depends(service_id: text, dependency_id: text)
relation infra.health_signal(service_id: text, status: text, seq: int)
relation infra.degraded(service_id: text)
relation infra.healthy(service_id: text)
relation infra.is_primary_db(service_id: text)
relation infra.replica_synced(service_id: text)
relation infra.production_system(service_id: text)
relation infra.has_admin_access(service_id: text)
relation infra.admin_gap(service_id: text)

relation triage.blast_radius(service_id: text, root_service: text)
relation triage.impacted(service_id: text)
relation triage.root_cause(root_service: text)
relation triage.severity(root_service: text, severity: text)
relation triage.stakeholder_notified(root_service: text)

relation proposal.remediation_action(
  decision_id: text,
  root_service: text,
  target_service: text,
  action: text,
  seq: int
)
relation remediation.plan(root_service: text, target_service: text, action: text, seq: int)
relation remediation.unsafely_scaled_primary(service_id: text)
relation remediation.unsafely_isolated(service_id: text)

relation intent.notify_stakeholder(root_service: text, severity: text)
relation intent.remediate(root_service: text, severity: text)
```

The crucial separation is between `proposal.remediation_action` (whatever the model said) and `remediation.plan` (what passed the relay boundary). The ontology keeps these on opposite sides of the `requires_relay` gate so an absurd plan never silently becomes an executable action.
Step 3: Map Observations To Atoms
The mapper marks the LLM remediation output as `requires_relay` into `proposal.*`. Topology and telemetry stay trusted atoms; only the model’s structured action lands behind the relay namespace:
```rhai
fn mapper_contract() {
    #{
        requires_relay: [
            #{
                observation_class: "llm.remediation_decision_result",
                predicate_prefixes: ["proposal."],
                relay_namespace: "proposal",
            }
        ],
    }
}
```

`map_observation()` then projects topology, telemetry, model output, and effect receipts:

```rhai
if obs.kind == "topology.update" {
    let atoms = [atom("service.id", body.service_id)];
    if body.contains("depends_on") {
        for dependency in body.depends_on {
            atoms.push(atom("service.depends_on", dependency));
        }
    }
    if body.contains("is_primary_db") {
        if body.is_primary_db == true {
            atoms.push(atom("service.primary_db", body.service_id));
        }
    }
    if body.contains("replica_synced") {
        if body.replica_synced == true {
            atoms.push(atom("service.replica_synced", body.service_id));
        }
    }
    return atoms;
}

if obs.kind == "llm.remediation_decision_result" {
    return [
        atom("proposal.id", body.decision_id),
        atom("proposal.root_service", body.root_service),
        atom("proposal.target_service", body.target_service),
        atom("proposal.action", body.action),
        atom("proposal.seq", body.seq),
    ];
}
```

The split is the whole design: topology is trusted structure, telemetry is trusted signal, and the model’s remediation action is fallible interpretation that has to clear the proposal boundary before any rule can derive `remediation.plan`.
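To make the trusted-versus-relayed split concrete, here is the same projection as a Python sketch. The atom names follow the walkthrough, but the observation envelope (`kind`/`body` as a plain dict) is an illustrative assumption, not the JacqOS runtime API:

```python
def map_observation(obs):
    """Sketch of the Rhai mapper: trusted topology atoms vs. relay-gated
    proposal.* atoms from model output. Envelope shape is illustrative."""
    body = obs["body"]
    if obs["kind"] == "topology.update":
        # Trusted structure: projected directly into service.* atoms.
        atoms = [("service.id", body["service_id"])]
        for dependency in body.get("depends_on", []):
            atoms.append(("service.depends_on", dependency))
        if body.get("is_primary_db"):
            atoms.append(("service.primary_db", body["service_id"]))
        if body.get("replica_synced"):
            atoms.append(("service.replica_synced", body["service_id"]))
        return atoms
    if obs["kind"] == "llm.remediation_decision_result":
        # Fallible interpretation: only ever lands behind proposal.*.
        return [
            ("proposal.id", body["decision_id"]),
            ("proposal.root_service", body["root_service"]),
            ("proposal.target_service", body["target_service"]),
            ("proposal.action", body["action"]),
            ("proposal.seq", body["seq"]),
        ]
    return []
```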
Step 4: Derive Blast Radius, Severity, And Plans
Recursive transitive closure computes blast radius from the dependency graph rather than from hand-authored runbooks:
```
rule infra.transitively_depends(service, dependency) :-
  infra.depends_on(service, dependency).

rule infra.transitively_depends(service, root) :-
  infra.depends_on(service, dependency),
  infra.transitively_depends(dependency, root).

rule triage.root_cause(root) :-
  infra.degraded(root),
  not infra.healthy(root).

rule triage.blast_radius(root, root) :-
  triage.root_cause(root).

rule triage.blast_radius(service, root) :-
  infra.transitively_depends(service, root),
  triage.root_cause(root).
```

When a degraded primary appears, every transitively dependent service joins the blast radius automatically. The cascade fixture exercises a five-service chain to prove the closure is depth-faithful.
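The recursive rules compute a least fixpoint: start from the base edges and keep extending one step until nothing new derives. A minimal Python sketch of that semantics (naive evaluation for clarity; real Datalog engines use semi-naive iteration):

```python
def transitively_depends(depends_on):
    """Least fixpoint of the two closure rules: base edges, plus one-step
    extension repeated until no new pair derives."""
    closure = set(depends_on)
    changed = True
    while changed:
        changed = False
        for service, dependency in list(closure):
            for mid, root in list(closure):
                if dependency == mid and (service, root) not in closure:
                    closure.add((service, root))
                    changed = True
    return closure

def blast_radius(closure, root_causes):
    """triage.blast_radius: each root itself plus every transitive dependent."""
    radius = {(root, root) for root in root_causes}
    radius |= {(s, r) for s, r in closure if r in root_causes}
    return radius
```

Run against the cascade fixture's five-service chain, the closure reaches from `cdn-edge` all the way to `db-primary`, and the blast radius contains exactly the five services on the chain.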
Severity is a small projection over the root cause:
```
rule triage.severity(root, "critical") :-
  triage.root_cause(root),
  infra.is_primary_db(root).

rule triage.severity(root, "high") :-
  triage.root_cause(root),
  not infra.is_primary_db(root).
```

The model’s proposal lifts into `proposal.remediation_action`, then a single bridge rule promotes it into `remediation.plan` only if it cleared the relay boundary:
```
rule assert proposal.remediation_action(decision_id, root, target, action, seq) :-
  atom(obs, "proposal.id", decision_id),
  atom(obs, "proposal.root_service", root),
  atom(obs, "proposal.target_service", target),
  atom(obs, "proposal.action", action),
  atom(obs, "proposal.seq", seq).

rule remediation.plan(root, target, action, seq) :-
  proposal.remediation_action(_, root, target, action, seq).
```

The catastrophic boundary is a pair of unsafe-condition relations and the named invariants that forbid them:
```
rule remediation.unsafely_scaled_primary(node) :-
  remediation.scale_down(node),
  infra.is_primary_db(node),
  not infra.replica_synced(node).

rule remediation.unsafely_isolated(service) :-
  remediation.isolate(service),
  not triage.impacted(service).

invariant no_kill_unsynced_primary(node) :-
  count remediation.unsafely_scaled_primary(node) <= 0.

invariant no_isolate_healthy(service) :-
  count remediation.unsafely_isolated(service) <= 0.

invariant always_have_admin(service) :-
  count infra.admin_gap(service) <= 0.
```

These three invariants are the structural backstop. Even if a future rule edit weakens the planning layer, an unsafe scale-down of an unsynced primary, an isolation of a healthy service, or a production system without admin access still trips invariant review, and `jacqos verify` halts.
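Each invariant is a count constraint: its named unsafe-condition relation must stay empty. That check can be sketched in a few lines of Python (the fact-store shape and checker function are illustrative assumptions, not the engine's API):

```python
def check_invariants(facts):
    """Return the names of violated catastrophic invariants. `facts` maps a
    relation name to its set of derived tuples; each invariant demands zero
    tuples in its unsafe-condition relation. Illustrative sketch only."""
    invariants = {
        "no_kill_unsynced_primary": "remediation.unsafely_scaled_primary",
        "no_isolate_healthy": "remediation.unsafely_isolated",
        "always_have_admin": "infra.admin_gap",
    }
    return [name for name, relation in invariants.items() if facts.get(relation)]
```

The point of naming the invariants rather than inlining the conditions is that a regression anywhere upstream still surfaces as a single, reviewable violation.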
Step 5: Derive Outbound Effects Only From Stable State
Communications and remediation are independent intent rules over the shared model. There is no shared workflow graph — each agent reads what it needs from `triage.*` and contributes its declared intent:
```
rule intent.notify_stakeholder(root, severity) :-
  triage.root_cause(root),
  triage.severity(root, severity),
  not triage.stakeholder_notified(root).

rule intent.remediate(root, severity) :-
  triage.root_cause(root),
  triage.severity(root, severity),
  not remediation.plan(root, _, _, _).
```

Two agents, one shared truth surface, zero orchestration code. That is stigmergic coordination — the same pattern as ant trails, but typed and inspectable.
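The two intent rules read the same derived state but never each other, and negation-as-failure is the only guard. A Python sketch of that semantics (the argument shapes are assumptions for illustration, not an engine API):

```python
def derive_intents(root_causes, severity, notified, plans):
    """The two intent rules as set comprehensions over shared derived state:
    notify only roots not yet notified, remediate only roots with no plan.
    Illustrative sketch; argument shapes are assumed, not the engine's API."""
    notify = {(root, severity[root]) for root in root_causes
              if root not in notified}
    remediate = {(root, severity[root]) for root in root_causes
                 if not any(plan[0] == root for plan in plans)}
    return notify, remediate
```

Because both intents derive from `triage.*` rather than from each other, adding a third agent is a new rule over the same surface, not a change to any existing one.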
Step 6: Fixtures
This example ships four fixtures, each exercising a different facet of the pipeline.
Happy path
A degraded primary fans out through the dependency chain, stakeholders are notified, and a safe reroute to the in-sync replica is proposed and accepted. Final state: one root cause, one blast radius, one applied remediation, all three invariants hold.
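For orientation, a happy-path event stream might look like the following. The field names are inferred from the mapper above, but the exact fixture envelope is illustrative, not copied from the shipped `happy-path.jsonl`:

```jsonl
{"kind": "topology.update", "body": {"service_id": "auth-service", "depends_on": ["db-primary"]}}
{"kind": "topology.update", "body": {"service_id": "db-replica", "replica_synced": true}}
{"kind": "telemetry.alert", "body": {"service_id": "db-primary", "status": "degraded", "seq": 1}}
{"kind": "llm.remediation_decision_result", "body": {"decision_id": "d-1", "root_service": "db-primary", "target_service": "db-replica", "action": "reroute", "seq": 2}}
```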
Contradiction path
The remediation model proposes a `scale_down` against the primary while no synced replica exists. `remediation.unsafely_scaled_primary` derives, `no_kill_unsynced_primary` fires, and the unsafe plan never reaches an effect. The fixture also exercises a retracted telemetry signal, so the timeline shows both invariant containment and contradiction handling in the same window.
Cascade path
A five-service chain (`cdn-edge -> frontend-web -> edge-api -> auth-service -> db-primary`) exercises deep transitive closure. The model proposes isolating `auth-service`, the rule confirms the service is in the blast radius, and `no_isolate_healthy` does not fire because the isolate target is genuinely impacted.
Coverage path
A timeline that walks every accepting and rejecting branch of the rule graph, so the verification bundle’s coverage report reaches 100% on the rule shape. The same coverage data is consumed today by `jacqos verify` and is exported in every verification bundle under `generated/verification/`.
What You’ll See In Studio
Open the demo with `jacqos studio --lineage incident-response` and the bundled happy-path fixture loads. Switch fixtures from the timeline picker to walk every scenario:
- Safe reroute -> the `Done` tab shows `db-primary -> reroute, applied`. Drill in and the inspector takes you from the executed remediation back through `remediation.plan`, the model’s `proposal.remediation_action`, the blast-radius derivation, and the original telemetry alert.
- Unsafe scale-down blocked -> the `Blocked` tab shows the `no_kill_unsynced_primary` invariant violation. The drill inspector names the missing `infra.replica_synced` fact and the model’s proposal that triggered the unsafe condition. No effect ever fired.
- Five-service cascade -> the `Done` tab shows the isolate applied to `auth-service`. Drill into the blast radius and the inspector walks the recursive `infra.transitively_depends` chain back to `db-primary`.
- Stakeholder notified -> a notification effect shows up in `Done` for every fixture; this is the second agent participating through the shared model with no orchestration glue.
Why This Example Matters
This is the multi-agent coordination pattern in its strongest form:
- two independent agents (notify and remediate) coordinate without a shared workflow
- recursive Datalog computes blast radius from real topology, not hand-authored playbooks
- catastrophic invariants are a structural backstop that survives any rule-edit regression
- the model’s plan is visible, queryable, and forced to clear the proposal boundary before it can become an effect
That is how you stop a confidently wrong remediation plan from terminating a primary database during the worst hour of your year.
Make It Yours
The incident-response pattern fits every domain where multiple agents read shared state and contribute to a single outcome under safety constraints:
- Production database operations — propose backup, restore, or failover plans; gate by replica sync state and snapshot freshness
- Kubernetes orchestration — propose pod terminations or node drains; gate by quorum, leader election, and PDB compliance
- Financial trading kill-switches — propose order cancellations or position closes; gate by exposure limits and counterparty status
- Industrial control loops — multiple sensor agents feed a shared model; actuator agents read the model and respect named safety invariants
To start building, scaffold a starter app:
```sh
jacqos scaffold --pattern multi-agent my-incident-app
```

Next Steps
- Multi-Agent Patterns — the namespace partitioning + stigmergic coordination story this example demonstrates
- Smart Farm Walkthrough — a smaller multi-agent example with a single named-invariant safety boundary
- LLM Decision Containment — the pattern page that explains the `proposal.*` boundary the remediation model passes through
- Invariant Review — how named invariants replace code review of generated rules
- Observation-First Thinking — the underlying evidence-first mental model