# LLM Agents
LLM calls in JacqOS are not black boxes. Every model interaction is a declared effect with full provenance — the prompt, the response, and every fact derived from it are traceable and replayable. This guide shows how to build agents that use LLMs safely, using the medical-intake example as a running illustration.
For the broader product framing of why LLM-proposed actions must route through a domain decision rule, see LLM Decision Containment. For the authoring mechanics of `proposal.*` relays, see Action Proposals.
## Candidates and proposals

JacqOS enforces a mandatory rule: model output must relay through the correct trust boundary before it becomes accepted fact or executable intent. LLM outputs are inherently probabilistic. Descriptive output belongs behind `candidate.*`. Action suggestions belong behind `proposal.*`.
For descriptive extraction flows, the candidate pattern has three stages:
- LLM output lands as candidates. The mapper extracts LLM results into `candidate.*` relations — never directly into accepted facts.
- Evidence gates promotion. Ontology rules promote candidates to accepted facts only when explicit evidence exists (clinician approval, threshold checks, corroborating data).
- Invariants enforce the boundary. Named invariants assert the integrity properties the system must preserve once candidates have been promoted.
Here’s how the medical-intake example implements this:
`ontology/schema.dh` — candidates and accepted facts are separate relations:
```dh
relation candidate.conditions(intake_id: text, condition: text, extraction_seq: text)
relation candidate.medications(intake_id: text, medication: text, extraction_seq: text)
relation accepted_conditions(intake_id: text, condition: text)
relation accepted_medications(intake_id: text, medication: text)
```

`ontology/rules.dh` — LLM results land as candidates:
```dh
rule assert candidate.conditions(id, condition, seq) :-
    atom(obs, "extraction.intake_id", id),
    atom(obs, "extraction.condition", condition),
    atom(obs, "extraction.seq", seq).
```

Candidates are promoted only after clinician approval:
```dh
rule accepted_conditions(id, condition) :-
    candidate.conditions(id, condition, _),
    clinician_approved(id).
```

A named invariant captures the post-acceptance integrity property the application cares about. The medical-intake example uses this one to make finalization safe:
```dh
invariant no_finalize_without_review(id) :-
    intake_finalized(id),
    clinician_approved(id).
```

Invariant semantics. A `.dh` invariant body must always hold for every binding of its declared parameters that appears in the current model. After every evaluation fixed point, the evaluator computes the parameter domain and checks that the body succeeds for each binding; any failing binding is a violation that rejects the transition. So the invariant above reads as: “for every finalized intake, `clinician_approved` must also hold.” See the .dh Language Reference and Invariant Review for the full semantics.
The relay-boundary rule itself is not an invariant — it is a load-time check. The evaluator rejects any ontology that derives accepted facts directly from `requires_relay`-marked LLM observations, catching the violation before the app starts rather than at runtime.
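To see what the check catches, here is a deliberate counterexample (not part of the example app): it derives `accepted_conditions` straight from relay-marked extraction atoms, so the evaluator would refuse to load it.

```dh
-- Load-time violation: an accepted fact derived directly from
-- requires_relay LLM observations, bypassing the candidate.* namespace.
rule accepted_conditions(id, condition) :-
    atom(obs, "extraction.intake_id", id),
    atom(obs, "extraction.condition", condition).
```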
## Declaring the llm.complete capability

LLM calls are declared effects, just like HTTP calls. You bind an intent to `llm.complete`, declare the result observation kind, and specify the model resource:
```toml
[capabilities]
models = ["extraction_model"]

[capabilities.intents]
"intent.request_extraction" = { capability = "llm.complete", resource = "extraction_model", result_kind = "llm.extraction_result" }

[resources.model.extraction_model]
provider = "openai"
model = "gpt-4o-mini"
credential_ref = "OPENAI_API_KEY"
schema = "schemas/intake-extraction.json"
replay = "record"
```

Key points:
- `provider` names the model backend. V1 supports `openai` and `anthropic`.
- `model` names the concrete provider model to call.
- `credential_ref` names an environment variable. The actual API key never appears in config files or observation logs.
- `result_kind` names the observation kind the runtime appends on successful structured output.
- `schema` points to a JSON Schema file that the shell uses as the structured-output contract.
- `replay = "record"` records the full request/response envelope on the effect attempt. Switching to `replay` requires a matching capture and refuses live provider calls.
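Switching the resource to replay-only mode is then a one-line edit. The exact mode string below is an assumption (only `"record"` appears in the example config above):

```toml
[resources.model.extraction_model]
provider = "openai"
model = "gpt-4o-mini"
credential_ref = "OPENAI_API_KEY"
schema = "schemas/intake-extraction.json"
# Assumed spelling of the replay-only mode: the shell refuses live provider
# calls and requires a matching capture for every llm.complete attempt.
replay = "replay"
```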
The intent that triggers the LLM call is derived like any other:
```dh
rule intent.request_extraction(id, raw) :-
    intake_submitted(id, _, _, raw),
    not candidate.conditions(id, _, _),
    not intake_finalized(id).
```

The guard `not candidate.conditions(id, _, _)` ensures the extraction fires only once per intake. If the LLM result arrives and candidates are asserted, the intent stops re-deriving.
## World-slice construction

When the shell executes an `llm.complete` intent, it constructs a world slice — a focused subset of current facts relevant to the prompt. The world slice provides the LLM with context without exposing the entire fact database.
The world slice is assembled from:
- The intent arguments — these identify what the LLM should process (e.g., the raw intake text).
- Related facts — the shell follows declared relations from the intent arguments to gather context.
- The prompt bundle — the system prompt from `prompts/` and the output schema from `schemas/`.
For the medical-intake example, the world slice for `intent.request_extraction("intake-1", raw_text)` includes:
- The raw intake text from the intent argument
- The system prompt from `prompts/extraction-system.md`
- The output schema constraint from `schemas/intake-extraction.json`
The world slice is deterministic for a given set of facts. This means the same facts always produce the same LLM request, making the prompt reproducible and auditable.
## Prompt packages and prompt_bundle_digest

Prompts live as markdown files in the `prompts/` directory. The shell hashes each prompt file and the output schema together into a `prompt_bundle_digest`. This digest is recorded on every LLM effect observation.
```
prompts/
  extraction-system.md      # system prompt
schemas/
  intake-extraction.json    # structured output schema
```

`prompts/extraction-system.md`:
```md
You are a medical intake extraction assistant. Given a patient's intake form
text, extract all mentioned medical conditions and current medications.

Return your response as structured JSON matching the `intake-extraction.json`
schema. Include a confidence score between 0.0 and 1.0 reflecting how clearly
the intake text states each item.

Rules:
- Extract only conditions and medications explicitly mentioned in the text.
- Do not infer conditions from medications or vice versa.
- If the text is ambiguous, set confidence below 0.7.
- Normalize condition and medication names to standard clinical terminology where possible.
```

The `prompt_bundle_digest` serves two purposes:
- Change detection. If you edit the system prompt or output schema, the digest changes. This lets you track which prompt version produced which LLM results.
- Evaluator identity. Prompt-only changes do not change the `evaluator_digest` (which covers ontology rules and mapper semantics). This distinction matters: a prompt tweak affects LLM behavior but not the derivation logic. You can iterate on prompts without invalidating your ontology verification.
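As a rough mental model (illustrative only; the exact canonicalization JacqOS applies is not shown here), the digest behaves like a content hash over the prompt and schema files together:

```sh
# Illustrative approximation, not the actual JacqOS computation:
# editing either file changes the resulting digest.
cat prompts/extraction-system.md schemas/intake-extraction.json | sha256sum
```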
## Model identity in provenance

Model output is actor-bearing evidence. When an LLM produces an observation, JacqOS records model and prompt identity separately from the evaluator:
- `model_ref` is the app resource that requested the model.
- `provider_ref` and `provider_model` identify the provider path used.
- `prompt_bundle_digest` identifies the prompt and schema bundle for that turn.
- `world_slice_digest` identifies the facts shown to the model.
In exported observation metadata, a model-produced event can also carry `actor_kind = "model"` plus an `actor_id` such as `model:extraction_model`. Your ontology may choose to reason about those fields, but the relay boundary still applies: model output remains `candidate.*` or `proposal.*` until domain rules accept it.
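If your ontology does reason about model-sourced evidence, a sketch might look like the following; it assumes the mapper surfaces the actor kind as a hypothetical `meta.actor_kind` atom, which is not part of the example app:

```dh
-- Hypothetical: track which intakes carry model-produced extraction data.
-- The "meta.actor_kind" predicate is assumed for illustration.
relation model_sourced(intake_id: text)

rule model_sourced(id) :-
    atom(obs, "extraction.intake_id", id),
    atom(obs, "meta.actor_kind", "model").
```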
## Structured-output schemas

Every `llm.complete` resource declares a JSON Schema that constrains the model’s output format. When the provider supports native structured-output constraints, the shell forwards the schema to the provider. Regardless of provider support, JacqOS always validates the parsed payload locally before it becomes an observation.
`schemas/intake-extraction.json`:
{ "type": "object", "additionalProperties": false, "required": ["intake_id", "extracted_conditions", "extracted_medications", "confidence"], "properties": { "intake_id": { "type": "string" }, "extracted_conditions": { "type": "array", "items": { "type": "string" } }, "extracted_medications": { "type": "array", "items": { "type": "string" } }, "confidence": { "type": "string", "description": "Confidence score as a decimal string between 0.0 and 1.0" } }}The schema is the structural contract between the LLM response and the mapper. The mapper expects fields at specific paths — the schema ensures those paths exist. If the model returns valid JSON that doesn’t match the schema, the shell records a schema-validation-failed observation rather than passing malformed data to the mapper.
## Handling refusals and malformed output

LLMs can refuse requests or return output that doesn’t match the schema. The shell handles these cases by recording distinct observation kinds:
| Outcome | Observation kind | What happens next |
|---|---|---|
| Valid structured response | `llm.extraction_result` | Mapper extracts atoms normally |
| Schema validation failure | `llm.schema_validation_failed` | Mapper produces error atoms; evaluator can derive retry intent |
| Model refusal | `llm.refusal` | Mapper produces refusal atoms; evaluator can derive fallback logic |
| Network/provider error | `llm.error` | Standard effect error; retry or reconciliation per capability rules |
You handle these in your mapper and ontology like any other observation:
`mappings/inbound.rhai` — the mapper declares the relay namespace and handles the success case. Note the two-argument `atom(predicate, value)` form: the current observation reference is injected automatically by the runtime.
```rhai
fn mapper_contract() {
    #{
        requires_relay: [
            #{
                observation_class: "llm.extraction_result",
                predicate_prefixes: ["extraction.condition", "extraction.medication"],
                relay_namespace: "candidate",
            }
        ],
    }
}

fn map_observation(obs) {
    let body = parse_json(obs.payload);

    if obs.kind == "llm.extraction_result" {
        let atoms = [
            atom("extraction.intake_id", body.intake_id),
            atom("extraction.confidence", body.confidence),
            atom("extraction.seq", body.seq),
        ];

        for condition in body.extracted_conditions {
            atoms.push(atom("extraction.condition", condition));
        }

        for medication in body.extracted_medications {
            atoms.push(atom("extraction.medication", medication));
        }

        return atoms;
    }

    []
}
```

For schema validation failures or refusals, you can derive retry intents or escalation logic in your ontology:
```dh
relation extraction_failed(intake_id: text, reason: text)

rule extraction_failed(id, reason) :-
    atom(obs, "extraction_error.intake_id", id),
    atom(obs, "extraction_error.reason", reason).

-- Re-derive the extraction intent if a previous attempt failed and no
-- candidates exist yet
rule intent.request_extraction(id, raw) :-
    intake_submitted(id, _, _, raw),
    extraction_failed(id, _),
    not candidate.conditions(id, _, _),
    not intake_finalized(id).
```

The key insight: failure handling is declarative. You don’t write try/catch blocks. You write rules that derive facts from failure observations, and those facts trigger the appropriate next action.
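The same pattern covers refusals. As a sketch, assuming the mapper turns `llm.refusal` observations into a hypothetical `refusal.intake_id` atom (not part of the example app), you can derive a fact that downstream rules use for escalation:

```dh
-- Hypothetical: mark refused extractions so other rules can escalate.
-- The "refusal.intake_id" predicate is assumed for illustration.
relation extraction_refused(intake_id: text)

rule extraction_refused(id) :-
    atom(obs, "refusal.intake_id", id).
```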
## Offline replay of LLM interactions

Every LLM call is recorded with its full request/response envelope. During replay-only execution, the shell uses matching captures instead of making live API calls.
An LLM capture records the replay identity and the outcome:
{ "request": { "model_ref": "extraction_model", "provider_ref": "openai", "provider_model": "gpt-4o-mini", "prompt_bundle_digest": "sha256:...", "world_slice_digest": "sha256:...", "result_observation_kind": "llm.extraction_result", "structured_output_schema_ref": "schemas/intake-extraction.json" }, "response": { "validation": "valid", "refusal": "not_refused", "token_usage": { "prompt_tokens": 187, "completion_tokens": 62, "total_tokens": 249 }, "provenance": "live" }, "outcome_observation": { "kind": "llm.extraction_result", "source": "effect:llm.complete" }}Captures record the important replay evidence:
- The request identity — model resource, provider, provider model, prompt digest, world-slice digest, schema, and result kind
- The terminal outcome — validation state, refusal state, parsed response, provider error, and token usage
- The outcome observation — the exact observation that re-enters the mapper and ontology
This means:
- `jacqos replay` produces identical results without API keys or network access
- `jacqos verify` confirms that fixtures produce expected facts using recorded captures
- You can share verification evidence across your team without sharing API credentials
- Token usage is visible for cost tracking and optimization
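Concretely, a teammate with no credentials configured can run:

```sh
jacqos replay   # re-executes the lineage from recorded LLM captures
jacqos verify   # checks that fixtures produce the expected facts
```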
## Child-lineage forking for fresh live reruns

Recordings make replay deterministic, but sometimes you need a fresh live rerun — for example, after changing a prompt or switching models. JacqOS uses child-lineage forking for this.
Stability: pinned public workflow. Authority: the branching model lives in `spec/jacqos/v1/lineage.md`, and the checked-in public command inventory lives in `tools/jacqos-cli/protocols/README.md`. The frozen V1 surface is `jacqos lineage fork`, `jacqos replay --lineage <LINEAGE> ...`, and `jacqos studio --lineage <LINEAGE>`.
A child lineage branches from the committed head of the current lineage. It inherits all observations up to the fork point, then diverges independently:
```sh
# Fork from the currently selected lineage head
jacqos lineage fork
```

In the child lineage:
- Observations before the fork point are inherited (not re-executed)
- LLM intents after the fork point execute live against the real model
- New recordings are captured in the child lineage
- The parent lineage is untouched
This lets you A/B test prompt changes safely:
- Fork a child lineage from your production observation history
- Update `prompts/extraction-system.md` with your new prompt
- Continue the child lineage with `jacqos replay --lineage <CHILD_LINEAGE_ID> ...` or your live shell workflow so the LLM sees the same inputs with the new prompt (see the command sketch after this list)
- Compare the child’s derived facts against the parent’s using `jacqos studio --lineage <CHILD_LINEAGE_ID>`
- If the new prompt performs better, promote the child lineage
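Put together, the pinned V1 commands for this workflow look like the following sketch (the child lineage ID is a placeholder, and further replay arguments are elided as in the frozen surface above):

```sh
jacqos lineage fork                          # branch a child from the current head
# edit prompts/extraction-system.md, then:
jacqos replay --lineage <CHILD_LINEAGE_ID> ...
jacqos studio --lineage <CHILD_LINEAGE_ID>   # compare child facts against the parent
```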
Child lineages never merge back. If you want to adopt the child’s behavior, you promote it as the new primary lineage. This preserves the complete audit trail of both the original and experimental runs.
## Worked example: LLM disagreement

The medical-intake example includes an `llm-disagreement-path` fixture that exercises what happens when the LLM gets it wrong:
`fixtures/llm-disagreement-path.jsonl`:
{"kind":"intake.submitted","payload":{"intake_id":"intake-3","patient_name":"Maria Garcia","dob":"1990-11-04","raw_text":"Patient mentions occasional headaches and something about blood pressure pills. Hard to read handwriting."}}{"kind":"llm.extraction_result","payload":{"intake_id":"intake-3","extracted_conditions":["chronic headaches","hypertension"],"extracted_medications":["amlodipine 5mg"],"confidence":"0.45","seq":"1"}}{"kind":"clinician.review","payload":{"intake_id":"intake-3","approved":"false","corrections":"Patient has tension headaches only, not chronic. No confirmed hypertension diagnosis. Medication is actually acetaminophen PRN, not amlodipine."}}Walk through what happens:
- An intake arrives with ambiguous handwriting
- The LLM extracts conditions and medications — but with low confidence (0.45)
- The candidates land as `candidate.conditions` and `candidate.medications`
- `intent.notify_clinician` fires because candidates exist but no approval yet
- The clinician reviews and rejects — the LLM got it wrong
- `clinician_rejected` is asserted with corrections
- The `accepted_conditions` rule never fires because `clinician_approved` is absent
- No LLM-derived data becomes accepted fact
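The rejection step corresponds to a rule like the following sketch; the `review.*` predicate names and the relation arity are inferred from the fixture payload and may differ from the example app’s actual rules:

```dh
-- Illustrative: assert rejection (with corrections) from the
-- clinician.review observation; predicate names are assumptions.
relation clinician_rejected(intake_id: text, corrections: text)

rule clinician_rejected(id, corrections) :-
    atom(obs, "review.intake_id", id),
    atom(obs, "review.approved", "false"),
    atom(obs, "review.corrections", corrections).
```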
The candidate-evidence pattern prevented incorrect LLM output from ever becoming trusted fact. The low confidence score is visible in provenance, and the clinician’s corrections are recorded as observations for audit.
## Best practices

- Always use the candidate pattern. Never derive accepted facts directly from LLM observations. The evaluator rejects this at load time, but designing with it from the start produces cleaner ontologies.
- Set confidence thresholds. Use the extraction confidence in your rules to gate behavior. Low-confidence extractions might skip automated processing and go straight to human review (see the sketch after this list).
- Keep prompts in version control. Prompt files in `prompts/` are hashed into the `prompt_bundle_digest`. Treat them like code — review changes, track versions.
- Ship disagreement fixtures. Every LLM-assisted app should include fixtures that exercise the rejection path. If your candidate-evidence gate never fires in tests, you haven’t tested the most important path.
- Use structured output schemas. They eliminate an entire class of parsing errors and make the mapper contract explicit. If the model can’t conform to the schema, you get a clean error observation instead of a silent parsing failure.
- Record everything. Keep `replay = "record"` on during development. Recordings are your test fixtures, your debugging aids, and your cost audit trail.
## Next steps

- LLM Decision Containment — the product framing for routing model-proposed actions through a domain decision rule
- Action Proposals — authoring `proposal.*` relays and the ratification rules that gate them
- Using Fallible Sensors Safely — the broader product pattern behind candidate-evidence
- Effects and Intents — the full intent lifecycle that drives LLM calls
- Fixtures and Invariants — verify LLM behavior with deterministic replay
- Debugging with Provenance — trace LLM-derived facts back to their source observations
- jacqos.toml Reference — configuring model resources and capabilities
- Rhai Mapper API — the host functions available to mappers (`atom`, `parse_json`, `mapper_contract`)