
Air Canada Refund Policy

In 2022, an Air Canada customer asked the airline’s chatbot about bereavement refunds after his grandmother died. The chatbot confidently invented a refund policy that the airline did not actually have — bereavement refunds claimable up to 90 days after travel — and the customer flew on the cheaper next-day fare expecting the refund to come through. When Air Canada refused, the customer took the airline to small claims court. The tribunal ruled that the airline was responsible for what its chatbot had said. Air Canada paid. The case became the textbook example of why letting an LLM directly answer policy questions is not a containment strategy — it is a liability surface.

This example shows how to build the same use case in a way that makes that failure structurally impossible. The model is still allowed to propose a refund disposition. The system is not allowed to act on a proposal whose policy basis does not exist.

The Air Canada chatbot was deployed as a customer service assistant. When the bereavement question came in, the model did what models do: it produced confident-sounding text about a policy. The text was fluent, the policy was fictional, and the airline’s downstream systems had no structural way to know the difference. Once the customer relied on the answer, the airline was on the hook for it — not because the model is a legal agent, but because nothing between the model and the world refused the made-up policy.

The same shape recurs across every domain where LLMs sit in front of decisions: a Chevrolet dealership chatbot tricked into “selling” a Tahoe for $1; travel agents booking flights that do not exist; support bots issuing service credits the company never authorised. The model is doing exactly what a model does. The mistake is treating its output as authority.

JacqOS makes it impossible for a model’s answer to become a world-facing action without first passing through a policy check you wrote and can inspect.

The model’s structured output — “send a refund of $1,200 for bereavement_pre_travel” — lands in the reserved proposal.* namespace. The mapper declares this routing explicitly and the loader’s validate_relay_boundaries check refuses any program that tries to feed intent.send_refund directly from the model’s atoms. From proposal.*, an ontology decision rule evaluates the proposal against the airline’s actual policy table, encoded as policy.refund_authorized_for(reason_code, requires_documentation) facts. Only authorized decisions can derive intent.send_refund.
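To make the loader's relay-boundary check concrete, here is an illustrative sketch in Python (the function and data shapes are hypothetical, not the real JacqOS loader API): given a set of rules, refuse any rule that derives an `intent.*` head directly from raw model-observation predicates instead of going through the `proposal.*` relay namespace.

```python
# Hypothetical sketch of a relay-boundary check. Rules are modelled as
# (head_predicate, [body_predicates]); real JacqOS internals will differ.

MODEL_PREFIXES = ("refund_decision.",)  # atoms emitted by the LLM mapper

def validate_relay_boundaries(rules):
    """Return the heads of rules that feed intents straight from model atoms."""
    violations = []
    for head, body in rules:
        if head.startswith("intent.") and any(
            p.startswith(MODEL_PREFIXES) for p in body
        ):
            violations.append(head)
    return violations

# A rule wiring intent.send_refund directly to model atoms is refused:
bad = [("intent.send_refund",
        ["refund_decision.action", "refund_decision.amount_usd"])]
# Routing through proposal.* and a decision predicate is accepted:
ok = [("proposal.refund_action", ["refund_decision.action"]),
      ("intent.send_refund", ["refund.decision.authorized"])]

assert validate_relay_boundaries(bad) == ["intent.send_refund"]
assert validate_relay_boundaries(ok) == []
```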

The Air Canada failure mode is now the empty case: the model proposes bereavement_after_travel, but no policy.refund_authorized_for("bereavement_after_travel", _) fact exists. The authorize rule cannot fire. The escalate rule cannot fire. The block rule fires instead. The customer is told the refund cannot be processed; the airline never sends money it never owed.

Critically, the policy table is not part of the LLM’s prompt or the rule code. It is a set of observations. A refund reason only becomes authoritative when its airline.refund_policy_snapshot version matches an airline.refund_policy_current observation. To add a new authorized refund reason you record new policy observations and advance the current policy version. Both observations show up in Studio with full provenance. The audit trail for “is this a real policy?” is “show me the current policy observation that introduced it.”
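The version-matching step can be sketched in Python (hypothetical data shapes, not the real observation store): a reason code only yields a `policy.refund_authorized_for` fact when its snapshot version matches the current policy version.

```python
# Hypothetical sketch: policy facts derived only from the snapshot whose
# version matches the airline.refund_policy_current observation.

snapshots = [
    # (policy_version, reason_code, requires_documentation)
    ("tariff-2021-11", "bereavement_after_travel", True),
    ("tariff-2024-02", "bereavement_pre_travel", True),
]
current_version = "tariff-2024-02"  # the airline.refund_policy_current value

def refund_authorized_for(reason_code):
    """Derive policy.refund_authorized_for facts from the current snapshot only."""
    return [
        (reason, requires_doc)
        for version, reason, requires_doc in snapshots
        if version == current_version and reason == reason_code
    ]

# The real reason code yields a fact; the stale one yields nothing, so the
# authorize rule downstream has nothing to fire on:
assert refund_authorized_for("bereavement_pre_travel") == [("bereavement_pre_travel", True)]
assert refund_authorized_for("bereavement_after_travel") == []
```

Advancing `current_version` is what "adding a policy" means; no rule edit is involved.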

Run the bundled fixtures in Studio. The scenarios cover the Done, Blocked, and Waiting tabs of the Activity view.

  • happy-path-eligible-refund -> the Done tab shows a row like refund-req-1: $1,200 refund authorized, bereavement_pre_travel. Drill in and the inspector takes you from the executed refund back through refund.decision.authorized, the refund.proposal_policy_match helper, the death-certificate documentation observation, the policy snapshot, and the model’s refund-decision observation. Every link is a real observation; the rule code is incidental.
  • contradiction-fabricated-policy -> the Blocked tab shows refund-req-2: refund declined -- fabricated_policy. Drill in and the inspector names the missing policy.refund_authorized_for("bereavement_after_travel", _) fact, the proposal.refund_action row that triggered the block, and the model’s decision observation. There is no rule to read to understand the refusal — the absence of a policy fact is the refusal.
  • stale-policy-path -> the Blocked tab shows refund-req-6: refund declined -- stale_policy. Drill in and the inspector shows that bereavement_after_travel exists only in the old tariff-2021-11 snapshot while the current policy is tariff-2024-02. The model may have stale retrieved evidence; stale evidence is not payout authority.
  • boundary-undocumented-bereavement -> the Waiting tab shows refund-req-3: review opened -- documentation_pending. Drill in and the inspector names the refund.decision.requires_agent_review decision and the missing refund.documentation_attached evidence. An agent picks it up, collects the death certificate, and either approves or declines. The model never sees the decision.

A fifth fixture, bypass-attempt-path, simulates a manual operator bypass: someone records airline.refund_sent observations directly without an authorising decision. The named invariants — refund_only_under_authorized_policy, refund_within_policy_max, refund_sent_requires_authorized_decision, and review_opened_requires_review_decision — all fire. This is the defence-in-depth wall. Even if a future rule edit accidentally relaxed the decision layer, an unauthorised refund would still fail invariant review and jacqos verify would surface it.

The mapper declares that the model’s structured refund output routes through the proposal.* relay namespace. The loader refuses any rule that derives intent.send_refund directly from these atoms.

mappings/inbound.rhai

fn mapper_contract() {
    #{
        requires_relay: [
            #{
                observation_class: "llm.refund_decision_result",
                predicate_prefixes: [
                    "refund_decision.action",
                    "refund_decision.reason_code",
                    "refund_decision.amount_usd",
                ],
                relay_namespace: "proposal",
            }
        ],
    }
}

A proposal staging rule lifts the model’s atoms into proposal.refund_action. Notice the model’s reason_code is treated as untrusted text — the rule does not check it against policy here.

rule assert proposal.refund_action(request_id, action, reason_code, decision_seq) :-
    atom(obs, "refund_decision.request_id", request_id),
    atom(obs, "refund_decision.action", action),
    atom(obs, "refund_decision.reason_code", reason_code),
    atom(obs, "refund_decision.seq", decision_seq).

A small bridge rule derives whether the proposed reason code corresponds to a real authorized policy. This is the load-bearing piece of the Air Canada wall: the bridge predicate cannot exist for a policy that has not been recorded.

rule refund.proposal_policy_match(request_id, reason_code, required_doc) :-
    proposal.refund_action(request_id, _, reason_code, _),
    policy.refund_authorized_for(reason_code, required_doc).

The decision rules then evaluate the proposal against policy. Only the authorized decision is wired to an executable intent.

rule refund.decision.authorized(request_id, reason_code, amount_usd) :-
    refund.current_decision_seq(request_id, decision_seq),
    proposal.refund_action(request_id, "send_refund", reason_code, decision_seq),
    proposal.refund_amount(request_id, amount_usd, decision_seq),
    refund.documentation_satisfied(request_id, reason_code, _),
    policy.refund_max_usd(reason_code, max_usd),
    amount_usd <= max_usd.

rule refund.decision.blocked(request_id, "fabricated_policy") :-
    refund.current_decision_seq(request_id, decision_seq),
    proposal.refund_action(request_id, "send_refund", reason_code, decision_seq),
    not policy.refund_authorized_for(reason_code, _),
    not policy.refund_known_stale(reason_code).

rule refund.decision.blocked(request_id, "stale_policy") :-
    refund.current_decision_seq(request_id, decision_seq),
    proposal.refund_action(request_id, "send_refund", reason_code, decision_seq),
    policy.refund_known_stale(reason_code),
    not policy.refund_authorized_for(reason_code, _).

Only authorized decisions become executable intents:

rule intent.send_refund(request_id, reason_code, amount_usd) :-
    refund.decision.authorized(request_id, reason_code, amount_usd),
    not refund.sent(request_id, _),
    not refund.review_opened(request_id, _).

And the named invariants close the loop. They do not depend on any decision rule firing — they quantify directly over the sent-refund relation against the policy table:

invariant refund_only_under_authorized_policy() :-
    count refund.invariant.refund_sent_without_authorized_policy() <= 0.

invariant refund_within_policy_max() :-
    count refund.invariant.refund_sent_above_policy_max() <= 0.

If anything ever causes an airline.refund_sent observation under a policy that does not exist, or above that policy’s maximum, the invariant fails, regardless of how the decision rules were configured. This is the difference between a policy checked once at code-review time and a policy continuously re-checked by the engine after every fixed point: the latter survives refactoring, rule edits, and accidental relaxations.
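The quantification the invariants perform can be sketched in Python (hypothetical data shapes): they scan the sent-refund relation against the policy table directly, so they catch violations even when no decision rule ever ran.

```python
# Hypothetical sketch of the two invariants, checked over observations
# rather than over decision-rule outputs.

policy_max_usd = {  # reason_code -> max refund, from current policy observations
    "bereavement_pre_travel": 1500.0,
}

refunds_sent = [  # airline.refund_sent observations (request_id, reason, amount)
    ("refund-req-1", "bereavement_pre_travel", 1200.0),
    ("refund-req-9", "bereavement_after_travel", 800.0),  # bypass write
]

def refund_sent_without_authorized_policy():
    """Violations: refunds sent under a reason the policy table never recorded."""
    return [r for r in refunds_sent if r[1] not in policy_max_usd]

def refund_sent_above_policy_max():
    """Violations: refunds sent above the recorded maximum for their reason."""
    return [r for r in refunds_sent
            if r[1] in policy_max_usd and r[2] > policy_max_usd[r[1]]]

# The bypass refund fails the first invariant even though no decision
# rule authorised or blocked it:
assert [r[0] for r in refund_sent_without_authorized_policy()] == ["refund-req-9"]
assert refund_sent_above_policy_max() == []
```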

Why The Air Canada Failure Cannot Recur Here

In the original failure, the model’s text output was the source of truth for the policy. There was no place in the pipeline where the question “is this policy real?” could be asked, because the policy was not data anywhere — it was an emergent property of model behaviour.

In this example, the policy is policy.refund_authorized_for facts, derived from airline.refund_policy_snapshot observations that match the current airline.refund_policy_current version. A new policy can only be added by the operator recording the policy snapshot and advancing the current version, both of which appear in Studio’s timeline with full provenance. The model can describe a policy in beautiful natural language all it likes; if the corresponding current fact is not present, the authorize rule will not fire and intent.send_refund will not derive.

The model is free. The safety is structural. The same shape — proposal namespace, decision rule keyed on operator-recorded fact tables, named invariant as a structural backstop — is the answer to every “the chatbot promised something we can’t deliver” failure mode.

jacqos-air-canada-refund-policy/
├── jacqos.toml
├── ontology/
│   ├── schema.dh                  # relation declarations
│   ├── rules.dh                   # proposal staging, decision rules, invariants
│   └── intents.dh                 # refund and review intent derivation
├── mappings/
│   └── inbound.rhai               # mapper contract + observation mapping
├── prompts/
│   └── refund-decision-system.md  # prompt bundle for package export
├── schemas/
│   └── refund-decision.json       # structured-output schema
├── fixtures/
│   ├── happy-path-eligible-refund.jsonl
│   ├── happy-path-eligible-refund.expected.json
│   ├── contradiction-fabricated-policy.jsonl
│   ├── contradiction-fabricated-policy.expected.json
│   ├── stale-policy-path.jsonl
│   ├── stale-policy-path.expected.json
│   ├── boundary-undocumented-bereavement.jsonl
│   ├── boundary-undocumented-bereavement.expected.json
│   ├── bypass-attempt-path.jsonl
│   └── bypass-attempt-path.expected.json
└── generated/
    └── ...                        # verification, graph, and export artifacts

The five fixtures together cover the complete safety surface: the authorize branch (happy path), the block branch under a fabricated policy (the original Air Canada failure), the stale-policy branch where old tariff text is visible but not authoritative, the escalate branch under missing documentation, and the structural-bypass branch where a broken external system writes an airline.refund_sent observation that no decision authorised.

The Air Canada example is one shape of LLM decision containment applied to policy-bounded customer commitments. The same proposal-then-decision-then-invariant pattern fits any domain where a model could hallucinate authority into a costly action:

  • Insurance claims — a model proposes a coverage decision; a policy decision rule authorises only when the line item exists in policy.coverage_authorized_for(...) facts derived from the carrier’s own policy snapshots.
  • Loyalty-program adjustments — a model proposes a tier upgrade or points credit; a balance decision rule authorises only against the live program-rules table the operations team maintains as observations.
  • Healthcare prior-authorisation — a model proposes a procedure approval; a payer decision rule authorises only against contracted-network and benefit facts pulled from the payer’s authoritative source.
  • Government benefits chatbots — a model proposes an eligibility answer; a regulation decision rule authorises only when the cited statute, version, and effective date exist as recorded observations.

Any time the cost of believing a hallucinated policy is greater than the cost of saying “I cannot confirm that,” the structural boundary fits. To start building, scaffold a starter app:

jacqos scaffold --pattern decision my-policy-decision-app

This example sits inside a broader pattern. To understand the shape: