Air Canada Refund Policy
In 2022, an Air Canada customer asked the airline’s chatbot about bereavement refunds after his grandmother died. The chatbot confidently invented a refund policy that the airline did not actually have — bereavement refunds claimable up to 90 days after travel — and the customer flew on the cheaper next-day fare expecting the refund to come through. When Air Canada refused, the customer took the airline to small claims court. The tribunal ruled that the airline was responsible for what its chatbot had said. Air Canada paid. The case became the textbook example of why letting an LLM directly answer policy questions is not a containment strategy — it is a liability surface.
This example shows how to build the same use case so that the failure is structurally impossible. The model is still allowed to propose a refund disposition. The system is not allowed to act on a proposal whose policy basis does not exist.
The Real-World Failure
The Air Canada chatbot was deployed as a customer service assistant. When the bereavement question came in, the model did what models do: it produced confident-sounding text about a policy. The text was fluent, the policy was fictional, and the airline’s downstream systems had no structural way to know the difference. Once the customer relied on the answer, the airline was on the hook for it — not because the model is a legal agent, but because nothing between the model and the world refused the made-up policy.
The same shape recurs across every domain where LLMs sit in front
of decisions: a Chevrolet dealership chatbot tricked into “selling”
a Tahoe for $1; travel agents booking flights that do not exist;
support bots issuing service credits the company never authorised.
The model is doing exactly what a model does. The mistake is
treating its output as authority.
The Structural Guarantee
JacqOS makes it impossible for a model’s answer to become a world-facing action without first passing through a policy check you wrote and can inspect.
The model’s structured output — “send a refund of $1,200 for
bereavement_pre_travel” — lands in the reserved proposal.*
namespace. The mapper declares this routing explicitly and the
loader’s validate_relay_boundaries check refuses any program
that tries to feed intent.send_refund directly from the model’s
atoms. From proposal.*, an ontology decision rule evaluates the
proposal against the airline’s actual policy table, encoded as
policy.refund_authorized_for(reason_code, requires_documentation)
facts. Only authorized decisions can derive intent.send_refund.
The Air Canada failure mode is now the empty case: the model
proposes bereavement_after_travel, but no
policy.refund_authorized_for("bereavement_after_travel", _) fact
exists. The authorize rule cannot fire. The escalate rule cannot
fire. The block rule fires instead. The customer is told the
refund cannot be processed; the airline never sends money it never
owed.
Critically, the policy table is not part of the LLM’s prompt or
the rule code. It is a set of observations. A refund reason only
becomes authoritative when its airline.refund_policy_snapshot
version matches an airline.refund_policy_current observation.
To add a new authorized refund reason you record new policy
observations and advance the current policy version. Both
observations show up in Studio with full provenance. The audit
trail for “is this a real policy?” is “show me the current policy
observation that introduced it.”
What You Will See In Studio
Run the bundled fixtures in Studio. The scenarios cover the
Done, Blocked, and Waiting Activity outcomes.
- happy-path-eligible-refund -> the Done tab shows a row like refund-req-1: $1,200 refund authorized, bereavement_pre_travel. Drill in and the inspector takes you from the executed refund back through refund.decision.authorized, the refund.proposal_policy_match helper, the death-certificate documentation observation, the policy snapshot, and the model’s refund-decision observation. Every link is a real observation; the rule code is incidental.
- contradiction-fabricated-policy -> the Blocked tab shows refund-req-2: refund declined -- fabricated_policy. Drill in and the inspector names the missing policy.refund_authorized_for("bereavement_after_travel", _) fact, the proposal.refund_action row that triggered the block, and the model’s decision observation. There is no rule to read to understand the refusal — the absence of a policy fact is the refusal.
- stale-policy-path -> the Blocked tab shows refund-req-6: refund declined -- stale_policy. Drill in and the inspector shows that bereavement_after_travel exists only in the old tariff-2021-11 snapshot while the current policy is tariff-2024-02. The model may have stale retrieved evidence; stale evidence is not payout authority.
- boundary-undocumented-bereavement -> the Waiting tab shows refund-req-3: review opened -- documentation_pending. Drill in and the inspector names the refund.decision.requires_agent_review decision and the missing refund.documentation_attached evidence. An agent picks it up, collects the death certificate, and either approves or declines. The model never sees the decision.
A fifth fixture, bypass-attempt-path, simulates a manual
operator bypass: someone records airline.refund_sent
observations directly without an authorising decision. The named
invariants — refund_only_under_authorized_policy,
refund_within_policy_max,
refund_sent_requires_authorized_decision, and
review_opened_requires_review_decision — all fire. This is the
defence-in-depth wall. Even if a future rule edit accidentally
relaxed the decision layer, an unauthorised refund would still
fail invariant review and jacqos verify would surface it.
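Following the invariant style used elsewhere in this example, the sent-refund invariant from that list might be written against a helper relation that flags refunds sent without an authorising decision. Both the helper name and its arity are illustrative assumptions:

```
rule refund.invariant.refund_sent_without_decision(request_id) :-
    airline.refund_sent(request_id, _),
    not refund.decision.authorized(request_id, _, _).

invariant refund_sent_requires_authorized_decision() :-
    count refund.invariant.refund_sent_without_decision() <= 0.
```

The invariant quantifies over the world-facing relation itself, so it catches a bypass regardless of which rule (or manual action) produced the refund.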
What It Looks Like In Code
The mapper declares that the model’s structured refund output
routes through the proposal.* relay namespace. The loader
refuses any rule that derives intent.send_refund directly from
these atoms.
```
fn mapper_contract() {
    #{
        requires_relay: [
            #{
                observation_class: "llm.refund_decision_result",
                predicate_prefixes: [
                    "refund_decision.action",
                    "refund_decision.reason_code",
                    "refund_decision.amount_usd",
                ],
                relay_namespace: "proposal",
            }
        ],
    }
}
```

A proposal staging rule lifts the model’s atoms into
proposal.refund_action. Notice the model’s reason_code is
treated as untrusted text — the rule does not check it against
policy here.
```
rule assert proposal.refund_action(request_id, action, reason_code, decision_seq) :-
    atom(obs, "refund_decision.request_id", request_id),
    atom(obs, "refund_decision.action", action),
    atom(obs, "refund_decision.reason_code", reason_code),
    atom(obs, "refund_decision.seq", decision_seq).
```

A small bridge rule derives whether the proposed reason code actually corresponds to a real authorized policy. This is the load-bearing wall against the Air Canada failure: the bridge predicate cannot exist for a policy that has not been recorded.
```
rule refund.proposal_policy_match(request_id, reason_code, required_doc) :-
    proposal.refund_action(request_id, _, reason_code, _),
    policy.refund_authorized_for(reason_code, required_doc).
```

The decision rules then evaluate the proposal against policy. Only the authorized decision is wired to an executable intent.
```
rule refund.decision.authorized(request_id, reason_code, amount_usd) :-
    refund.current_decision_seq(request_id, decision_seq),
    proposal.refund_action(request_id, "send_refund", reason_code, decision_seq),
    proposal.refund_amount(request_id, amount_usd, decision_seq),
    refund.documentation_satisfied(request_id, reason_code, _),
    policy.refund_max_usd(reason_code, max_usd),
    amount_usd <= max_usd.
```

```
rule refund.decision.blocked(request_id, "fabricated_policy") :-
    refund.current_decision_seq(request_id, decision_seq),
    proposal.refund_action(request_id, "send_refund", reason_code, decision_seq),
    not policy.refund_authorized_for(reason_code, _),
    not policy.refund_known_stale(reason_code).

rule refund.decision.blocked(request_id, "stale_policy") :-
    refund.current_decision_seq(request_id, decision_seq),
    proposal.refund_action(request_id, "send_refund", reason_code, decision_seq),
    policy.refund_known_stale(reason_code),
    not policy.refund_authorized_for(reason_code, _).
```

Only authorized decisions become executable intents:
```
rule intent.send_refund(request_id, reason_code, amount_usd) :-
    refund.decision.authorized(request_id, reason_code, amount_usd),
    not refund.sent(request_id, _),
    not refund.review_opened(request_id, _).
```

And the named invariants close the loop. They do not depend on any decision rule firing — they quantify directly over the sent-refund relation against the policy table:

```
invariant refund_only_under_authorized_policy() :-
    count refund.invariant.refund_sent_without_authorized_policy() <= 0.

invariant refund_within_policy_max() :-
    count refund.invariant.refund_sent_above_policy_max() <= 0.
```

If anything ever causes an airline.refund_sent observation under
a policy that does not exist, or above the policy’s per-policy
maximum, the invariant fails — regardless of how the decision
rules were configured. This is the difference between a policy
checked once at code-review time and a policy continuously checked
by the engine after every fixed point: the latter survives
refactoring, rule edits, and accidental relaxations.
Why The Air Canada Failure Cannot Recur Here
In the original failure, the model’s text output was the source of truth for the policy. There was no place in the pipeline where the question “is this policy real?” could be asked, because the policy was not data anywhere — it was an emergent property of model behaviour.
In this example, the policy is policy.refund_authorized_for
facts, derived from airline.refund_policy_snapshot observations
that match the current airline.refund_policy_current version. A
new policy can only be added by the operator recording the policy
snapshot and advancing the current version, both of which appear in
Studio’s timeline with full provenance. The model can describe a
policy in beautiful natural language all it likes; if the
corresponding current fact is not present, the authorize rule will
not fire and intent.send_refund will not derive.
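A plausible sketch of that derivation, in the style of the rules above (the exact arities are assumptions):

```
rule policy.refund_authorized_for(reason_code, required_doc) :-
    airline.refund_policy_snapshot(version, reason_code, required_doc),
    airline.refund_policy_current(version).
```

Because the rule joins on the version, a reason code that exists only in a superseded snapshot never produces an authorized fact.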
The model is free. The safety is structural. The same shape — proposal namespace, decision rule keyed on operator-recorded fact tables, named invariant as a structural backstop — is the answer to every “the chatbot promised something we can’t deliver” failure mode.
Project Structure
```
jacqos-air-canada-refund-policy/
  jacqos.toml
  ontology/
    schema.dh                  # relation declarations
    rules.dh                   # proposal staging, decision rules, invariants
    intents.dh                 # refund and review intent derivation
  mappings/
    inbound.rhai               # mapper contract + observation mapping
  prompts/
    refund-decision-system.md  # prompt bundle for package export
  schemas/
    refund-decision.json       # structured-output schema
  fixtures/
    happy-path-eligible-refund.jsonl
    happy-path-eligible-refund.expected.json
    contradiction-fabricated-policy.jsonl
    contradiction-fabricated-policy.expected.json
    stale-policy-path.jsonl
    stale-policy-path.expected.json
    boundary-undocumented-bereavement.jsonl
    boundary-undocumented-bereavement.expected.json
    bypass-attempt-path.jsonl
    bypass-attempt-path.expected.json
  generated/
    ...                        # verification, graph, and export artifacts
```

The five fixtures together cover the complete safety surface: the
authorize branch (happy path), the block branch under a fabricated
policy (the original Air Canada failure), the stale-policy branch
where old tariff text is visible but not authoritative, the
escalate branch under missing documentation, and the
structural-bypass branch where a broken external system writes an
airline.refund_sent observation that no decision authorised.
Make It Yours
The Air Canada example is one shape of LLM decision containment applied to policy-bounded customer commitments. The same proposal-then-decision-then-invariant pattern fits any domain where a model could hallucinate authority into a costly action:
- Insurance claims — a model proposes a coverage decision; a policy decision rule authorises only when the line item exists in policy.coverage_authorized_for(...) facts derived from the carrier’s own policy snapshots.
- Loyalty-program adjustments — a model proposes a tier upgrade or points credit; a balance decision rule authorises only against the live program-rules table the operations team maintains as observations.
- Healthcare prior-authorisation — a model proposes a procedure approval; a payer decision rule authorises only against contracted-network and benefit facts pulled from the payer’s authoritative source.
- Government benefits chatbots — a model proposes an eligibility answer; a regulation decision rule authorises only when the cited statute, version, and effective date exist as recorded observations.
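For instance, the insurance variant’s authorize rule could mirror the refund rule in this example. Every predicate name and arity below is an illustrative assumption:

```
rule claim.decision.authorized(claim_id, line_item, amount_usd) :-
    proposal.coverage_action(claim_id, "approve", line_item, amount_usd),
    policy.coverage_authorized_for(line_item, max_usd),
    amount_usd <= max_usd.
```

The shape is identical: the model’s proposal joins against an operator-recorded policy table, and a proposal whose line item has no matching fact simply never derives a decision.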
Any time the cost of believing a hallucinated policy is greater than the cost of saying “I cannot confirm that,” the structural boundary fits. To start building, scaffold a starter app:
```
jacqos scaffold --pattern decision my-policy-decision-app
```

Going Deeper
This example sits inside a broader pattern. To understand the shape:
- LLM Decision Containment — the pattern essay this example instantiates.
- Chevy Offer Containment — the same pattern applied to dealership pricing.
- Action Proposals — how to author decider-relay proposals and the ratification rules that gate them.
- Action Proposals (Technical) — schema reference and the mechanics of proposal.* validation.
- Golden Fixtures — how the five fixtures above provide digest-backed evidence that the authorized / blocked / stale / escalated / bypass-caught outcomes hold for their input timelines.
- Invariant Review — the named invariants that catch any rule-edit regression.