Provenance, mandate, reversibility, audit — not as policy slides, but as code that ships.
I'm an engineer who builds the control layer for autonomous systems: temporal-provenance memory, governed decision-to-action chains, mandate gates, and LLM-as-judge auditing. Designed under one doctrine — no claim without a citation, no action without a mandate — and proven on live, paying products under the Auroch name.
Every system below is an application of the same four principles. This is what an AI governance practice looks like when it's built into the substrate instead of bolted on afterward.
Memory is temporal: every claim carries its source, its as-of date, and what superseded it. Truth has a clock.
"No claim without a citation."
An agent reads its envelope before it acts. Advice is not execution; a draft is not a send; irreversible needs confirmation.
"No action without a mandate."
Committed actions are continuously revalidated. When the evidence underneath them dies, a governed reversal fires — not a panic.
"Evidence superseded → governed reversal."
Hash-chained ledgers, replayable decisions, and offline-verifiable receipts. The model is never the final authority — the record is.
"No grade without a citation."
Six interlocking systems that decide what was true, what may be done, and whether it may remain done — each with its own tests, CLI, and audit trail.
A shared spine for temporal provenance memory — append-only facts with citation, supersession, and validity (live / stale / superseded). Deployed across five verticals (Markets, Counsel, Clinical, Property, Arbiter). Ships the Receipt primitive: a portable, offline-verifiable Ed25519 proof of a claim, minted with a secret key and verified for free with a public one. The asymmetry is the product.
A control loop that prevents AI from making irreversible, unaudited, overconfident, or stale-reasoned high-stakes decisions. Reversibility state machine + hash-chained ledger + policy-as-code mandate + live Veritas binding + Arbiter quorum. Its Watchtower continuously revalidates committed actions and, on dead basis, triggers a governed reversal that routes through a real broker adapter to close positions — the full physical loop, closed.
An LLM-as-judge that refuses to grade without citing the evidence for its grade — the provenance-containment thesis applied to evaluations. Built deliberately as an Anthropic-recruiting flagship targeting the Evals / faithfulness mission. Plus a separate Veritas Arbiter governance face that adjudicates AI decisions against time-of-decision evidence across six verdicts (defensible / stale-basis / missing-basis / contradicted / out-of-scope / requires-human-review) with full replay.
A live gate between intent and execution: Intent → read mandate → envelope → Execute / Ask / Narrow / Escalate / Refuse → hash-chained receipt. Four enforced rules — draft ≠ send, advice ≠ execution, irreversible → confirm, domain-trust isolation. Doctrine: every optimistic predicate defaults pessimistic until a named pipeline measures it. Stakes no longer acts alone.
A clinician-guided emotional-crisis companion built as a bridge, not an endpoint — it lowers the activation energy of reaching a human. Safety core: deterministic risk detector + response governor + escalation router (verified 9-8-8 / 9-1-1) + hash-chained session ledger + a clinician-protocol honesty gate that keeps everything in DRAFT until a named clinician signs off. The model is never the safety authority.
A "chart-reality amplifier" for physicians — the physician is the pilot, the system is non-autonomous. Multi-format clinical ingest (FHIR R4 · HL7 v2 · C-CDA) + safety gate + RxNorm normalization + a licensed-provider contract that fails loud to human review rather than guessing. A provenance/temporal layer on top of a licensed authority — never a replacement for one.
Governance is only credible if you can build the hard systems it governs. I do — from on-device ML to causal-graph market intelligence.
Auroch Local Model: a full MLX local-inference system (4-bit Qwen3 + retrieval router + eval harness), converted, QLoRA-tuned, and profiled on an M1 Pro under an 8 GB cap. Ported fully on-device to iOS as Prana — zero cloud, zero key, pure-Swift kernel.
Auroch Quantum: role-aware mixed-precision compression for MLX — per-layer sensitivity, KV-quant, MoE residency/paging, and speculative decoding, proven live on OLMoE. An inspect → plan → build → eval → serve survivor loop, 79 tests.
ORBIS War Room: a 26-node global causal graph whose edge weights are measured from ~2 years of return correlation — not asserted. Propagates a public shock across the graph, ranks exposed assets, and cites every link. Calibration once caught and fixed a bad proxy.
Doctrine that doesn't ship is philosophy. These run in production behind real Stripe checkouts.
33 goods, value/rent/macro lenses, live market feed, cross-border CUSMA modeling, and a gated causal War Room — deployed at orbis.aurochthryx.com behind a recurring subscription. 154+ tests.
A "paid Zillow" — one truth engine across three surfaces (homeowner / agent / contractor), with a cited CLEAR / REVIEW / HIGH-RISK verdict and a contractor-quote x-ray. Live at dwell.aurochthryx.com. 86 tests.
The stripeable wedge: tenant-isolated API keys → per-customer ledger → quota → Stripe-gated status, with a self-serve signup loop built on stdlib alone. Invariant: no ledger until a key resolves to an active tenant. Live on Render. 38 tests.
The whole body of work is catalogued in a queryable SQLite index that walks the live filesystem for disk, git, and test facts — plus a shared commerce spine (one signed Passport cookie carries entitlements across every product). Systems thinking, not a folder of demos.
Mandate gates, escalation routing, fail-loud defaults, and human-in-the-loop control designed into the system.
Temporal memory, hash-chained ledgers, replayable decisions, and offline-verifiable receipts.
LLM-as-judge that cites its evidence; calibration loops that catch their own flawed assumptions.
On-device inference, QLoRA, role-aware quantization, and speculative decoding on real hardware.
I want to work where the hard question is "how do we make this system accountable?" — and the answer has to compile. If that's your team, let's talk.