AI Governance  ◆  Safety Engineering

Building AI that
can be held to account

Provenance, mandate, reversibility, audit — not as policy slides, but as code that ships.

I'm an engineer who builds the control layer for autonomous systems: temporal-provenance memory, governed decision-to-action chains, mandate gates, and LLM-as-judge auditing. Every system is designed under one doctrine (no claim without a citation, no action without a mandate) and proven on live, paying products under the Auroch name.

15+ Systems shipped
4 Live revenue products
600+ Passing tests
1 Governing doctrine

The Through-Line

A doctrine, not a demo

Every system below is an application of the same four principles. This is what an AI governance practice looks like when it's built into the substrate instead of bolted on afterward.

◆ Principle I

Provenance

Memory is temporal: every claim carries its source, its as-of date, and what superseded it. Truth has a clock.

"No claim without a citation."

◆ Principle II

Mandate

An agent reads its envelope before it acts. Advice is not execution; a draft is not a send; irreversible needs confirmation.

"No action without a mandate."

◆ Principle III

Reversibility

Committed actions are continuously revalidated. When the evidence underneath them dies, a governed reversal fires — not a panic.

"Evidence superseded → governed reversal."

◆ Principle IV

Audit

Hash-chained ledgers, replayable decisions, and offline-verifiable receipts. The model is never the final authority — the record is.

"No grade without a citation."

Flagship Work

The AI Governance Stack

Six interlocking systems that decide what was true, what may be done, and whether it may remain done — each with its own tests, CLI, and audit trail.

Veritas

Temporal Provenance Substrate
Live · Paid

A shared spine for temporal provenance memory — append-only facts with citation, supersession, and validity (live / stale / superseded). Deployed across five verticals (Markets, Counsel, Clinical, Property, Arbiter). Ships the Receipt primitive: a portable, offline-verifiable Ed25519 proof of a claim, minted with a secret key and verified for free with a public one. The asymmetry is the product.

Governance value: Answers the core question of AI accountability: "was this source still governing when the answer was made?"
Python · Ed25519 · MCP · Stripe-live
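A minimal sketch of the live / stale / superseded classification described above, not the shipped Veritas code. The `Fact` fields, the `validity` function, and the age-based staleness rule (`max_age`) are illustrative assumptions; the source only names the three states.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One append-only record (hypothetical field names)."""
    fact_id: str
    claim: str
    source: str                        # the citation backing the claim
    as_of: date                        # when the source asserted it
    supersedes: Optional[str] = None   # fact_id this record replaces, if any


def validity(fact: Fact, log: list, today: date, max_age: timedelta) -> str:
    """Classify a fact as 'live', 'stale', or 'superseded'."""
    # Supersession wins: a later record explicitly replaced this one.
    if any(other.supersedes == fact.fact_id for other in log):
        return "superseded"
    # Assumed staleness rule: the source is older than the allowed age.
    if today - fact.as_of > max_age:
        return "stale"
    return "live"
```

Nothing is ever edited in place: a correction is a new `Fact` that names the record it supersedes, so the question "what was governing on a given date?" stays answerable from the log alone.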

Auroch Stakes

Governed Decision-to-Action
92/92 tests

A control loop that prevents AI from making irreversible, unaudited, overconfident, or stale-reasoned high-stakes decisions. It combines a reversibility state machine, a hash-chained ledger, a policy-as-code mandate, a live Veritas binding, and an Arbiter quorum. Its Watchtower continuously revalidates committed actions and, when an action's basis is dead, triggers a governed reversal that routes through a real broker adapter to close positions, closing the full physical loop.

Governance value: Turns "the bot panicked" into "evidence superseded, so the system reversed itself, with receipts."
State machine · Hash-chain · Policy-as-code · OANDA adapter
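A rough sketch of how a reversibility state machine and a Watchtower revalidation step could fit together. The four states, the transition table, and the `revalidate` helper are assumptions made for illustration; the real system likely has more states, and the broker call is left out entirely.

```python
from enum import Enum


class State(Enum):
    PROPOSED = "proposed"
    COMMITTED = "committed"
    REVERSING = "reversing"
    REVERSED = "reversed"


# Assumed transition table: every move not listed here is illegal.
ALLOWED = {
    State.PROPOSED: {State.COMMITTED},
    State.COMMITTED: {State.REVERSING},
    State.REVERSING: {State.REVERSED},
    State.REVERSED: set(),
}


def transition(current: State, target: State) -> State:
    """Move to the target state, or fail loudly if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def revalidate(state: State, basis_is_live: bool) -> State:
    """Watchtower step: a committed action whose basis died starts reversing."""
    if state is State.COMMITTED and not basis_is_live:
        return transition(state, State.REVERSING)
    return state
```

Because the reversal is just another governed transition, it goes through the same table as every other move and can be written to the ledger like any other action.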

Auroch Arbiter

Auditable LLM-as-Judge
Evals

An LLM-as-judge that refuses to grade without citing the evidence for its grade: the provenance-containment thesis applied to evaluations. Built deliberately as an Anthropic-recruiting flagship targeting the Evals / faithfulness mission. It is paired with a separate Veritas Arbiter governance face that adjudicates AI decisions against time-of-decision evidence across six verdicts (defensible / stale-basis / missing-basis / contradicted / out-of-scope / requires-human-review), with full replay.

Governance value: Makes model evaluation itself faithful and auditable, not vibes-based.
Faithfulness · Replay · Audit log
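One way to enforce "no grade without a citation" is to validate the judge's output before accepting it. The sketch below assumes the judge returns a score plus the ids of the evidence it relied on; `Grade`, `accept_grade`, and `UncitedGradeError` are hypothetical names, and no model call is shown.

```python
from dataclasses import dataclass


class UncitedGradeError(ValueError):
    """Raised when a grade does not point at evidence that was supplied."""


@dataclass(frozen=True)
class Grade:
    score: int
    cited_ids: tuple   # ids of the evidence items the judge relied on


def accept_grade(grade: Grade, evidence: dict) -> Grade:
    """Accept a grade only if every citation resolves to supplied evidence."""
    if not grade.cited_ids:
        raise UncitedGradeError("grade has no citations")
    unknown = [i for i in grade.cited_ids if i not in evidence]
    if unknown:
        raise UncitedGradeError(f"grade cites unknown evidence: {unknown}")
    return grade
```

The check is deterministic and sits outside the model, so an uncited or mis-cited grade is rejected regardless of how confident the judge sounded.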

Empath

Live Mandate Gate
25 tests

A live gate between intent and execution: Intent → read mandate → envelope → Execute / Ask / Narrow / Escalate / Refuse → hash-chained receipt. Four enforced rules — draft ≠ send, advice ≠ execution, irreversible → confirm, domain-trust isolation. Doctrine: every optimistic predicate defaults pessimistic until a named pipeline measures it. Stakes no longer acts alone.

Governance value: A concrete, testable answer to "how do you stop an agent from exceeding its authority?"
Mandate · Receipts · Least-authority
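The gate described above can be sketched as a pure function from an intent and a mandate to a decision. This is an illustration under assumptions, not the Empath implementation: the `Intent` and `Mandate` fields are hypothetical, the mapping of each rule to a particular outcome is a guess, and the Narrow outcome is declared but not exercised.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    ASK = "ask"
    NARROW = "narrow"      # listed for completeness; unused in this sketch
    ESCALATE = "escalate"
    REFUSE = "refuse"


@dataclass(frozen=True)
class Intent:
    verb: str                  # e.g. "draft", "send", "advise"
    domain: str
    irreversible: bool = False
    confirmed: bool = False


@dataclass(frozen=True)
class Mandate:
    allowed_verbs: frozenset
    trusted_domains: frozenset


def gate(intent: Intent, mandate: Mandate) -> Decision:
    """Decide what may happen to an intent under a given mandate."""
    # Domain-trust isolation: trust in one domain never carries to another.
    if intent.domain not in mandate.trusted_domains:
        return Decision.REFUSE
    # Draft != send, advice != execution: each verb needs its own grant.
    if intent.verb not in mandate.allowed_verbs:
        return Decision.ESCALATE
    # Irreversible actions wait for explicit confirmation.
    if intent.irreversible and not intent.confirmed:
        return Decision.ASK
    return Decision.EXECUTE
```

Every branch defaults to the more cautious outcome, and Execute is reachable only when all checks pass, which matches the pessimistic-by-default doctrine.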

Perla Sanctuary

Safety-Governed Crisis Companion
71 tests

A clinician-guided emotional-crisis companion built as a bridge, not an endpoint — it lowers the activation energy of reaching a human. Safety core: deterministic risk detector + response governor + escalation router (verified 9-8-8 / 9-1-1) + hash-chained session ledger + a clinician-protocol honesty gate that keeps everything in DRAFT until a named clinician signs off. The model is never the safety authority.

Governance value: Human-in-the-loop, fail-loud, and honest about its own limits: safety as architecture.
Risk detection · Escalation · Local 3B · Honesty gate
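The hash-chained session ledger named above (the same idea appears in Stakes and Empath) can be sketched with the standard library alone. The entry layout, the all-zero genesis hash, and the function names are assumptions for illustration, not the production format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> dict:
    """Append a payload, linking it to the hash of the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": entry_hash(prev_hash, payload)}
    ledger.append(entry)
    return entry


def verify(ledger: list) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Verification needs only the ledger itself, so a reviewer can check a session's record offline without trusting the system that wrote it.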

Veritas Clinical

Provenance for Liable Buyers
Live · Paid

A "chart-reality amplifier" for physicians — the physician is the pilot, the system is non-autonomous. Multi-format clinical ingest (FHIR R4 · HL7 v2 · C-CDA) + safety gate + RxNorm normalization + a licensed-provider contract that fails loud to human review rather than guessing. A provenance/temporal layer on top of a licensed authority — never a replacement for one.

Governance value: Shows governance sold to the people actually liable for the decision.
FHIR/HL7/C-CDA · Audit + sign-off · Stripe-live

Depth

Engineering Range

Governance is only credible if you can build the hard systems it governs. I do — from on-device ML to causal-graph market intelligence.

On-Device ML Systems

Auroch Local Model: a full MLX local-inference system (4-bit Qwen3 + retrieval router + eval harness), converted, QLoRA-tuned, and profiled on an M1 Pro under an 8 GB cap. Ported fully on-device to iOS as Prana — zero cloud, zero key, pure-Swift kernel.

Model Quantization

Auroch Quantum: role-aware mixed-precision compression for MLX — per-layer sensitivity, KV-quant, MoE residency/paging, and speculative decoding, proven live on OLMoE. An inspect → plan → build → eval → serve survivor loop, 79 tests.

Causal Intelligence

ORBIS War Room: a 26-node global causal graph whose edge weights are measured from ~2 years of return correlation — not asserted. Propagates a public shock across the graph, ranks exposed assets, and cites every link. Calibration once caught and fixed a bad proxy.
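As an illustration of propagating a shock across a weighted graph and ranking exposure, the sketch below scores each node by the strongest product of edge weights along any path from the shocked node. This is an assumed propagation rule, not necessarily the one ORBIS uses; it requires weights in [0, 1] and a positive shock, and the node names in the usage note are invented.

```python
import heapq


def propagate(graph: dict, source: str, shock: float = 1.0) -> dict:
    """Return each reachable node's exposure to a shock at `source`."""
    exposure = {source: shock}
    heap = [(-shock, source)]           # max-heap via negated exposure
    while heap:
        neg, node = heapq.heappop(heap)
        if -neg < exposure.get(node, 0.0):
            continue                    # outdated heap entry, skip it
        for neighbor, weight in graph.get(node, {}).items():
            candidate = -neg * weight   # attenuate along the edge
            if candidate > exposure.get(neighbor, 0.0):
                exposure[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    return exposure


def rank(exposure: dict, source: str) -> list:
    """List exposed nodes, most exposed first, excluding the shocked node."""
    others = [(n, e) for n, e in exposure.items() if n != source]
    return sorted(others, key=lambda item: item[1], reverse=True)
```

For example, with `graph = {"oil": {"airlines": 0.5, "shipping": 0.8}, "shipping": {"airlines": 0.9}}`, a shock at `"oil"` reaches `"airlines"` more strongly through `"shipping"` than directly, and the winning path is what would be cited.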

Proof of Execution

Shipped & Earning

Doctrine that doesn't ship is philosophy. These run in production behind real Stripe checkouts.

ORBIS

World Value & Intelligence Engine
Live · Paid

33 goods, value/rent/macro lenses, live market feed, cross-border CUSMA modeling, and a gated causal War Room — deployed at orbis.aurochthryx.com behind a recurring subscription. 154+ tests.

FastAPI · Live data · Two revenue tiers

Dwell

Property Intelligence
Live · Paid

A "paid Zillow" — one truth engine across three surfaces (homeowner / agent / contractor), with a cited CLEAR / REVIEW / HIGH-RISK verdict and a contractor-quote x-ray. Live at dwell.aurochthryx.com. 86 tests.

Verdict engine · Two-rail pricing · Render

Veritas API

Dead-Citation Checker
Live · Paid

The stripeable wedge: tenant-isolated API keys → per-customer ledger → quota → Stripe-gated status, with a self-serve signup loop built on stdlib alone. Invariant: no ledger until a key resolves to an active tenant. Live on Render. 38 tests.

Multi-tenant · Self-serve · Webhook-verified
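The invariant "no ledger until a key resolves to an active tenant" can be sketched as a single guarded lookup. The in-memory dictionaries, the `"active"` status string, and the `ledger_for` name are assumptions standing in for whatever storage and Stripe-gated status the real service uses.

```python
class AccessDenied(Exception):
    """Raised when a key does not resolve to an active tenant."""


def ledger_for(api_key: str, keys: dict, tenants: dict, ledgers: dict) -> list:
    """Return the caller's ledger, creating it only after the key checks out.

    keys:    api_key   -> tenant_id
    tenants: tenant_id -> status string (assumed: "active" means paid up)
    ledgers: tenant_id -> list of ledger entries
    """
    tenant_id = keys.get(api_key)
    if tenant_id is None:
        raise AccessDenied("unknown API key")
    if tenants.get(tenant_id) != "active":
        raise AccessDenied("tenant is not active")
    # Only now may a ledger exist for this tenant.
    return ledgers.setdefault(tenant_id, [])
```

Because the ledger is created on the far side of both checks, a bad or lapsed key can never leave a ledger behind, and each tenant only ever reaches its own list.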

The Auroch Ecosystem

~55 components · 6 pillars
Indexed

The whole body of work is catalogued in a queryable SQLite index that walks the live filesystem for disk, git, and test facts — plus a shared commerce spine (one signed Passport cookie carries entitlements across every product). Systems thinking, not a folder of demos.

Catalog DB · Shared spine · Stdlib-first

What I bring to a team

Capabilities

Safety Architecture

Mandate gates, escalation routing, fail-loud defaults, and human-in-the-loop control designed into the system.

Provenance & Audit

Temporal memory, hash-chained ledgers, replayable decisions, and offline-verifiable receipts.

Evals & Faithfulness

LLM-as-judge that cites its evidence; calibration loops that catch their own flawed assumptions.

ML Engineering

On-device inference, QLoRA, role-aware quantization, and speculative decoding on real hardware.

Let's talk

Ready for AI governance & engineering

I want to work where the hard question is "how do we make this system accountable?" — and the answer has to compile. If that's your team, let's talk.