Axion Engine: Adversarial R&D Operating System
Domain-agnostic R&D pipeline in which three models attack each other's output across computer science, clinical medicine, and IoT firmware.
Three-model adversarial pipeline, three domains, 152 production sessions, zero fluff. Trade-off: 3x inference cost in exchange for cross-vendor debate.
The Problem
Single-model pipelines produce documentation that sounds authoritative but isn't.
Standard single-vendor pipelines default to safe, shallow output. For R&D documentation — distributed systems references, clinical research protocols, production firmware specs — shallow content actively misleads. One model can’t catch its own hallucinations, and confirmation bias compounds over hundreds of sections.
- Confirmation bias: same-vendor review approves a draft roughly 2x more often than cross-vendor review does
- No domain isolation: one pipeline can’t serve CS, medicine, and IoT without rewriting
- Context amnesia: session 100 has no memory of what failed in sessions 1-99
- Zero quality gates: sending structurally broken output to expensive reviewers wastes tokens
The Architecture
Cross-vendor adversarial pipeline with deterministic linting and self-evolving intelligence
Axion Engine is a 5,859 LOC Python system that orchestrates three agents in an adversarial loop:
The Producer drafts deep technical material with extended reasoning and full context. The Skeptical CTO attacks the draft as a cynical staff engineer — finding hallucinations, missing mechanisms, and unsupported claims. The Reviewer sees both draft and critique, issuing ACCEPT/REVISE/REJECT verdicts.
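The control flow of that loop can be sketched as follows. This is a minimal illustration, not the engine's actual code: `call_producer`, `call_cto`, and `call_reviewer` are hypothetical stand-ins for the vendor-specific model calls, and `max_rounds` is an assumed retry limit.

```python
def adversarial_loop(section_spec, call_producer, call_cto, call_reviewer,
                     max_rounds=3):
    """Run one section through the Producer -> CTO -> Reviewer loop."""
    draft = call_producer(section_spec)
    for _ in range(max_rounds):
        critique = call_cto(draft)                       # attack the draft
        verdict, notes = call_reviewer(draft, critique)  # sees both sides
        if verdict == "ACCEPT":
            return draft
        if verdict == "REJECT":
            draft = call_producer(section_spec)          # start over
        else:  # REVISE
            draft = call_producer(section_spec, feedback=notes)
    return draft  # best effort after max_rounds
```

The key design point is that the Reviewer never sees the draft in isolation; it always adjudicates draft and critique together.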
A deterministic linter gate runs before stochastic review — Python AST validation, D2 diagram checks, caption enforcement, and constraint satisfaction. Structurally broken output is rejected instantly at zero cost, saving reviewer tokens.
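The Python AST check is the simplest of these gates to illustrate. A minimal sketch, assuming output arrives as `(code, caption)` pairs (the real rule set, D2 checks, and constraint solver are not shown):

```python
import ast

def lint_python_blocks(code_blocks, min_caption_len=10):
    """Reject structurally broken output before any reviewer call.

    code_blocks: list of (code, caption) pairs.
    Returns a list of (index, reason) failures; empty means pass.
    """
    failures = []
    for i, (code, caption) in enumerate(code_blocks):
        try:
            ast.parse(code)                # syntax must be valid Python
        except SyntaxError as e:
            failures.append((i, f"syntax error: {e.msg}"))
        if len(caption.strip()) < min_caption_len:
            failures.append((i, "caption too short"))
    return failures
```

Because this gate is deterministic, a failure here costs zero inference tokens; the draft bounces straight back to the Producer.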
Domain decoupling: one engine, three domains
The engine is domain-agnostic. Domain-specific behavior comes from YAML configs, prompt templates, and knowledge bases. Adding a new domain means adding a folder — not rewriting code.
| Domain | Quality Bar | Adversary Persona |
|---|---|---|
| Computer Science (distributed systems) | Kleppmann-level reference standard | Cynical distributed-systems CTO |
| Clinical Medicine | NEJM / Lancet Standard | Cynical journal editor |
| Production Firmware (IoT) | AWS Well-Architected | N/A (firmware export) |
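A domain folder of this kind might look like the following. This is a hypothetical illustration: the field names and file layout are assumptions, not the engine's actual schema.

```yaml
# domains/clinical/config.yaml -- illustrative only
domain: clinical_medicine
quality_bar: "NEJM / Lancet standard"
adversary:
  persona: cynical_journal_editor
  prompt_template: prompts/cto_clinical.md
knowledge_base: kb/clinical/
linter:
  require_citations: true
  banned_words: [groundbreaking, revolutionary, cutting-edge]
```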
Self-evolving intelligence: the Singularity Loop
After each adversarial loop, a Meta-Reflection stage analyzes recurring failures, hallucination patterns, and protocol gaps. Observations accumulate in three registries — Signal Tracker, Pattern Registry, and Trait Registry. Session 152 is measurably smarter than session 1 because the engine encodes 152 sessions of failure data.
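The accumulation mechanism can be sketched as a simple pattern registry. Names and the promotion threshold here are assumptions; the real registries also persist across sessions and feed prompts differently.

```python
from collections import Counter

class PatternRegistry:
    """Accumulate recurring failure signals across sessions.

    Any signal observed `threshold` or more times is promoted to an
    active trait, which future sessions receive as guidance.
    """
    def __init__(self, threshold=3):
        self.counts = Counter()
        self.active_traits = set()
        self.threshold = threshold

    def record(self, signal):
        self.counts[signal] += 1
        if self.counts[signal] >= self.threshold:
            self.active_traits.add(signal)

    def guidance(self):
        # Injected into the next session's Producer prompt.
        return sorted(self.active_traits)
```

Usage: Meta-Reflection calls `record("uncited_claim")` each time the CTO flags that failure; once it recurs past the threshold, every later session is warned up front.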
Results
- 152 production sessions logged with full provenance chains
- 3 active domains from a single codebase with zero domain-specific engine code
- 6-stage adversarial pipeline per section (Producer → Draft → Linter → CTO → Reviewer → Meta-Reflection)
- 0% fluff score: tested against banned-word and specificity validators
- 348 structured documents and 61 D2 architectural diagrams across all domains
- Cross-vendor critique catches significantly more issues than single-vendor review — adversarial models surface blind spots a single vendor misses
- Linter gate rejects 15-20% of outputs before expensive reviewer calls
- Crash recovery via JSON session state — no lost work on API timeouts
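Crash recovery of this kind can be sketched as checkpointing session state to JSON after each completed section. The file layout and field names below are illustrative, not the engine's actual format; the atomic-rename trick is the part that matters.

```python
import json
import os

def save_checkpoint(path, state):
    # Write to a temp file, then rename atomically, so a crash
    # mid-write never leaves a corrupt checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # Resume from the last completed section, or start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"completed_sections": [], "registries": {}}
```

On an API timeout the process dies, restarts, loads the checkpoint, and skips every section already in `completed_sections`.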
Architecture Trade-offs
Cross-vendor adversarial review catches significantly more issues than single-vendor pipelines. Blind spots that Claude misses, Gemini finds — and vice versa.
3x inference cost per section. Three model calls (Producer + CTO + Reviewer) instead of one. Accepted because the alternative — human subject-matter review — costs 100x more and takes days instead of minutes.
Deterministic linter gate rejects 15-20% of outputs at zero cost. AST validation, D2 diagram checks, and constraint satisfaction catch structural failures before expensive reviewer inference.
Rigid format constraints limit creative output. The linter enforces section structure, citation format, and diagram presence. For R&D documentation this is a feature. For creative writing it would be a liability.
Deploy this architecture
Submit your requirements. We'll review your constraints, identify bottlenecks, and scope the path to production.
[ SUBMIT SPECS ]
No SDRs. A Principal Engineer reviews every submission.
From the team behind Production-Ready AI Agents (Amazon, 2025)