
Axion Engine: Adversarial R&D Operating System

Domain-agnostic R&D pipeline where three models attack each other's output across CS, clinical medicine, and IoT firmware.

Bottom Line

Three-model adversarial pipeline across 3 domains (CS, clinical, IoT). 152 production sessions, zero fluff. Trade-off: 3x inference cost for cross-vendor debate.

// system_metrics
production_sessions: 152
active_domains: 3
adversarial_stages: 6
fluff_score: 0%

The Problem

Single-model pipelines produce documentation that sounds authoritative but isn’t

Standard single-vendor pipelines default to safe, shallow output. For R&D documentation — distributed systems references, clinical research protocols, production firmware specs — shallow content actively misleads. One model can’t catch its own hallucinations, and confirmation bias compounds over hundreds of sections.

  • Confirmation bias: same-vendor review agrees 2x more than cross-vendor review
  • No domain isolation: one pipeline can’t serve CS, medicine, and IoT without rewriting
  • Context amnesia: session 100 has no memory of what failed in sessions 1-99
  • Zero quality gates: sending structurally broken output to expensive reviewers wastes tokens

The Architecture

Fig 1: Adversarial review pipeline with meta-reflection (Producer draft → AST linter → Skeptical CTO critique → Reviewer verdict → meta-reflection feedback loop)

Cross-vendor adversarial pipeline with deterministic linting and self-evolving intelligence

Axion Engine is a 5,859 LOC Python system that orchestrates three agents in an adversarial loop:

The Producer drafts deep technical material with extended reasoning and full context. The Skeptical CTO attacks the draft as a cynical staff engineer — finding hallucinations, missing mechanisms, and unsupported claims. The Reviewer sees both draft and critique, issuing ACCEPT/REVISE/REJECT verdicts.
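In pseudocode, one adversarial round looks roughly like this. The verdict strings come from the text above; the agent interfaces (draft, attack, judge, revise) and the revision cap are assumptions for illustration, not the engine's actual API.

from dataclasses import dataclass

@dataclass
class Verdict:
    decision: str  # "ACCEPT" | "REVISE" | "REJECT"
    notes: str

def adversarial_round(spec, producer, cto, reviewer, max_revisions=3):
    # Producer drafts with extended reasoning and full context.
    draft = producer.draft(spec)
    for _ in range(max_revisions):
        # Skeptical CTO attacks the draft: hallucinations, missing
        # mechanisms, unsupported claims.
        critique = cto.attack(draft)
        # Reviewer sees both the draft and the critique before ruling.
        verdict = reviewer.judge(draft, critique)
        if verdict.decision == "ACCEPT":
            return draft
        if verdict.decision == "REJECT":
            draft = producer.draft(spec)  # start over from the spec
        else:  # REVISE
            draft = producer.revise(draft, critique, verdict.notes)
    raise RuntimeError("Section failed adversarial review")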

A deterministic linter gate runs before stochastic review — Python AST validation, D2 diagram checks, caption enforcement, and constraint satisfaction. Structurally broken output is rejected instantly at zero cost, saving reviewer tokens.
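The gate itself is almost entirely standard-library work. A minimal sketch, assuming each section object exposes its embedded Python snippets, D2 diagrams, and constraints; the section model and attribute names are illustrative, not the engine's real schema.

import ast

def lint_section(section) -> list[str]:
    errors = []
    # Hard syntax check on embedded Python: deterministic, instant, free.
    for snippet in section.python_snippets:
        try:
            ast.parse(snippet)
        except SyntaxError as exc:
            errors.append(f"python snippet fails AST parse: {exc}")
    # Structural check on diagrams: every D2 diagram needs a caption.
    for diagram in section.d2_diagrams:
        if not diagram.caption:
            errors.append(f"diagram {diagram.name} missing caption")
    # Constraint satisfaction, e.g. required headings per section type.
    for heading in section.constraints.get("required_headings", []):
        if heading not in section.headings:
            errors.append(f"missing required heading: {heading}")
    return errors  # any entry => instant reject, no reviewer inference spent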

Domain decoupling: one engine, three domains

The engine is domain-agnostic. Domain-specific behavior comes from YAML configs, prompt templates, and knowledge bases. Adding a new domain means adding a folder — not rewriting code.

Domain                                   Quality Bar                          Adversary Persona
Computer Science (distributed systems)   Kleppmann-level reference standard   Cynical distributed-systems CTO
Clinical Medicine                        NEJM / Lancet standard               Cynical journal editor
Production Firmware (IoT)                AWS Well-Architected                 N/A (firmware export)
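Concretely, a domain is a folder. A hypothetical layout and config for the clinical domain (the file names and YAML keys are illustrative, not the engine's actual schema):

domains/clinical_medicine/
    config.yaml         # quality bar, persona, linter rules
    prompts/            # Producer, CTO, and Reviewer templates
    knowledge_base/     # domain reference material

# config.yaml (illustrative keys)
quality_bar: "NEJM / Lancet standard"
adversary_persona: "Cynical journal editor"
linter:
  require_citations: true
  diagram_captions: true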

Self-evolving intelligence: the Singularity Loop

After each adversarial loop, a Meta-Reflection stage analyzes recurring failures, hallucination patterns, and protocol gaps. Observations accumulate in three registries — Signal Tracker, Pattern Registry, and Trait Registry. Session 152 is measurably smarter than session 1 because the engine encodes 152 sessions of failure data.
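A minimal sketch of the accumulation step, assuming each registry is an append-only JSON file. The registry names come from the text; the record schema and paths are assumptions.

import json
from pathlib import Path

REGISTRIES = {
    "signal": "signal_tracker.json",
    "pattern": "pattern_registry.json",
    "trait": "trait_registry.json",
}

def record_observation(kind: str, observation: dict,
                       root: Path = Path("registries")) -> None:
    root.mkdir(exist_ok=True)
    path = root / REGISTRIES[kind]
    entries = json.loads(path.read_text()) if path.exists() else []
    entries.append(observation)  # e.g. a recurring hallucination pattern
    path.write_text(json.dumps(entries, indent=2))

Later sessions assemble their prompts with these registries in context, which is what makes session 152 behave differently from session 1.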

Results

  • 152 production sessions logged with full provenance chains
  • 3 active domains from a single codebase with zero domain-specific engine code
  • 6-stage adversarial pipeline per section (Producer → Draft → Linter → CTO → Reviewer → Meta-Reflection)
  • 0% fluff score: tested against banned-word and specificity validators
  • 348 structured documents and 61 D2 architectural diagrams across all domains
  • Cross-vendor critique catches issues that single-vendor review misses: models from different vendors do not share blind spots
  • Linter gate rejects 15-20% of outputs before expensive reviewer calls
  • Crash recovery via JSON session state — no lost work on API timeouts (see the sketch below)
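The crash-recovery mechanic in the last bullet can be as small as an atomic write plus a resume check. A sketch, assuming one JSON state file per session; the path and field names are illustrative.

import json
from pathlib import Path

STATE = Path("sessions/current.json")

def checkpoint(state: dict) -> None:
    STATE.parent.mkdir(exist_ok=True)
    tmp = STATE.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(STATE)  # rename is atomic on the same filesystem,
                        # so a mid-write crash never corrupts state

def resume() -> dict:
    if STATE.exists():
        return json.loads(STATE.read_text())  # continue after an API timeout
    return {"completed_sections": [], "stage": "producer"}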

Architecture Trade-offs

Gain

Cross-vendor adversarial review catches significantly more issues than single-vendor pipelines. Blind spots that Claude misses, Gemini finds — and vice versa.

Cost

3x inference cost per section. Three model calls (Producer + CTO + Reviewer) instead of one. Accepted because the alternative — human subject-matter review — costs 100x more and takes days instead of minutes.

Gain

Deterministic linter gate rejects 15-20% of outputs at zero cost. AST validation, D2 diagram checks, and constraint satisfaction catch structural failures before expensive reviewer inference.

Cost

Rigid format constraints limit creative output. The linter enforces section structure, citation format, and diagram presence. For R&D documentation this is a feature. For creative writing it would be a liability.

Technology Stack

What we built with

Claude · Gemini · Multi-Agent Orchestration · Adversarial Review · Python
Similar challenge?

Deploy this architecture

Submit your requirements. We'll review your constraints, identify bottlenecks, and scope the path to production.

[ SUBMIT SPECS ]

No SDRs. A Principal Engineer reviews every submission.

From the team behind Production-Ready AI Agents (Amazon, 2025)