Real-time anomaly detection processing 2.4M events/day with 70% fewer false positives
How we built a real-time anomaly detection pipeline processing 2.4M events/day using Kafka, Isolation Forest, and foundation models. False positive rate reduced from 68% to under 20%.
ML ensemble on Kafka reduced false positives from 68% to under 20% at 2.4M events/day. Trade-off accepted: 340ms added latency per flagged event for multi-model scoring.
The Problem
Batch processing missed anomalies by 6-8 hours
The existing anomaly detection system ran nightly batch jobs against a PostgreSQL data warehouse. By the time an alert fired, the billing irregularity or access violation had been in production for 6-8 hours — long enough for cascading damage.
The false positive rate was the bigger problem. At 68% false positives, the compliance team had stopped trusting the system entirely. They were manually reviewing every alert, which meant the real anomalies were buried in noise.
- 6-8 hour detection lag: batch processing ran overnight, alerts arrived the next morning
- 68% false positive rate: compliance team ignored most alerts
- No cross-facility correlation: each facility’s data was siloed in separate databases
- Static thresholds: hand-tuned rules that hadn’t been updated in 18 months
- Zero contextual understanding: no way to distinguish seasonal patterns from real anomalies
Our Approach
Event-driven pipeline with behavioral baselines
We replaced the batch architecture with an event-driven pipeline built on Apache Kafka. Every transaction, access log entry, and prescription event streams through a unified topic structure. The detection engine processes each event within 200ms of ingestion.
The core insight: static thresholds fail because “normal” changes. A physician prescribing 40 opioid prescriptions per month might be anomalous in a rural clinic but expected in a pain management center. We built behavioral baselines per entity (physician, facility, department) using Isolation Forest, then layered foundation model reasoning for contextual interpretation.
The Architecture
Three-layer detection with foundation model reasoning
Layer 1: Streaming ingestion and enrichment
Kafka Connect pulls from 14 EMR systems via CDC (Change Data Capture). A Kafka Streams application handles deduplication, schema normalization, and entity enrichment — joining transaction events with physician profiles, facility metadata, and historical baselines from Redis.
Layer 2: Isolation Forest anomaly scoring
Each enriched event passes through an Isolation Forest model trained on 90 days of facility-specific data. The model produces an anomaly score (0-1) based on 23 features including transaction amount deviation, prescription frequency, access time patterns, and cross-facility velocity checks.
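The scoring mechanics look roughly like the following sketch using scikit-learn's `IsolationForest`. The training data here is synthetic (two features instead of 23), and the min/max rescaling of `decision_function` into a 0-1 score is one simple convention assumed for illustration, not necessarily the production calibration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for 90 days of facility-specific history. The real
# model uses 23 features; two are enough to show the mechanics.
baseline = rng.normal(loc=[100.0, 40.0], scale=[10.0, 5.0], size=(2000, 2))

model = IsolationForest(n_estimators=200, random_state=42).fit(baseline)

# Calibrate a 0-1 mapping against the training data. decision_function is
# positive for inliers, negative for outliers; this min/max rescaling is
# a simple convention, not necessarily the production mapping.
_base = model.decision_function(baseline)
_lo, _hi = _base.min(), _base.max()

def anomaly_score(events: np.ndarray) -> np.ndarray:
    raw = model.decision_function(events)
    return np.clip((_hi - raw) / (_hi - _lo), 0.0, 1.0)

scores = anomaly_score(np.array([
    [102.0, 41.0],   # near the baseline mean -> low score
    [400.0, 5.0],    # extreme deviation on both features -> high score
]))
```

Retraining per facility on a rolling 90-day window keeps the baseline current without the hand-tuned thresholds the old system relied on.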
Events scoring above 0.7 are flagged for contextual review. Events above 0.9 trigger immediate alerts regardless of context.
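The routing rule is simple enough to state as code. Thresholds are the ones described above; whether the boundaries are inclusive is an assumption made here for illustration.

```python
def route(score: float) -> str:
    """Three-way routing downstream of the Isolation Forest layer."""
    if score >= 0.9:
        return "immediate_alert"    # alert regardless of context
    if score >= 0.7:
        return "contextual_review"  # handed to the foundation model layer
    return "pass"                   # below threshold: no action

assert route(0.95) == "immediate_alert"
assert route(0.75) == "contextual_review"
assert route(0.30) == "pass"
```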
Layer 3: Foundation model contextual reasoning
Flagged events (score 0.7-0.9) pass to a foundation model that receives the anomaly score, the entity’s behavioral baseline, and a structured context window of recent activity. The model determines whether the anomaly is expected variance (flu season spike, new physician onboarding) or genuine concern (credential sharing, prescription splitting).
This layer reduced false positives from 68% to under 20% compared to threshold-only detection. The model doesn’t make final decisions — it enriches the alert with reasoning that the compliance team reviews.
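The structured context window can be sketched as a payload builder. Field names here are hypothetical (the article does not specify the prompt schema), and the actual model call is omitted; the point is that the model receives the score, the baseline, and bounded recent activity, and returns reasoning that is attached to the alert.

```python
import json

def build_fm_context(event: dict, score: float, baseline: dict,
                     recent_activity: list[dict]) -> str:
    """Assemble the structured context window handed to the foundation
    model. The model's answer enriches the alert; it never makes the
    final call -- the compliance team does."""
    payload = {
        "anomaly_score": round(score, 3),
        "event": event,
        "behavioral_baseline": baseline,
        "recent_activity": recent_activity[-20:],  # bounded window
        "question": ("Is this expected variance (e.g. seasonal spike, "
                     "new-physician onboarding) or a genuine concern "
                     "(e.g. credential sharing, prescription splitting)?"),
    }
    return json.dumps(payload, indent=2)

ctx = build_fm_context(
    event={"event_id": "e1", "type": "prescription", "amount": 140.0},
    score=0.81,
    baseline={"avg_monthly_scripts": 38},
    recent_activity=[{"t": "2024-01-01T09:00", "type": "login"}],
)
```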
Results
Before and after comparison
Before:
- Nightly batch processing (6-8 hour delay)
- 68% false positive rate on alerts
- Siloed databases per facility (14 separate systems)
- Static thresholds hand-tuned 18 months ago
- No cross-facility correlation or velocity checks
- Compliance team manually reviews every alert
- Zero contextual reasoning on flagged events
After:
- Real-time streaming (<200ms detection latency)
- False positive rate reduced from 68% to under 20%
- Unified event stream across all 14 facilities
- Behavioral baselines that adapt per entity
- Cross-facility velocity detection in real time
- Priority-ranked alerts with FM reasoning context
- 73% improvement in alert accuracy
Architecture Trade-offs
False positives dropped from 68% to under 20%. Foundation model contextual reasoning distinguishes seasonal variance from genuine anomalies — compliance team trusts alerts again.
340ms added latency per flagged event. Foundation model inference adds processing time on the 0.7-0.9 anomaly band. Accepted because sub-second latency is sufficient for compliance review workflows — this is not a trading system.
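A back-of-envelope check (using the ~3% flagged fraction cited in the learnings below, and assuming average rather than peak rates) shows why this trade is cheap in aggregate:

```python
EVENTS_PER_DAY = 2_400_000
FLAGGED_FRACTION = 0.03   # ~3% of events reach the FM layer (from the text)
FM_LATENCY_MS = 340

avg_events_per_sec = EVENTS_PER_DAY / 86_400              # ~27.8 events/s
fm_calls_per_sec = avg_events_per_sec * FLAGGED_FRACTION  # ~0.83 calls/s
avg_added_latency_ms = FLAGGED_FRACTION * FM_LATENCY_MS   # ~10.2 ms amortized
```

Under a second of FM inference per call, spread over under one call per second on average, is well within a compliance-review SLA.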
Cross-facility velocity detection in real time. Unified Kafka topic structure enables credential-sharing and prescription-splitting detection across all 14 facilities simultaneously.
Redis memory footprint: 12 GB for behavioral baselines. Per-entity baselines across 14 facilities require dedicated Redis cluster. Worth it: in-memory lookup keeps the p99 under 200ms.
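A rough sketch of what a per-entity baseline entry might look like. The key schema and serialization here are hypothetical (the article does not specify them); the point is that 23 per-feature statistics per entity, across thousands of entities in 14 facilities, adds up to a footprint worth budgeting for.

```python
import json

def baseline_key(facility_id: str, entity_id: str) -> str:
    # Hypothetical key schema -- the real naming convention isn't specified.
    return f"baseline:{facility_id}:{entity_id}"

def baseline_value(feature_stats: dict[str, tuple[float, float]]) -> str:
    """Serialize per-feature (mean, std) pairs for the scoring features."""
    return json.dumps(
        {name: {"mean": m, "std": s} for name, (m, s) in feature_stats.items()}
    )

# 23 features, matching the scoring layer described above.
stats = {f"feature_{i:02d}": (100.0 + i, 10.0) for i in range(23)}
key = baseline_key("facility-03", "entity-0001")
value = baseline_value(stats)
bytes_per_entity = len(key) + len(value)  # rough per-entity payload size
```

Keeping these entries in Redis (rather than PostgreSQL) is what holds the enrichment lookup under 10ms and the end-to-end p99 under 200ms.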
Key Learnings
Engineering decisions that shaped the outcome
- Isolation Forest over autoencoders: 10x faster inference with a marginal accuracy trade-off. At 2.4M events/day, inference latency was the constraint, not model sophistication
- Foundation models for reasoning, not detection: using LLMs for scoring is cost-prohibitive at this volume. We use them only for the ~3% of events that need contextual interpretation
- Per-entity baselines over global models: a single facility-wide threshold produced the 68% false positive rate. Entity-level baselines cut it to under 20%
- Redis feature store over batch lookups: the enrichment step requires sub-10ms entity profile retrieval. PostgreSQL couldn’t keep up under load
- Kafka topic-per-facility, not topic-per-event-type: enables facility-level scaling and isolation without consumer group complexity
Related Capabilities
This engagement drew on multiple practice areas:
- ML & Data Science — Isolation Forest model development and training pipeline
- MLOps Engineering — model serving, feature store (Redis), and monitoring infrastructure
- Kafka Engineering — CDC ingestion, streaming architecture, and topic design
- Data Engineering — unified event pipeline across 14 facility systems
Deploy this architecture
Submit your requirements. We'll review your constraints, identify bottlenecks, and scope the path to production.
No SDRs. A Principal Engineer reviews every submission.
From the team behind Production-Ready AI Agents (Amazon, 2025)