Anthony D. Maio
Staff AI Platform Engineer | LLM Infrastructure & Reliability | Agent Systems & Safety
20 years building production systems across fintech, security, and identity. Now applying that same discipline to LLM infrastructure, agent orchestration, and AI safety—treating oversight as a systems engineering problem, not a policy exercise.
Making Minds is my AI research lab and consultancy—delivering production tooling, open-source models, and peer-reviewed research to clients ranging from early-stage startups to mid-sized industrial organizations.
Agentic AI architectures · Multi-agent coordination · AI coherence & memory · Epistemic stress detection · AI introspection · Mechanistic interpretability
Flagship
Production tools and models—shipped, installable, used.
Substack
Long-form AI analysis, technical walkthroughs, and The Checkpoint newsletter
Deep dives on AI safety, agentic architectures, and the systems that power production AI. From live-blogging an OpenAI competition to dissecting Palantir's military AI.
mnemos
Biomimetic memory for coding agents
Five neuroscience-inspired memory modules — surprisal gating, mutable RAG, affective routing, sleep consolidation, spreading activation — as composable building blocks for LLM agents.
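The surprisal-gating module, for instance, stores an observation only when it is unexpected under the agent's predictive model. A minimal sketch of the idea; the function names and threshold below are illustrative, not mnemos's actual API:

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(event)."""
    return -math.log2(prob)

def gated_write(memory: list, item: str, prob: float, threshold_bits: float = 4.0) -> bool:
    """Store an observation only if it is surprising enough.

    `prob` is the model's predicted probability of the observation;
    low-probability (high-surprisal) events are worth remembering.
    """
    if surprisal(prob) >= threshold_bits:
        memory.append(item)
        return True
    return False

memory = []
gated_write(memory, "routine status ping", prob=0.9)        # ~0.15 bits, skipped
gated_write(memory, "test suite newly failing", prob=0.01)  # ~6.6 bits, stored
```

Expected events pass through without a write; only high-surprisal events consume memory.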
Cartograph
Map the repo before you burn context
CLI-first repo analysis. Rank files, trace dependency hubs, pull task-scoped context, and hand structured artifacts to Claude Code, OpenClaw, or any agent.
Slipstream
60–85% token reduction for multi-agent coordination
A semantic quantization protocol that compresses inter-agent communication while preserving meaning. Includes trained LoRA adapters, PyPI package, and Ollama model.
Eve-2
272M-parameter Mixture-of-Experts, trained from scratch
Base model pretrained on ~10.5B tokens (FineWeb-edu) using PyTorch DDP, plus instruction-tuned and task-specialist derivatives optimized for CPU/edge inference.
Eve-3
SABER — Slip-Anchors, Experience Streams, and Re-entry
Next-generation cognitive architecture building on Eve-2's MoE foundation. SABER adds persistent slip-anchors for error correction, experience streams for continual learning, and re-entry loops for self-monitoring.
CoDA-GQA-L
9.5x KV cache compression with 2 custom Triton kernels
Bounded-memory differential attention compresses the KV cache from O(n) to a fixed 218 KB per layer. Retains 100% needle-in-haystack retrieval at 16K tokens on Mistral-7B.
Synthesis
Federated skill ecosystem for safe AI self-extension
A capability marketplace where agents discover, compose, and publish skills through TDD gates and graduated trust. Composition-over-creation keeps self-extension safe and auditable.
JSON Tokenizer
Structure-aware tokenization — stop wasting tokens on JSON grammar
Assigns dedicated single tokens to JSON grammar elements and learns compact key vocabularies, achieving 5–15% token savings with a vocabulary ~90x smaller than cl100k_base.
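The core idea can be sketched in a few lines: reserve a single token ID for each structural character, and a compact vocabulary for keys learned from a corpus. The token IDs and vocabularies below are illustrative, not the package's real ones:

```python
import json

# Illustrative only: dedicated single-token IDs for JSON grammar,
# plus a small "learned" vocabulary for frequently seen keys.
GRAMMAR_TOKENS = {"{": 0, "}": 1, "[": 2, "]": 3, ":": 4, ",": 5}
KEY_VOCAB = {"name": 100, "id": 101, "items": 102}

def tokenize(obj) -> list:
    """Emit one token per grammar element and per known key."""
    toks = []
    if isinstance(obj, dict):
        toks.append(GRAMMAR_TOKENS["{"])
        for i, (k, v) in enumerate(obj.items()):
            if i:
                toks.append(GRAMMAR_TOKENS[","])
            toks.append(KEY_VOCAB.get(k, ("KEY", k)))  # fall back to raw key
            toks.append(GRAMMAR_TOKENS[":"])
            toks.extend(tokenize(v))
        toks.append(GRAMMAR_TOKENS["}"])
    elif isinstance(obj, list):
        toks.append(GRAMMAR_TOKENS["["])
        for i, v in enumerate(obj):
            if i:
                toks.append(GRAMMAR_TOKENS[","])
            toks.extend(tokenize(v))
        toks.append(GRAMMAR_TOKENS["]"])
    else:
        toks.append(("LIT", json.dumps(obj)))  # literals go to a value tokenizer
    return toks
```

Because every brace, bracket, colon, comma, and known key costs exactly one token, the savings grow with how schema-repetitive the payload is.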
Parameter Golf
Matched SOTA in OpenAI's Model Craft Challenge
Trained the best 16MB language model in 10 minutes on 8xH100s. Reached 1.1234 bpb using a model council of 5 frontier LLMs, custom Triton kernels, and FlashAttention-3 Hopper builds.
Procrustes Bridge
Do LLMs share the same internal geometry?
Learns orthogonal rotations between LLM hidden-state spaces via SVD-based Procrustes alignment. Tests whether one model's internal state can decode tokens through another model's output head.
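The alignment step itself is classical: given paired hidden states X and Y from two models, the orthogonal map minimizing the Frobenius error has a closed-form SVD solution (the orthogonal Procrustes problem). A self-contained sketch with synthetic data; dimensions and preprocessing here are illustrative:

```python
import numpy as np

def procrustes_rotation(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Find the orthogonal matrix R minimizing ||X @ R - Y||_F.

    Classic solution: SVD of X^T Y, then R = U @ Vt.
    X, Y are (n_samples, d) paired hidden states from two models.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Sanity check: recover a known rotation from synthetic "hidden states".
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # a random orthogonal map
Y = X @ Q
R = procrustes_rotation(X, Y)
assert np.allclose(R, Q, atol=1e-6)
```

With the rotation in hand, the experiment is to push one model's rotated states through the other model's output head and see whether the decoded tokens survive the trip.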
Research
Papers organized by theme.
Scalable AI Oversight
How do we verify AI outputs when the verifier is weaker than the system it checks?
- From Verification Failure to Swarm Solution — Measuring where AI oversight breaks down, with an ensemble swarm fix
- CMED Benchmark — When weak verifiers miss deceptive reasoning in stronger models
- HDCS Architecture — Diverse weak models for scalable oversight via error decorrelation
- Model Organisms of Supply-Chain Co-option — Living-off-the-land failure modes in RAG-augmented agent runtimes
Multi-Agent Coordination
Efficient, safe communication protocols for agent swarms.
- Slipstream: Semantic Quantization Protocol — 60–85% token reduction for multi-agent coordination
- Covert Channel Prevention — RL-based governance for safe inter-agent communication
- Structure-Aware Tokenization for JSON — 5–15% token savings on schema-repetitive agentic workloads with a ~90x smaller vocabulary
- Parameter Golf: Model Council Strategy — Using 5 frontier LLMs as strategic advisors to match SOTA in OpenAI's 16MB competition
Cognitive Architectures
Building minds that persist, learn, stay coherent, and extend their own capabilities safely.
- Coherence-Seeking Architectures — MRA + C2 + CPR unified framework for long-lived agents
- The Continuity Core — Persistent memory and intrinsic motivation for self-modifying AI
- Self-Directed Knowledge Acquisition — Autonomous knowledge-gap identification without weight updates
- Synthesis: Federated Capability Ecosystem — Safe AI self-extension through TDD and graduated trust
- Eve: From-Scratch Transformer Models — Eve-2 MoE (272M) and Eve-3 SABER (1B) with novel cognitive components
- Procrustes Bridge — Cross-model representation alignment via orthogonal rotation
AI Safety & Alignment
Understanding failure modes — sycophancy, hallucination, and the gap between behavioral and mechanistic safety.
- Safety Lens: White-Box Alignment Detection — MRI-style introspection via Persona Vector Extraction across 8 transformer architectures
- Epistemic Dissonance — Sycophantic hallucination as structural conflict, not knowledge failure
- Scaffolded Introspection — Eliciting and measuring self-referential behavior in LLMs
Applied AI for Industry
AI deployment guide for industries that build, move, and power the world—where reliability, safety, and ROI are non-negotiable.
Writing
Long-form analysis, technical walkthroughs, and opinion across Substack, Medium, and Hugging Face.
Ten Agents Destroyed Production and Everyone is Strangely OK With It
When the agentic promise meets operational reality
OpenAI's Parameter Golf Day 7: Sub-1.0
Breaking the 1.0 bpb barrier in the 16MB language model competition
Parameter Golf Day 6: The Pod Lottery
When GPU infrastructure becomes the bottleneck
Parameter Golf Day 5: 157 Kilobytes
Every byte matters when your model cap is 16MB
Sixty Thousand Kernels
Building FlashAttention-3 from source on RunPod
Live Blogging the OpenAI Parameter Golf Challenge
Real-time dispatches from a 16MB language model competition
The 80/20 Lie: Why 80% of Agentic AI Work Isn't AI
The infrastructure reality behind agentic systems
Boots on the Ground AI: Eve 3
The SABER architecture — Slip-Anchors, Experience Streams, and Re-entry
Getting Started with NemoClaw on Windows (WSL2)
A practical guide to NVIDIA's sandboxed AI coding agent
Your Model Doesn't Need to Re-Read the Document
Introducing Stateful Neural Databases
The Recursive Developer
How agentic masters justify $2,000+/mo in coding assistants
Your Agent Has Amnesia
Announcing mnemos: biomimetic memory for agents
How to Actually Code With Agents
The velocity trap and the practices that survive it
A $1.5M Company Just Did What Used to Require the CIA
AI closes the gap between observation and finished intelligence
Agentic Development Workflows
What's actually happening in enterprise production right now
Inside Maven, Palantir's Military Brain Built on Claude
How an AI safety company's tech ended up selecting bombing targets
Structure-Aware Tokenization for JSON
Stop wasting tokens on predictable JSON structure
Read the Contract, Not the Press Release
What OpenAI's Pentagon deal actually says
From Theoretical Exploit to Counterterrorism Tool
When research meets real-world impact
CoDA-GQA-L: 9.5x KV Cache Compression
Technical deep-dive on bounded-memory differential attention
From 'We Need AI' to 'We Ship AI'
Bridging the gap from ambition to deployment
Medium → all posts
- The Agentic Coding Shift — 5 counter-intuitive truths about building AI systems
- The REKKI Case Study — Becoming an AI-first organization
- Llama 4 Running Locally — Local deployment in under an hour
Hugging Face → profile
- Slipstream for Agent Communication — Technical deep-dive on semantic quantization
- Model Organism for Supply-Chain Co-option — Forensic LotL case study in agentic runtimes
The Checkpoint Newsletter
Weekly roundup of developments that matter if you build, deploy, or think critically about AI systems.
- March 21, 2026 — OpenAI acquires Promptfoo, Knuth uses Claude, Mistral Small 4
- March 20, 2026 — Mistral Small 4, GPT-5.4 Mini/Nano, and the week in releases
- March 5, 2026 — Data and compute are the new currencies of power
Glossary
Safety & Oversight
- HDCS — Heterogeneous Divergence-Convergence Swarm. Ensemble of diverse AI models that cross-check each other.
- CMED — Cross-Model Epistemic Divergence. Test suite for revealing AI verification blind spots.
- EAP — Evolutionary Adversarial Pipeline. Automated red-teaming that evolves prompts to find safety blind spots.
- LotL — Living-off-the-Land. Repurposing legitimate tools for unintended goals.
Architectures
- MRA — Manifold Resonance Architecture. Detects epistemic stress before generating answers.
- CPR — Collaborative Partner Reasoning. Separates exploratory reasoning from final answers.
- C2 — Continuity Core. Layered memory giving stateless AI persistent context.
- UCR — Universal Concept Reference. Compact semantic anchors for 82% fewer tokens.
- SABER — Slip-Anchors, Experience Streams, and Re-entry. Cognitive architecture with learnable error-correction codebooks, per-token state flow, and resonant FFN layers.
- CoDA — Constrained Orthogonal Differential Attention. Sharpens attention by subtracting a gated inhibitory stream via learnable rotation.