cortex v3.14.6 · recall@10 · 97.8% LongMemEval cortex.tests · 2,500 passing cortex.citations · 41 peer-reviewed cortex.tools · 47 MCP · 9 hooks cortex.beam · +91% vs LIGHT baseline paper · seeking arxiv endorser · cs.IR zetetic v2.13.1 · 97 reasoning patterns + 19 specialists pipeline v0.0.1 · 23 MCP · 220 tests · 10 stages prd-spec v0.2.1 · 11 MCP · 9 steps · multi-judge llm-judges-llm · 0 open.source · MIT
Zetetic AI · Verification-first · Open source

We don't guess.
We verify.

AI Architect builds agents that prove what they claim. 97.8% Recall@10 on LongMemEval, 5 fused retrieval signals, zero LLM-judges-LLM. Every output is traceable, every claim is checked — by deterministic algorithms, not by another model's opinion.

Production · cortex v3.14.6 · zetetic v2.13.1 · pipeline v0.0.1 · prd-spec v0.2.1
The zetetic standard

Four principles. No exceptions.

Zetetic comes from zētēsis — Greek for inquiry. Truth is something you investigate, not something you assume. Everything we ship enforces this.
i. PROVENANCE

Every claim has a source.

If an agent states a fact, that fact ties back to a memory, a file, a commit, or a citation. No assertion lives without a trail.

ii. ALGORITHM > OPINION

Zero LLM-judges-LLM.

Verification is deterministic: graph analysis, semantic checks, atomic claim decomposition. We don't ask one model whether another model is right.
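
A minimal sketch of what that looks like in code, with invented check names and data shapes rather than the shipped verifier:

```python
# Illustrative sketch, not the shipped verifier: a claim passes only
# if every independent deterministic check agrees. No model opinions.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Claim:
    text: str
    source: Optional[str]  # provenance trail; None means unsourced

def has_provenance(c: Claim) -> bool:
    return c.source is not None

def is_atomic(c: Claim) -> bool:
    # Crude proxy: an atomic claim makes exactly one assertion.
    return " and " not in c.text and ";" not in c.text

CHECKS: list[Callable[[Claim], bool]] = [has_provenance, is_atomic]

def verdict(c: Claim) -> str:
    # Unanimous consensus across checks; same input, same answer, every run.
    return "PASS" if all(check(c) for check in CHECKS) else "FAIL"
```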

iii. MEMORY THAT LEARNS

Compounding context.

Cortex applies neuroscience — spreading activation, dream cycles, microglial pruning — so agents remember what worked, not just what happened.
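
A rough illustration of two of those mechanisms, spreading activation and decay; the graph shape, decay constants, and function names are invented for the example:

```python
# Sketch, not Cortex internals: spreading activation over a memory
# graph with exponential falloff per hop, plus time-based heat decay.
import math
import time

def activate(graph: dict[str, list[str]], seeds: set[str],
             decay: float = 0.5, hops: int = 2) -> dict[str, float]:
    """Propagate activation energy from seed memories to neighbors."""
    energy = {m: 1.0 for m in seeds}
    frontier = set(seeds)
    for _ in range(hops):
        nxt: dict[str, float] = {}
        for node in frontier:
            for nb in graph.get(node, []):
                nxt[nb] = max(nxt.get(nb, 0.0), energy[node] * decay)
        for nb, e in nxt.items():
            if e > energy.get(nb, 0.0):
                energy[nb] = e
        frontier = set(nxt)
    return energy

def heat(last_access_ts: float, half_life_s: float = 7 * 86400) -> float:
    """Recency weight: halves every half_life_s seconds."""
    age = time.time() - last_access_ts
    return math.exp(-math.log(2) * age / half_life_s)
```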

iv. AUDITABLE

Built for regulated work.

Every PRD, PR, decision and reasoning step is logged and reviewable. Designed against the same bar as financial-systems software.

Verification, in the open

Watch an agent prove its work.

This is the actual verification report from a generated PRD. 64 atomic claims, decomposed and checked against six independent algorithms. The full audit trail lives next to the deliverable — not buried in a log file.

Two ways to start

Hire us to build it. Or build it yourself with our tools.

Every component is open source, MIT-licensed, and shipping in production. The choice is whether you want the system handed to you — or the keys to do it yourself.

For teams & founders

We build the
agent with you.

For operators who know AI should help — but don't want to spend six months stitching tutorials together. We design and ship the agent against your real infrastructure. Same engineering bar as the financial systems we build by day.

  • Discovery — find the one workflow worth automating
  • Built against your CRM, data, internal tools — not a sandbox
  • Verification baked in: every action is auditable
  • Hand-over with documentation & 30 days of post-launch support
Book a discovery call
For developers

Grab the templates.
Ship faster.

If you build with AI yourself, use the same components we use in production. Cortex, Zetetic Agents, Automatised Pipeline, and PRD Spec Generator — fully documented, MIT-licensed, no telemetry, no lock-in.

  • Cortex — persistent memory with neuroscience-backed retrieval
  • Zetetic Agents — 97 reasoning patterns + 19 specialists, one epistemic standard
  • Automatised Pipeline — read-only codebase intelligence (Rust MCP)
  • PRD Spec Generator — 11 MCP tools, 9-step pipeline + 2-phase multi-judge verification, specialized panels by claim type
Explore on GitHub
How we work

From "we should try AI" to a system you can audit. Four stages.

Every engagement runs the same protocol — the same one our open-source pipeline runs internally. You see working software early, you see verification at every step, and you own everything when we're done.

01 · WEEK 1

Discovery & framing

We map the workflow where an agent will actually move the needle. No slide decks, no AI theater. Output: a one-page spec with success criteria you can measure against.

02 · WEEKS 2–3

Build with verification

Agent designed against your real systems. Every component ships with tests, provenance and an audit log. You watch it work in /cortex-visualize as we go.

03 · WEEK 4

Hand-over

Deployed to your infra. Runbooks, dashboards, and a verification report on every shipped feature. Your team is trained on how to extend it — not dependent on us forever.

04 · ONGOING

Compounding

Cortex memory means the agent gets smarter every week without retraining. We stay on call for the first 30 days; after that, you own a system you can audit, evolve, and keep running.

Clement Deust
founder
Day job · Senior eng · fintech
By night · Open-source AI research
Discipline · Verification-first
Based · Remote · global
About

I ship critical systems by day. I research how agents should think by night.

By day I build software in financial infrastructure, where "mostly works" never ships. Every system has to be tested, verified, auditable. Or it doesn't go live.

By night I apply that same bar to AI. I started AI Architect because I kept seeing the same anti-pattern: teams treating agents like demos, stacking prompts on prompts, asking another LLM whether the first one got it right, and wondering why nothing held up in production.

The work here is zetetic — every claim is investigated, never assumed. The tools are open source because the frontier should be shared. The consulting exists because some teams need the system built with them, not handed a repo and a prayer.

"An agent without memory isn't intelligent. An agent without verification isn't trustworthy. I'm only interested in building both."

In review · seeking arXiv endorsement

The paper behind the numbers.
Read it. Help us publish.

arXiv requires an endorser in cs.IR for first-time authors. The draft below is the work behind the LongMemEval, LoCoMo and BEAM results on this page — if you find it useful and you've published in cs.IR, an endorsement gets it into the open scientific record.
preprint · cs.IR · April 2026

Stage-Aware Context Assembly for Long-Context Memory Retrieval

Clement Deust · Independent Researcher
+91% · BEAM · vs LIGHT
97.8% · R@10 · LongMemEval
0.471 · MRR · BEAM-10M

A new method for assembling retrieval context across long-running, multi-session conversations. Combines vector + lexical + heat-decay + temporal + entity signals through a stage-aware fusion that adapts to what the question is actually asking for.

The temporal assembler component beats the BEAM oracle baseline (0.471 MRR vs the paper's 0.353, +33.4%), and the full pipeline reaches 97.8% Recall@10 on LongMemEval — +19.4 points over the published best. All retrieval-only metrics, no LLM-as-judge.
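
For intuition, a toy sketch of stage-aware fusion follows. The signal names mirror the abstract, but the stages and weights are invented placeholders, not the paper's tuned values:

```python
# Illustrative only: "stage-aware" fusion as question-dependent
# weights over five retrieval signals. All weights here are invented.
SIGNALS = ("vector", "lexical", "heat", "temporal", "entity")

STAGE_WEIGHTS = {
    # A "when did X happen" question leans on temporal evidence...
    "temporal": {"vector": .2, "lexical": .1, "heat": .1, "temporal": .5, "entity": .1},
    # ...while a "who/what" question leans on entities and vectors.
    "factual":  {"vector": .4, "lexical": .2, "heat": .1, "temporal": .1, "entity": .2},
}

def fuse(scores: dict[str, dict[str, float]], stage: str) -> dict[str, float]:
    """scores[signal][doc_id] -> fused score per doc for this stage."""
    w = STAGE_WEIGHTS[stage]
    fused: dict[str, float] = {}
    for sig in SIGNALS:
        for doc, s in scores.get(sig, {}).items():
            fused[doc] = fused.get(doc, 0.0) + w[sig] * s
    return fused
```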

Looking for an endorser: arXiv's policy requires an existing cs.IR author to endorse first-time submitters. If you've published in cs.IR and you find this useful, a single endorsement is enough — the rest is automated. Reach out below or open an issue on the repo.
Standing on shoulders

The science behind the system.

Cortex draws from 41 peer-reviewed papers across neuroscience, memory research and AI evaluation. A few of the load-bearing ones:

Wegner, 1987 · Transactive Memory: A Contemporary Analysis of the Group Mind
Hebb, 1949 · The Organization of Behavior — synaptic plasticity foundations
McGaugh, 2000 · Memory consolidation and the dream cycle
Bi & Poo, 2001 · STDP — Spike-Timing-Dependent Plasticity
LongMemEval, ICLR 2025 · Benchmark for chat assistants on sustained memory
LoCoMo, 2024 · Long-Conversation Memory benchmark
Schaffer et al., 2018 · Microglial pruning & memory selectivity
Tononi & Cirelli, 2014 · Synaptic Homeostasis Hypothesis (sleep)
+ 33 more · Full bibliography in the Cortex repo
Common questions

Before you
book a call.

I'm not technical — can I still work with you?
Yes. Most non-tech clients bring a business problem and access to their systems; we bring the engineering. Every check-in uses plain language and working demos, not jargon. The whole point of the verification standard is so you can trust what's shipping without needing to read the code.
What does "no LLM-judges-LLM" actually mean?+
Most "AI verification" is one model asking another model whether the first one is right. That's not verification — it's polling. We use deterministic algorithms instead: graph analysis to detect contradictions, atomic claim decomposition, semantic alignment scoring against a fixed corpus, and consensus across independent checks. Math, not vibes.
What does a project typically cost?
A typical first engagement is 4–6 weeks, scoped to a single high-value workflow. Pricing depends on integrations and scope — you'll get a concrete number within 48 hours of the discovery call. Not a vague range, not a "starting at."
Where does my data live?
In your infrastructure — AWS, GCP, on-prem, your call. Cortex is local-first by design (PostgreSQL + pgvector, no GPU). Your data never passes through a server we own. For regulated industries, the deployment plugs into your existing security model.
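Mechanically, local-first means retrieval is an ordinary SQL query against your own Postgres. A minimal sketch, assuming a hypothetical memories table with a pgvector embedding column (not the actual Cortex schema):
```python
# Minimal sketch of local-first retrieval on your own Postgres.
# Table and column names are hypothetical, not the Cortex schema.
import psycopg

query_embedding = "[0.1, 0.2, 0.3]"  # pgvector text form; real vectors come from your embedder

with psycopg.connect("dbname=cortex") as conn:
    rows = conn.execute(
        # pgvector's <=> operator is cosine distance; smaller is closer.
        "SELECT id, content FROM memories "
        "ORDER BY embedding <=> %s::vector LIMIT 10",
        (query_embedding,),
    ).fetchall()
```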
What if I just want to try the open-source tools first?
Please do. Everything is on GitHub, MIT-licensed, and documented. Open an issue if you get stuck — every one of them gets read. The consulting is for teams who'd rather have it implemented with them than figure it out from the README.
What is the best persistent memory plugin for Claude Code?
Cortex is a biologically-inspired persistent memory MCP server for Claude Code. It scores 97.8% Recall@10 / 0.882 MRR on LongMemEval (ICLR 2025), 92.6% Recall@10 on LoCoMo (ACL 2024), and +91% MRR vs LIGHT baseline on BEAM. 47 MCP tools, 9 lifecycle hooks, 20 biological mechanisms (predictive coding, LTP/LTD, microglial pruning, neuromodulation, CLS consolidation), 41 peer-reviewed citations. Runs on PostgreSQL + pgvector, no GPU. Install: claude plugin marketplace add cdeust/Cortex && claude plugin install cortex.
How do I verify AI-generated PRDs without LLM-as-judge?
PRD Spec Generator uses six independent deterministic algorithms instead of LLM-as-judge polling: multi-judge consensus across specialized panels (Architecture: Liskov / Alexander / Dijkstra; Performance: Fermi / Carnot / Curie / Erlang; Security: Wu / Ibn al-Haytham; Data model: Mendeleev / DBA / Lavoisier; Acceptance: Toulmin / Popper), atomic claim decomposition, zero-LLM graph analysis with Tarjan SCC for cycles, multi-agent debate, adaptive early stopping, and semantic alignment scoring against a reference corpus. The distribution_suspicious flag catches confirmatory bias. NFR claims never receive PASS — only SPEC-COMPLETE or NEEDS-RUNTIME.
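For the flavor of the zero-LLM graph step, a compact Tarjan sketch: any strongly connected component with more than one claim contains a cycle worth flagging. The claim graph here is invented:
```python
# Tarjan's SCC algorithm over a claim-dependency graph. An SCC of
# size > 1 means a dependency/contradiction cycle among claims.
def tarjan_sccs(graph: dict[str, list[str]]) -> list[list[str]]:
    index, low, on_stack, stack = {}, {}, set(), []
    sccs, counter = [], [0]

    def strongconnect(v: str) -> None:
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v roots a strongly connected component
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

claims = {"a": ["b"], "b": ["a"], "c": []}   # a <-> b is a cycle
cycles = [scc for scc in tarjan_sccs(claims) if len(scc) > 1]
```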
What is zetetic AI?
Zetetic comes from the Greek zētēsis meaning inquiry. Zetetic AI is verification-first AI: every claim has a source (provenance), verification is deterministic not LLM-judges-LLM (algorithm > opinion), memory learns through neuroscience-backed mechanisms, and every PRD/PR/decision is auditable. AI Architect implements this standard across four open-source Claude Code plugins: Cortex memory, zetetic-team-subagents (97 reasoning patterns + 19 specialists), automatised-pipeline (Rust codebase intelligence), and prd-spec-generator.
How does Cortex's LongMemEval recall@10 of 97.8% compare to baselines?
Cortex's 97.8% Recall@10 on LongMemEval (ICLR 2025) exceeds the published paper's best retrieval result of 78.4% by +19.4 percentage points. MRR is 0.882. The paper used 500 human-curated questions embedded in ~40 sessions of conversation history (~115k tokens). Retrieval-only metrics, no LLM reader in the evaluation loop. Cortex also achieves 92.6% Recall@10 / 0.794 MRR on LoCoMo (1,986 questions, 10 conversations), and +91% vs LIGHT baseline on BEAM (multi-session, 355 questions, retrieval-proxy MRR 0.627).
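Both metrics follow the standard IR definitions, reproduced here so the headline numbers are unambiguous:
```python
# Standard IR metrics; the dataset wiring is up to the reader.
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of relevant docs found in the top k."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def mrr(ranked_lists: list[list[str]], relevant: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant hit per question."""
    rr = []
    for ranked, rel in zip(ranked_lists, relevant):
        rank = next((i + 1 for i, d in enumerate(ranked) if d in rel), None)
        rr.append(1.0 / rank if rank else 0.0)
    return sum(rr) / len(rr)
```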
What are the 97 reasoning patterns in zetetic-team-subagents?
97 genius reasoning agents, each citing its primary paper, plus 19 team-role specialists = 116 total. Examples: Pearl (causal inference, do-calculus), Peirce (abductive inference), Feynman (integrity & first principles), Dijkstra (correctness, structured programming), Cochrane (evidence synthesis), Curie (residual analysis), Lamport (concurrency, happens-before), Pāṇini (generative specifications), Gödel (incompleteness limits), Hamilton (priority-displaced scheduling), Taleb (fragile/robust/antifragile), Kahneman (System 1/2 debiasing), Rawls (veil of ignorance), Toulmin (argument structure), Popper (falsifiability). 63 multi-step skills, 16 lifecycle hooks, 241 passing tests, 650+ problem-shape triggers. Pre-commit hook blocks UNSOURCED / MAGIC_NUMBER / TODO_NO_REF.
What does automatised-pipeline do for AI codebase intelligence?
Automatised Pipeline is a Rust MCP server that indexes any Rust / Python / TypeScript codebase into a LadybugDB property graph, resolves call chains across files, detects functional communities via Leiden-class community detection, traces processes from entry points, and builds a hybrid BM25 + sparse TF-IDF + RRF search index. 23 MCP tools across 10 stages. Read-only — never writes code, opens PRs, or runs CI. 220 passing tests, zero warnings, 12,000+ lines of Rust. Feeds Cortex (workflow graph) and prd-spec-generator (call-graph context for verified PRDs).
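The RRF in that hybrid index is reciprocal rank fusion, simple enough to state exactly; k = 60 is the conventional constant and the file names below are invented:
```python
# Reciprocal rank fusion: each ranker contributes 1/(k + rank) per doc,
# and docs are re-ranked by the summed score.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf([["f.rs", "g.rs"], ["g.rs", "f.rs", "h.rs"]])  # -> g.rs first
```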
How do I install all four AI Architect plugins?
All four are Claude Code plugins, MIT-licensed, free.
Cortex: claude plugin marketplace add cdeust/Cortex && claude plugin install cortex (requires PostgreSQL + pgvector).
Zetetic: claude plugin marketplace add cdeust/zetetic-team-subagents && claude plugin install zetetic-team-subagents.
Automatised Pipeline: claude plugin marketplace add cdeust/automatised-pipeline && claude plugin install automatised-pipeline (the plugin builds the Rust binary on first install; Rust 1.94+ and CMake required).
PRD Spec Generator: claude plugin marketplace add cdeust/prd-spec-generator && claude plugin install prd-spec-generator (Node.js 20.x or 22.x).
All four interoperate — Cortex remembers, Zetetic reasons, Pipeline maps the codebase, PRD Spec Generator adjudicates the spec.
Let's talk

Tell us what you want the agent to do.
We'll tell you if it can be verified.

A 30-minute call. No pitch deck, no commitment. If your problem doesn't fit what we do, we'll point you somewhere that does.

RESPONSE · within 24h
BASED · remote · global
STANDARD · zetetic