Optima

## The problem.

Small and medium AI/ML research teams operate under constant pressure to ship new models on tight compute budgets and shared hardware. Inefficient experimentation (rediscovering known dead ends, running redundant ablations, retrying approaches that have already failed internally) burns iteration cycles and tens to hundreds of thousands of dollars of compute per quarter. The institutional knowledge of "what's already been tried" is scattered across W&B runs, GitHub repos, Notion docs, Slack threads, and tribal memory ("I think we tried that?"). Newly published research that would have saved a week of work goes unread.

## The solution.

Optima is the experiment intelligence layer that lives in your terminal. You type a research question; a small team of Claude agents pulls the relevant published papers and your team's own past experiments, docs, and results, then returns one actionable recommendation: what to try next, why, a concrete experiment spec (model, method, key hyperparameters), a compute-cost estimate with savings vs. the naive approach, and per-claim confidence with citations.
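
As a rough illustration of the shape such a recommendation could take, here is a minimal sketch using plain dataclasses. The class names, field names, and example values are all hypothetical and chosen for illustration only; they are not taken from Optima's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Claim:
    """One claim in the recommendation, with its confidence and sources."""
    text: str
    confidence: str  # assumed labels: "high", "medium", or "low"
    citations: list[str] = field(default_factory=list)


@dataclass
class ExperimentSpec:
    """Concrete experiment to run next (hypothetical field names)."""
    model: str
    method: str
    hyperparameters: dict[str, float] = field(default_factory=dict)


@dataclass
class Recommendation:
    """What to try next, why, and what it is expected to cost."""
    what_to_try: str
    why: str
    spec: ExperimentSpec
    estimated_cost_usd: float
    naive_cost_usd: float
    claims: list[Claim] = field(default_factory=list)

    @property
    def savings_usd(self) -> float:
        # Savings relative to the naive approach.
        return self.naive_cost_usd - self.estimated_cost_usd


# Illustrative values only; none of these numbers come from a real run.
rec = Recommendation(
    what_to_try="placeholder next step",
    why="placeholder rationale",
    spec=ExperimentSpec(
        model="placeholder-model",
        method="placeholder-method",
        hyperparameters={"learning_rate": 0.001},
    ),
    estimated_cost_usd=120.0,
    naive_cost_usd=400.0,
    claims=[Claim("placeholder claim", "medium", ["paper:example-1"])],
)
print(rec.savings_usd)
```

Keeping the savings as a derived property rather than a stored field means it cannot drift out of sync with the two cost estimates.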

## How it works.

- A cheap Claude Haiku 4.5 intent pass routes the query and decides which evidence agents to run.
- Two Claude Sonnet 4.6 agents run concurrently (via asyncio.gather):
  - A Research Agent searches arXiv + the Semantic Scholar API with a curated local cache.
  - A Context Agent searches the team's internal store of past enterprise-level experiments and documents using a compact cached index plus keyword + per-id lookup tools.
- A Synthesis Agent (also Sonnet 4.6) is forced to emit a structured recommendation via tool use, with a code-enforced citation firewall: any citation pointing at evidence the agents didn't actually gather is dropped before reaching the user, so a hallucinated reference cannot reach the output.
- The result renders as: a Decision Summary, Ranked Evidence with clickable arXiv / Semantic Scholar links, an Experiment Spec with cost estimate, and Claims & Confidence tagged 🟢 High / 🟡 Medium / 🔴 Low.
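
The concurrent evidence step and the citation firewall described above can be sketched roughly as follows. This is a simplified illustration, not Optima's actual code: the two agent functions are stubs that return placeholder evidence instead of calling a model, and the function names, ID formats, and claim layout are all assumptions.

```python
from __future__ import annotations

import asyncio


async def research_agent(query: str) -> dict[str, str]:
    """Stub for the Research Agent; returns evidence keyed by ID."""
    await asyncio.sleep(0)  # stands in for real model and tool calls
    return {"paper:example-1": f"placeholder paper evidence for {query!r}"}


async def context_agent(query: str) -> dict[str, str]:
    """Stub for the Context Agent; returns evidence keyed by ID."""
    await asyncio.sleep(0)  # stands in for real model and tool calls
    return {"exp:example-7": f"placeholder internal experiment for {query!r}"}


def citation_firewall(claims: list[dict], evidence: dict[str, str]) -> list[dict]:
    """Drop every citation whose ID is not in the gathered evidence.

    Assumption: a claim is kept even if it loses some or all of its
    citations; only the unsupported citation IDs are removed.
    """
    cleaned = []
    for claim in claims:
        kept = [cid for cid in claim["citations"] if cid in evidence]
        cleaned.append({**claim, "citations": kept})
    return cleaned


async def run(query: str, draft_claims: list[dict]) -> list[dict]:
    # Both evidence agents run concurrently; gather preserves argument order.
    papers, experiments = await asyncio.gather(
        research_agent(query),
        context_agent(query),
    )
    evidence = {**papers, **experiments}
    # In the real system the synthesis model would produce the draft claims;
    # here they are passed in so the filtering step can be shown on its own.
    return citation_firewall(draft_claims, evidence)


draft = [
    {
        "text": "placeholder claim",
        # The second ID was never gathered, so it should be filtered out.
        "citations": ["paper:example-1", "paper:made-up-9"],
    }
]
result = asyncio.run(run("example question", draft))
print(result[0]["citations"])
```

Because the filter checks IDs against the evidence dictionary in ordinary code rather than relying on the model's instructions, an invented ID is removed regardless of what the synthesis step emits.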

## What was built.

- Five-agent Claude system: intent (Haiku), research (Sonnet), context (Sonnet), synthesis (Sonnet), and ingest (Haiku), sharing a single async tool-use loop with one cache_control prompt-cache breakpoint and blocking tool calls offloaded via asyncio.to_thread.
- Code-enforced citation firewall that drops any reference pointing at an ID not actually in the gathered evidence, so a hallucinated paper or experiment ID can't reach the output.
- Live paper search with offline cache fallback — tries arXiv + Semantic Scholar, falls back to a curated local cache of ~24 real papers on any failure or 403 so the demo works anywhere.
- Canonical relationship-aware schema for experiments — including parent / related experiment IDs so the context agent traces lineages rather than treating history as a bag of disconnected runs.
- Haiku-powered CSV ingest that normalizes messy team CSVs (odd column names, free-text metrics, "55 dollars", mixed date formats) into the canonical schema via a forced tool call, validated by pydantic, idempotent on experiment_id.
- CLI surface: optima "", optima init (industry + API keys), optima ingest , optima status, optima experiments, optima papers.
- Onboarding flow with industry-tuned search injected at the paper-search wire level so it's guaranteed on every live AND cache query regardless of the agent's term choices.
- Terminal UX: live spinner with elapsed time fed by orchestrator phase callbacks (intent → evidence → synthesis), and a dynamic ASCII banner with a query-seeded SHA-256 sparkline fingerprint that's unique per run.
- 40 offline tests that pass with no API key and no network — mocked Anthropic client, simulated arXiv/S2 403s, citation-firewall verification, env-file upsert, every new command.
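
To make the live-search-with-cache-fallback item above more concrete, here is a minimal sketch of the pattern. It is an illustration under stated assumptions rather than Optima's implementation: the cache entries are placeholders, the function names are hypothetical, and the live search is injected as a callable so a simulated 403 can be exercised without a network.

```python
from __future__ import annotations

from typing import Callable

# Placeholder entries standing in for the curated local cache.
LOCAL_CACHE = [
    {"id": "cache:placeholder-1",
     "title": "Placeholder cached paper on learning rate schedules"},
    {"id": "cache:placeholder-2",
     "title": "Placeholder cached paper on data augmentation"},
]


def search_cache(query: str) -> list[dict]:
    """Naive keyword match over cached titles (assumed matching rule)."""
    terms = query.lower().split()
    return [
        paper for paper in LOCAL_CACHE
        if any(term in paper["title"].lower() for term in terms)
    ]


def search_papers(
    query: str,
    live_search: Callable[[str], list[dict]],
) -> tuple[list[dict], str]:
    """Try the live search first; on any failure, fall back to the cache.

    Returns the results plus a label saying which source answered.
    """
    try:
        return live_search(query), "live"
    except Exception:
        # Any failure (timeout, HTTP 403, parse error) triggers the fallback.
        return search_cache(query), "cache"


def blocked_live_search(query: str) -> list[dict]:
    """Simulates a live API that rejects the request."""
    raise PermissionError("simulated HTTP 403")


results, source = search_papers("augmentation", blocked_live_search)
print(source, [paper["id"] for paper in results])
```

Injecting the live search as a parameter is also what makes this kind of fallback easy to cover in offline tests: a test can pass in a function that always raises.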

## What I'd add next.

- Have Mindfort's findings appear in Optima's context agent. This would streamline much of the work ML teams put toward security and compliance.
- Add more migrations, such as Kubeflow, Azure ML, and Apache Airflow.