Local-first · 0.5B model + framework · M4 Mac Mini

A study in multi-agent financial reasoning.

A 0.5B model. A local-first framework. Specialized agents that fit on a Mac Mini.

Model
Fin Nano · 0.5B
Framework
BigBugAI Fin · local-first
Agents
4 specialised
Sample reasoning trace — each row is one agent's structured output
trace · conflicting_signals · t_0001
01 · analyst · contested_breakout · technical 0.71 · sentiment −0.34
02 · risk · max_drawdown 4.2% · stop −1.8σ · regime: late-cycle
03 · portfolio · kelly_size 0.12 · concentration ok · liquidity ok
04 · execution · open 0.6× target, conditional add at retest
Sample trace · illustrative · evaluation pending
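The Portfolio row's kelly_size can be read as a fractional-Kelly position size. A minimal sketch of that computation, assuming a win probability and payoff ratio as inputs — the function names, the specific inputs, and the half-Kelly multiplier are all hypothetical, not values from the framework:

```python
def kelly_fraction(p_win: float, payoff_ratio: float) -> float:
    """Full-Kelly fraction f* = p - (1 - p) / b for win probability p and payoff ratio b."""
    return p_win - (1.0 - p_win) / payoff_ratio


def sized_position(p_win: float, payoff_ratio: float, kelly_multiplier: float = 0.5) -> float:
    """Fractional Kelly: scale the full-Kelly fraction down, floored at zero (no shorting here)."""
    return max(0.0, kelly_fraction(p_win, payoff_ratio)) * kelly_multiplier


# Hypothetical inputs for illustration only.
size = sized_position(p_win=0.55, payoff_ratio=1.8)
```

Practical systems almost always run fractional Kelly, since the full-Kelly fraction is highly sensitive to estimation error in `p_win`.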
§ 01

Architecture

Four agents, each tuned to a single cognitive task. Each handoff is a typed envelope, not a paragraph.

Agent
Analyst
Model
Fin Nano (0.5B)
Output
AnalystReport
Role

Synthesises market state, indicators, and external signals into a structured read of the setup.

Tools
  • fetch_market_data
  • compute_indicator
  • query_news_sentiment
  • historical_analog_search
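The Analyst's tool surface can be wired as a plain name-to-callable registry. A sketch of the dispatch shape, using two of the tool names listed above — the stub implementations and their signatures are placeholders, not the framework's actual tools:

```python
from typing import Any, Callable, Dict

# Stub tools -- real implementations would hit market-data and news APIs.
def fetch_market_data(symbol: str) -> Dict[str, Any]:
    return {"symbol": symbol, "close": [101.2, 103.8, 107.5]}

def compute_indicator(name: str, closes: list) -> float:
    # Placeholder: a simple moving average stands in for any indicator.
    return sum(closes) / len(closes)

TOOLS: Dict[str, Callable[..., Any]] = {
    "fetch_market_data": fetch_market_data,
    "compute_indicator": compute_indicator,
    # "query_news_sentiment" and "historical_analog_search" register the same way.
}

def dispatch(tool_name: str, **kwargs: Any) -> Any:
    """Route a model-emitted tool call to its implementation; fail loudly on unknown names."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](**kwargs)
```

Failing loudly on unknown tool names matters with a small model: a hallucinated tool call surfaces as an immediate, attributable error rather than a silent no-op.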
Why specialise?

Each handoff is auditable.

A monolithic prompt that tries to analyse, evaluate risk, size, and execute conflates four different cognitive tasks. Each task has its own evaluation criteria; collapsing them produces decisions that are hard to inspect and harder to improve.

Specialised agents isolate failure modes. An over-confident Analyst is easy to detect when its output passes through a Risk agent that disagrees on a measurable axis. A monolithic model hides the same disagreement inside a single chain of thought.

Per-agent evaluation becomes a tractable subproblem rather than a holistic judgement.
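Per-agent evaluation can then be expressed as one scoring function per agent rather than one judgement over the whole pipeline. A sketch of that decomposition — the scoring criteria, field names, and ground-truth keys are illustrative assumptions, not the planned harness:

```python
from typing import Callable, Dict

def score_analyst(output: dict, truth: dict) -> float:
    """Score only what the Analyst owns: the directional read of the setup."""
    return 1.0 if (output["technicalScore"] > 0) == truth["moved_up"] else 0.0

def score_risk(output: dict, truth: dict) -> float:
    """Score only what the Risk agent owns: did realised drawdown stay inside its bound?"""
    return 1.0 if truth["realized_drawdown"] <= output["max_drawdown"] else 0.0

SCORERS: Dict[str, Callable[[dict, dict], float]] = {
    "Analyst": score_analyst,
    "Risk": score_risk,
    # Portfolio and Execution get their own criteria the same way.
}

def evaluate_trace(trace: Dict[str, dict], truth: dict) -> Dict[str, float]:
    """One score per agent -- a failure localises to a single stage of the pipeline."""
    return {agent: SCORERS[agent](out, truth) for agent, out in trace.items() if agent in SCORERS}
```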

Message schema

Analyst → Risk

v1.0
{
  "from": "Analyst",
  "to": "Risk",
  "schemaVersion": "1.0",
  "payload": {
    "marketState": "contested_breakout",
    "technicalScore": 0.71,
    "sentimentScore": -0.34,
    "primaryDriver": "regulatory_overhang",
    "evidence": [
      { "tool": "fetch_market_data", "weight": 0.4 },
      { "tool": "query_news_sentiment", "weight": 0.6 }
    ]
  },
  "trace": {
    "agentTurnId": "t_0001",
    "promptCharsIn": 4218,
    "completionCharsOut": 1794,
    "toolCallCount": 3
  }
}
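An envelope like the one above can be enforced at each handoff with a small structural check. A sketch using stdlib dataclasses — the field names come from the schema above, but the specific validation rules (version pinning, required payload keys, sentiment range) are assumptions:

```python
from dataclasses import dataclass
from typing import Any, Dict

# Assumed required keys, taken from the example payload above.
REQUIRED_PAYLOAD_KEYS = {"marketState", "technicalScore", "sentimentScore", "primaryDriver", "evidence"}

@dataclass(frozen=True)
class Envelope:
    sender: str
    recipient: str
    schema_version: str
    payload: Dict[str, Any]

def parse_envelope(raw: Dict[str, Any]) -> Envelope:
    """Reject malformed handoffs before the next agent ever sees them."""
    if raw.get("schemaVersion") != "1.0":
        raise ValueError(f"unsupported schemaVersion: {raw.get('schemaVersion')!r}")
    missing = REQUIRED_PAYLOAD_KEYS - raw.get("payload", {}).keys()
    if missing:
        raise ValueError(f"payload missing keys: {sorted(missing)}")
    if not -1.0 <= raw["payload"]["sentimentScore"] <= 1.0:
        raise ValueError("sentimentScore out of range [-1, 1]")
    return Envelope(raw["from"], raw["to"], raw["schemaVersion"], raw["payload"])
```

Note what this buys and what it doesn't: a structurally invalid handoff never reaches the Risk agent, but a well-formed envelope carrying shallow reasoning passes unchallenged — the limitation discussed under anticipated failure modes.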
§ 02

Illustrative decision traces

Watch four agents reason through a representative scenario. Each turn streams thinking, tool calls, and structured output.

Scenario

Conflicting signals: technicals bullish, sentiment bearish

Mid-cap equity shows a textbook breakout on the daily chart while news flow is unambiguously negative following a regulatory inquiry.

§ 03

Evaluation

Held-out scenarios across multiple seeds. The framework is one variable, the model is another; the contribution is the combination. Higher is better.

Benchmark numbers — coming soon

The evaluation harness is still being built. Final scores will land alongside the methodology in the project's research write-up. The table below shows the comparison surface we plan to publish, not measurements.

Scenario                    | Fin Nano + framework | Fin Nano alone | Claude Opus 4.7 + framework | GPT-5 + framework | Frontier (no framework)
Conflicting signals         | tbd                  | tbd            | tbd                         | tbd               | tbd
Volatility spike            | tbd                  | tbd            | tbd                         | tbd               | tbd
Low-conviction setup        | tbd                  | tbd            | tbd                         | tbd               | tbd
Regime shift mid-trace      | tbd                  | tbd            | tbd                         | tbd               | tbd
Tool-output ambiguity       | tbd                  | tbd            | tbd                         | tbd               | tbd
Composite (held-out, n=240) | tbd                  | tbd            | tbd                         | tbd               | tbd

Scenario list and comparison surface are illustrative of the planned harness. Numbers will publish alongside the methodology in the project's research write-up.

Anticipated failure modes

Categories where we expect the system to underperform its frontier baselines. Verified results will be reported with the evaluation.

  • Hypothesis
    Regime shifts within trace window

    When the underlying regime changes between Analyst and Execution, the system may rely on stale framing. A regime-detection step before re-entry could close some of this gap.

  • Hypothesis
    Tool output ambiguity

    Tool calls that return marginally usable data (sentiment scores in the noise band, low-volume indicators) can get over-weighted by downstream agents.

  • Hypothesis
    Under-specified scenarios

    When the scenario lacks a clean prior — no analog in the historical search, no consensus across indicators — the pipeline may over-reach rather than standing aside.

  • Hypothesis
    Calibration on conditional plans

Conditional add/exit rules are harder to calibrate than single-step decisions: whether a conditional branch would actually have triggered is difficult to anchor in evaluation.

  • Hypothesis
    Schema brittleness in the small model

    A 0.5B model fine-tuned for schema fluency may produce valid JSON in unfamiliar scenarios while the underlying reasoning is shallow. The framework's typed contracts protect downstream agents from malformed input but cannot detect this kind of confident-but-wrong output. Detection requires either a reward model (out of scope for v0.1) or human spot-checking.

§ 04

Status

Fin Nano and the BigBugAI Fin framework are in active development. The website shows the design; the artifacts are forthcoming.

Component         | Status                  | Target
Schema spec       | in draft                | this month
Fin Nano v0.1     | training pipeline ready | 2-3 weeks
BigBugAI Fin v0.1 | design locked           | 6-8 weeks

Targets are estimates. The site will be updated as components ship.