Skip to content

RUNTIME: default-route large tool outputs through the workshop (protect parent context) #548

@Hmbown

Description

@Hmbown

Thesis

The parent model's context is precious — every token in it should earn its place. Today, large tool outputs (file reads, grep haystacks, web fetches, archive search hits, command output) get truncated to fit and dumped into the parent's context, polluting the cache prefix and burning the most expensive token budget on raw haystacks. Per MGH (Zhang 2026): decomposition is the leverage. Default-route any tool output above a size threshold through the workshop (#547). The workshop processes with V4-Flash, returns a synthesis. The parent only ever sees the answer.

Generalizes #511 (currently scoped to sub-agent output) to any tool output. Closes the loop with #547 — the workshop becomes the substrate, this is the policy.

Current behavior

  • read_file returns up to a hard cap; large files dumped raw into parent context.
  • grep_files returns top-N hits; truncates with marker; raw text into parent.
  • recall_archive BM25 over JSONL cycles; raw hits into parent.
  • web_search / fetch_url results dumped raw.
  • exec_shell output capped; raw into parent.
  • agent_* sub-agent outputs sometimes summarized (v0.8.9 agents: summarize sub-agent output before folding it into the parent's context #511 in flight), sometimes not.

Result: a single haystack tool call can blow the cache prefix and cost more than the actual reasoning work.

Proposed change

  • Default policy: any tool output > N tokens (initial: 4K) routes through the workshop instead of returning raw to the parent.
  • Mechanism: tool result is loaded as a workshop variable (e.g. last_tool_result); a synthesis subagent (V4-Flash, cheap) processes it against the original tool intent; only the synthesis returns to the parent.
  • Override: the model can request raw output via a tool parameter (raw=true) when it knows it needs the full thing in resident context.
  • Promotion: the workshop variable persists; the model can later call promote_to_context(last_tool_result) (per Phase C of the workshop issue) to pull the full thing back if synthesis missed something.
  • Per-tool tunable threshold: some tools are more haystack-shaped than others. read_file threshold higher; grep_files threshold lower.

Cache discipline

This is the cache-protection mechanism. Without it, #541 (cache-maximal context default) is fragile — one bad grep blows the prefix. With it, the parent's cache prefix stays stable; haystack work happens off to the side.

Open questions / risks

  • Threshold tuning: 4K is a starting estimate. Too low → workshop overhead on small results. Too high → cache pollution on medium results.
  • Synthesis quality: Flash-summarized tool output may miss details. Mitigated by promotion path + raw=true opt-out, but worth measuring on real tasks.
  • Latency: workshop synthesis adds an LLM call per large tool output. Cheap (Flash) but real. May want streaming synthesis.
  • Tool authorship: new tools need to declare a default threshold or inherit a sensible one.

Acceptance signals

  • A read_file on a 100KB file returns a synthesis to the parent, with the full content available in the workshop for promotion.
  • grep_files returning 200 hits returns a ranked summary, not raw matches.
  • Cache hit rate on long V4 sessions stays high even when many large tool calls happen.
  • Override via raw=true works and bypasses the workshop.
  • Token cost per session for haystack-heavy work drops measurably (Flash synthesis cheaper than Pro reading raw).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions