Description
arXiv:2604.19572 "A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression" (April 21 2026) introduces TACO: a plug-and-play framework that automatically discovers and refines rules for compressing terminal/tool output, addressing the quadratic token growth problem in long-horizon agentic tasks.
Key mechanism:
- Global Rule Pool: reusable compression rules accumulated across tasks
- Task-specific adaptation: online refinement during the current session
- Rule format: each rule is a function that decides whether/how to compress a specific tool output pattern
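The rule format above can be sketched in Rust (assumed to be Zeph's language, given its cargo-based tooling). The `CompressionRule` struct and its field names are illustrative, not taken from the paper:

```rust
/// Sketch of TACO's rule format: a rule pairs a matcher (which tool output
/// it applies to) with a compression function. Names are illustrative.
struct CompressionRule {
    name: &'static str,
    applies: fn(tool: &str, output: &str) -> bool,
    compress: fn(output: &str) -> String,
}

impl CompressionRule {
    /// Compress only when the matcher fires; otherwise pass through unchanged.
    fn apply(&self, tool: &str, output: &str) -> String {
        if (self.applies)(tool, output) {
            (self.compress)(output)
        } else {
            output.to_string()
        }
    }
}

fn main() {
    let rule = CompressionRule {
        name: "cargo-build-diagnostics-only",
        applies: |tool, _| tool == "cargo build",
        compress: |out| {
            out.lines()
                .filter(|l| l.contains("error") || l.contains("warning"))
                .collect::<Vec<_>>()
                .join("\n")
        },
    };
    let raw = "Compiling zeph v0.1.0\nwarning: unused variable `x`\n   Finished dev";
    // Keeps only the warning line from the raw build output.
    println!("[{}] {}", rule.name, rule.apply("cargo build", raw));
}
```

A rule pool is then just a collection of such rules tried in order, with pass-through as the fallback.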
Results: 1-4% accuracy gain on TerminalBench, ~10% token reduction.
How It Applies to Zeph
Zeph's ShellExecutor and tool audit log accumulate raw outputs that are injected verbatim into context. Long shell commands (cargo build, cargo test) produce hundreds of lines that are mostly noise. TACO's pattern is directly applicable:
- Compression rule registry: store rules like "cargo build output: keep only error and warning lines" or "cargo test output: keep only failed test names + count"
- Self-evolution: after each session, analyze which output portions were actually referenced in LLM decisions → refine rules
- Integration point: ToolExecutor trait → post-process output before context injection
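As a concrete instance of a registry rule from the list above, here is a sketch of the cargo-test compressor, assuming `cargo test`'s usual `test … FAILED` / `test result:` line format. The function name and exact filtering are assumptions, not Zeph's actual implementation:

```rust
/// Illustrative rule for `cargo test` output: keep only failed test names
/// plus the final summary line, dropping passing tests and progress noise.
fn compress_cargo_test(output: &str) -> String {
    output
        .lines()
        .filter(|l| {
            // Failed test lines look like: "test tests::foo ... FAILED"
            (l.starts_with("test ") && l.ends_with("FAILED"))
                // Summary line: "test result: FAILED. 1 passed; 2 failed; ..."
                || l.starts_with("test result:")
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let raw = "\
running 3 tests
test tests::parse_ok ... ok
test tests::parse_empty ... FAILED
test tests::parse_utf8 ... ok

failures:
    tests::parse_empty

test result: FAILED. 2 passed; 1 failed; 0 ignored";
    // Reduces nine lines to the two that an LLM decision actually needs.
    println!("{}", compress_cargo_test(raw));
}
```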
Implementation Sketch
- Add a trait with a default identity implementation
- Implement rule-based compressor backed by a SQLite rules table
- Add self-evolution step in session cleanup: diff what was referenced vs full output, generate candidate rules via LLM
- Expose a config section with the corresponding fields
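The first two steps above can be sketched as follows. `ObservationCompressor` is a hypothetical trait name, and the SQLite-backed rules table is stubbed as an in-memory list; neither reflects Zeph's actual code:

```rust
/// Hypothetical trait name. The default method is the identity, so existing
/// executors are unaffected until a real compressor is plugged in.
trait ObservationCompressor {
    fn compress(&self, tool: &str, output: &str) -> String {
        let _ = tool; // identity default ignores which tool produced the output
        output.to_string()
    }
}

/// Rule-based implementor; in the sketch this would be backed by a SQLite
/// rules table, here stubbed as an in-memory list of (prefix, fn) pairs.
struct RuleBasedCompressor {
    rules: Vec<(String, fn(&str) -> String)>,
}

impl ObservationCompressor for RuleBasedCompressor {
    fn compress(&self, tool: &str, output: &str) -> String {
        for (prefix, f) in &self.rules {
            if tool.starts_with(prefix.as_str()) {
                return f(output);
            }
        }
        output.to_string() // no rule matched: fall back to identity
    }
}

fn main() {
    let c = RuleBasedCompressor {
        rules: vec![("cargo build".to_string(), |o| {
            o.lines()
                .filter(|l| l.contains("error") || l.contains("warning"))
                .collect::<Vec<_>>()
                .join("\n")
        })],
    };
    println!("{}", c.compress("cargo build", "Compiling zeph\nwarning: foo"));
}
```

Because the trait defaults to identity, it can be wired into the ToolExecutor output path before any rules exist, and the self-evolution step only ever adds or refines entries in the rule store.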
Complexity
Medium-low. No ML training required — rule discovery is LLM-prompted from session traces.
Source