research: evaluate agent-controlled compaction for engine hybrid loop #687

@Aureliolo

Description

Context

LangChain's Autonomous Context Compression proposes exposing compaction as an agent tool rather than triggering it at a fixed token threshold. The agent decides when to compact at semantically meaningful moments (task boundaries, before large new inputs, after extracting results).

Why This Matters

SynthOrg's compaction is currently threshold-based. Our design spec already mandates "task boundary only, never mid-execution" for auto-downgrade -- the same principle should apply to compaction. Agent-controlled compaction is architecturally cleaner and avoids mid-reasoning interruptions.

Action Items

  • Review current compaction trigger mechanism in engine/
  • Evaluate exposing a compress_context tool in the hybrid loop
  • Design how agent-triggered compaction interacts with context budget system
  • Consider fallback: system-triggered compaction if agent never invokes tool (safety net)
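The action items above can be sketched as a minimal shape for the hybrid loop: a `compress_context` tool the agent invokes at task boundaries, plus a threshold-based safety net that only fires if the agent never calls it. All names here (`ContextManager`, the tool signature, the summary placeholder) are illustrative assumptions, not the actual SynthOrg engine API; only the `CompactionConfig` field names come from the issue text.

```python
from dataclasses import dataclass


@dataclass
class CompactionConfig:
    # Field names taken from the issue; defaults match current engine values.
    fill_threshold_percent: float = 80.0  # safety-net trigger only
    preserve_recent_turns: int = 3


class ContextManager:
    """Hypothetical sketch of agent-controlled compaction with a system fallback."""

    def __init__(self, max_tokens: int, config: CompactionConfig):
        self.max_tokens = max_tokens
        self.config = config
        self.turns: list[str] = []

    def used_tokens(self) -> int:
        # Crude word-count stand-in for a real tokenizer.
        return sum(len(t.split()) for t in self.turns)

    def compress_context(self, reason: str = "agent") -> str:
        """Tool exposed to the agent; intended for task boundaries only."""
        keep = max(1, self.config.preserve_recent_turns)
        old, recent = self.turns[:-keep], self.turns[-keep:]
        summary = f"[summary of {len(old)} turns, reason={reason}]"
        self.turns = [summary] + recent
        return summary

    def maybe_autocompact(self) -> None:
        """Safety net: system-triggered only if the agent never invoked the tool."""
        fill = 100.0 * self.used_tokens() / self.max_tokens
        if fill >= self.config.fill_threshold_percent:
            self.compress_context(reason="system-fallback")
```

The design point is that the threshold path becomes a fallback rather than the primary trigger, so compaction normally happens at moments the agent judges semantically safe.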

Additional Research (2026-03-26)

Epistemic Marker Preservation

Source: Self-Distillation & Epistemic Verbalization (arXiv:2603.24472)

Compaction MUST preserve epistemic markers ("wait", "hmm", "actually", "perhaps", "alternatively", "check") -- these are functionally important for out-of-distribution reasoning. Removing them degraded AIME24 by up to 63%. Current engine/compaction/summarizer.py uses simple text concatenation with no marker detection -- this is a concrete gap.
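A minimal sketch of marker-aware compaction, closing the gap named above: sentences carrying epistemic markers are always retained, and everything else is subject to dropping. The marker list comes from the issue text; the function names and the sentence-level granularity are assumptions for illustration, not the real `summarizer.py` interface.

```python
import re

# Markers flagged as functionally important for OOD reasoning (list from the issue).
EPISTEMIC_MARKERS = {"wait", "hmm", "actually", "perhaps", "alternatively", "check"}


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compact_preserving_markers(text: str, keep_ratio: float = 0.3) -> str:
    """Drop sentences during compaction, but never those carrying epistemic markers."""
    sentences = split_sentences(text)
    keep_n = max(1, int(len(sentences) * keep_ratio))  # always keep a recent tail
    kept = []
    for i, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        has_marker = bool(words & EPISTEMIC_MARKERS)
        is_recent = i >= len(sentences) - keep_n
        if has_marker or is_recent:
            kept.append(sentence)
    return " ".join(kept)
```

This is deliberately simple; a production version would need token-level detection and the narrow-vs-diverse task routing described in the rule below.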

Rule: Narrow/repetitive tasks can use concise reasoning; diverse/novel tasks need full uncertainty-aware style preserved.

Surprisal-Based Semantic Token Cost

Source: Reasoning as Compression / CIB (arXiv:2603.08462, ICML 2025)

Use surprisal under a frozen base model (instead of flat length penalties) to assign per-token compression cost. High-surprisal tokens carry essential reasoning; low-surprisal tokens are predictable filler. Results: 41% token reduction with <1.5% accuracy drop. The beta parameter provides a smooth control knob on the accuracy-efficiency Pareto frontier -- maps directly to quota degradation under budget pressure.
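The scheme above can be sketched as follows, under stated assumptions: `logprob_fn` stands in for a frozen base model returning log p(token | prefix), and `beta` is the paper's Pareto knob scaling per-token cost. This is a toy illustration of the idea, not the CIB implementation.

```python
import math
from typing import Callable, Sequence


def surprisal_costs(tokens: Sequence[str],
                    logprob_fn: Callable[[Sequence[str], str], float],
                    beta: float = 1.0) -> list[float]:
    """Per-token compression cost proportional to surprisal under a frozen model."""
    costs = []
    for i, tok in enumerate(tokens):
        surprisal = -logprob_fn(tokens[:i], tok)  # high surprisal = informative
        costs.append(beta * surprisal)
    return costs


def compress(tokens: Sequence[str], costs: Sequence[float],
             drop_below: float) -> list[str]:
    """Drop low-surprisal (predictable filler) tokens; keep the rest."""
    return [t for t, c in zip(tokens, costs) if c >= drop_below]
```

Under budget pressure, the quota system could raise `drop_below` (or lower `beta`) to slide along the accuracy-efficiency frontier instead of applying a flat length penalty.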

Concrete Compression Thresholds

Source: Deep Agents Context Engineering

Reference thresholds from LangChain Deep Agents:

  • 20,000 tokens: triggers offloading (large tool results replaced with file path + 10-line preview)
  • 85% of model max_input_tokens: triggers summarization (in-context LLM summary of session intent + artifacts + next steps)
  • 10% of recent tokens: always retained verbatim
  • Catches ContextOverflowError and retries with summary

Compare against current CompactionConfig.fill_threshold_percent=80.0 and preserve_recent_turns=3.
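For the comparison, a hypothetical side-by-side of the Deep Agents reference thresholds as a decision function; the dataclass field names are illustrative, while the numbers come from the list above and the current-config values from this issue.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionThresholds:
    # Deep Agents reference values (field names are assumptions).
    offload_tool_result_tokens: int = 20_000  # replace result with file path + preview
    summarize_fill_percent: float = 85.0      # vs current fill_threshold_percent=80.0
    retain_recent_fraction: float = 0.10      # vs current preserve_recent_turns=3


def next_action(used_tokens: int, max_input_tokens: int,
                largest_tool_result: int,
                t: CompactionThresholds = CompactionThresholds()) -> str:
    """Decide which Deep Agents-style intervention fires first, if any."""
    if largest_tool_result >= t.offload_tool_result_tokens:
        return "offload"      # write result to file, keep path + 10-line preview
    if 100.0 * used_tokens / max_input_tokens >= t.summarize_fill_percent:
        return "summarize"    # in-context summary of intent + artifacts + next steps
    return "none"
```

Note the structural difference: the current config retains a fixed turn count, while Deep Agents retains a fraction of recent tokens, which scales with context size.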

Metadata

Assignees: no one assigned

Labels

  • prio:low - Nice to have, can defer
  • scope:medium - 1-3 days of work
  • spec:task-workflow - DESIGN_SPEC Section 6 - Task & Workflow Engine
  • type:research - Evaluate options, make tech decisions
  • v0.7 - Minor version v0.7
  • v0.7.5 - Patch release v0.7.5

Milestone: none