Context
LangChain's Autonomous Context Compression proposes exposing compaction as an agent tool rather than triggering it at a fixed token threshold. The agent decides when to compact at semantically meaningful moments (task boundaries, before large new inputs, after extracting results).
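To make the mechanism concrete, here is a minimal sketch of compaction exposed as a callable tool. The tool name `compress_context`, the `compact()` signature, and the summary stand-in are all illustrative assumptions, not LangChain's or SynthOrg's actual API:

```python
# Hypothetical sketch: compaction as a tool the agent invokes at task
# boundaries, instead of auto-firing at a fixed token threshold.
# Tool name and compact() signature are illustrative, not a real API.

def compact(messages, keep_recent=3):
    """Summarize older turns; keep the most recent ones verbatim."""
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in for an LLM-generated summary of the older turns.
    summary = " | ".join(m["content"][:40] for m in old)
    return [{"role": "system", "content": f"[compacted] {summary}"}] + recent

COMPACT_TOOL = {
    "name": "compress_context",
    "description": (
        "Compact conversation history. Call only at a task boundary, "
        "before ingesting a large input, or after extracting results."
    ),
    "parameters": {"type": "object", "properties": {}},
}

msgs = [{"role": "user", "content": f"step {i}"} for i in range(6)]
print(len(compact(msgs)))  # 1 summary message + 3 recent turns -> 4
```

The key design point is the tool description: it tells the model *when* compaction is semantically safe, so the timing decision moves from a threshold check into the agent's policy.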
Why This Matters
SynthOrg's compaction is currently threshold-based. Our design spec already mandates "task boundary only, never mid-execution" for auto-downgrade -- the same principle should apply to compaction. Agent-controlled compaction is architecturally cleaner and avoids mid-reasoning interruptions.
Action Items
- Expose compaction as an engine/compress_context tool in the hybrid loop
References
Additional Research (2026-03-26)
Epistemic Marker Preservation
Source: Self-Distillation & Epistemic Verbalization (arXiv:2603.24472)
Compaction MUST preserve epistemic markers ("wait", "hmm", "actually", "perhaps", "alternatively", "check") -- these are functionally important for out-of-distribution reasoning. Removing them degraded AIME24 by up to 63%. Current engine/compaction/summarizer.py uses simple text concatenation with no marker detection -- this is a concrete gap.
Rule: Narrow/repetitive tasks can use concise reasoning; diverse/novel tasks need full uncertainty-aware style preserved.
Surprisal-Based Semantic Token Cost
Source: Reasoning as Compression / CIB (arXiv:2603.08462) (ICML 2025)
Use surprisal under a frozen base model (instead of flat length penalties) to assign per-token compression cost. High-surprisal tokens carry essential reasoning; low-surprisal tokens are predictable filler. Results: 41% token reduction with <1.5% accuracy drop. The beta parameter provides a smooth control knob on the accuracy-efficiency Pareto frontier -- maps directly to quota degradation under budget pressure.
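The scheme can be sketched with a toy "frozen base model": here a unigram frequency table stands in for the frozen LM's conditional token probabilities, and `keep_fraction` plays the role the beta knob plays in the paper. Everything below is an illustrative reduction of the idea, not the CIB implementation:

```python
import math
from collections import Counter

# Toy stand-in for a frozen base model: unigram counts give each token a
# probability, and surprisal = -log2(p). Real use would score conditional
# log-probs under a frozen LLM.

def surprisal_costs(tokens, base_counts):
    total = sum(base_counts.values())
    return {
        t: -math.log2(base_counts.get(t, 1) / (total + 1))  # unseen -> high surprisal
        for t in set(tokens)
    }

def compress(tokens, base_counts, keep_fraction=0.6):
    """Keep the highest-surprisal tokens; drop predictable filler."""
    costs = surprisal_costs(tokens, base_counts)
    k = max(1, round(len(tokens) * keep_fraction))
    kept = sorted(range(len(tokens)), key=lambda i: -costs[tokens[i]])[:k]
    return [tokens[i] for i in sorted(kept)]  # preserve original order

base = Counter("the a is of and".split() * 50 + "therefore contradiction prime".split())
toks = "the proof is a contradiction of the prime claim".split()
print(compress(toks, base, keep_fraction=0.5))
# -> ['proof', 'contradiction', 'prime', 'claim']
```

Rare content words survive, high-frequency filler is dropped, and lowering `keep_fraction` under budget pressure moves along the same accuracy-efficiency frontier the beta parameter controls.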
Concrete Compression Thresholds
Source: Deep Agents Context Engineering
Reference thresholds from LangChain Deep Agents:
- 20,000 tokens: triggers offloading (large tool results replaced with file path + 10-line preview)
- 85% of model max_input_tokens: triggers summarization (in-context LLM summary of session intent + artifacts + next steps)
- 10% of recent tokens: always retained verbatim
- Catches ContextOverflowError and retries with summary
Compare against current CompactionConfig.fill_threshold_percent=80.0 and preserve_recent_turns=3.
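The threshold scheme above can be sketched as a single decision function placed next to SynthOrg's current config values. The constants mirror the reference numbers; `decide_action()` and the `SynthOrgConfig` fields are illustrative, not the real `CompactionConfig` class:

```python
from dataclasses import dataclass

# Deep Agents reference thresholds (from the list above).
OFFLOAD_TOKENS = 20_000          # large tool result -> file path + preview
SUMMARIZE_FILL = 0.85            # fraction of model max_input_tokens
RETAIN_RECENT_FRACTION = 0.10    # always kept verbatim

@dataclass
class SynthOrgConfig:
    """Current SynthOrg values, shown for comparison (field names illustrative)."""
    fill_threshold_percent: float = 80.0
    preserve_recent_turns: int = 3

def decide_action(tool_result_tokens, context_tokens, max_input_tokens):
    if tool_result_tokens > OFFLOAD_TOKENS:
        return "offload"     # replace with file path + 10-line preview
    if context_tokens > SUMMARIZE_FILL * max_input_tokens:
        return "summarize"   # in-context summary; recent 10% kept verbatim
    return "keep"

print(decide_action(25_000, 50_000, 200_000))   # offload
print(decide_action(500, 180_000, 200_000))     # summarize (180k > 170k)
print(decide_action(500, 100_000, 200_000))     # keep
```

Note the structural difference: Deep Agents retains a *fraction* of recent tokens, while SynthOrg retains a fixed *turn count*, and its 80% fill trigger sits slightly below the 85% reference.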