Background
JetBrains Research (Dec 2025) studied context management in production AI coding agents and found:
- LLM summarization caused agents to run 13-15% longer on average: summaries can 'smooth over' the failure signals that normally trigger stopping conditions, ironically increasing total cost.
- Observation masking (replacing old tool results with placeholders) cut costs by more than 50% vs. unmanaged context, while outperforming summarization on both task completion and total token usage.
Source: Efficient Context Management for AI Agents (JetBrains Research, Dec 2025)
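For context, the masking technique the paper describes can be sketched in a few lines. This is an illustrative model, not Zeph's message types: `Message`, `mask_old_observations`, and the placeholder string are all assumed names.

```rust
// Sketch of observation masking over a simplified message history.
// Tool results older than the `keep_recent` most recent ones are replaced
// with a short placeholder, so the context keeps its shape but sheds bulk.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    Assistant(String),
    ToolResult(String),
}

/// Mask every tool result except the `keep_recent` most recent ones.
fn mask_old_observations(history: &mut [Message], keep_recent: usize) {
    // Indices of all tool results, oldest first.
    let tool_idxs: Vec<usize> = history
        .iter()
        .enumerate()
        .filter(|(_, m)| matches!(m, Message::ToolResult(_)))
        .map(|(i, _)| i)
        .collect();
    let cutoff = tool_idxs.len().saturating_sub(keep_recent);
    for &i in &tool_idxs[..cutoff] {
        history[i] = Message::ToolResult("[observation masked]".into());
    }
}
```

Unlike summarization, this keeps the conversational skeleton intact (including any failure signals in assistant turns), which is the property the paper credits for its better stopping behavior.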
Problem
Zeph has no instrumentation to measure whether hard compaction (LLM summarization) causes trajectory elongation — i.e., does the agent take more turns / make more tool calls after a hard compaction event than it would have otherwise?
Proposal
Add metrics to track per-session:
- compaction_hard_count: number of hard compaction events in the session
- turns_after_hard_compaction: tool iterations used after each hard compaction event
- Log both at session end via tracing::info! in agent_shutdown
This would allow detecting the trajectory elongation pattern in real sessions (visible in .local/testing/debug/session.log).
Secondary: if elongation is confirmed, consider increasing soft compaction frequency (lower soft_compaction_threshold) to reduce reliance on hard compaction.
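A minimal sketch of what the counters could look like. Field and method names here are assumptions, not Zeph's actual AgentMetrics definition, and the summary string stands in for the tracing::info! line at shutdown:

```rust
// Hypothetical shape of the proposed per-session counters.
#[derive(Debug, Default)]
struct AgentMetrics {
    /// Number of hard compaction events in the session.
    compaction_hard_count: u32,
    /// Tool iterations observed after each hard compaction event.
    turns_after_hard_compaction: Vec<u32>,
    // Internal bookkeeping for the current post-compaction window.
    turns_since_last_hard: u32,
    in_post_compaction_window: bool,
}

impl AgentMetrics {
    fn on_hard_compaction(&mut self) {
        // Close out the previous window before opening a new one.
        if self.in_post_compaction_window {
            self.turns_after_hard_compaction
                .push(self.turns_since_last_hard);
        }
        self.compaction_hard_count += 1;
        self.turns_since_last_hard = 0;
        self.in_post_compaction_window = true;
    }

    fn on_tool_turn(&mut self) {
        if self.in_post_compaction_window {
            self.turns_since_last_hard += 1;
        }
    }

    /// Called at shutdown; in Zeph this would feed a tracing::info!
    /// line rather than return a string.
    fn shutdown_summary(&mut self) -> String {
        if self.in_post_compaction_window {
            self.turns_after_hard_compaction
                .push(self.turns_since_last_hard);
            self.in_post_compaction_window = false;
        }
        format!(
            "compaction_hard_count={} turns_after_hard_compaction={:?}",
            self.compaction_hard_count, self.turns_after_hard_compaction
        )
    }
}
```

Comparing turns_after_hard_compaction against typical pre-compaction turn counts across sessions is what would confirm (or rule out) the elongation pattern.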
Applicability
- Impact: Medium — enables data-driven tuning of compaction thresholds
- Complexity: Low — counter fields in AgentMetrics, logged at shutdown
- Risk: None — metrics-only change
References
- Efficient Context Management for AI Agents (JetBrains Research, Dec 2025)