Research Finding
Source: Factory.ai evaluation framework (2025), ICLR 2025
Applicability: Medium | Complexity: Simple
Problem
Zeph performs no validation after context compaction. If summarization drops a critical fact (a file path, a decision, the location of an API key), the next LLM call silently operates on incomplete context. This is the root cause of the class of bugs where agents 'forget' state after long sessions.
Proposed Approach
After each summarization event, run a lightweight 'compaction probe':
- Generate 2-3 factual questions from the original turns that are about to be summarized (e.g., 'What file was modified?', 'What was the user's goal?')
- Inject the new summary as context and ask the questions
- Score answers against expected (stored before compaction)
- If probe score < threshold: log WARN with question/answer pairs, optionally fall back to keeping original turns
The probe adds one extra LLM call per compaction event; the cost is mitigated by routing it to a cheap, fast model via the existing orchestrator.
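The probe flow above can be sketched in Rust. Everything here is illustrative: ProbeQa, CompactionScore, and the containment-based grader are hypothetical names standing in for Zeph's real types, and a real implementation would grade answers with an LLM call rather than a string check.

```rust
// Hypothetical sketch of the compaction probe. ProbeQa/CompactionScore and
// the scoring heuristic are assumptions, not actual Zeph APIs.

struct ProbeQa {
    question: String, // e.g. "What file was modified?"
    expected: String, // answer captured from the original turns, pre-compaction
    answered: String, // answer produced with only the new summary as context
}

struct CompactionScore {
    score: f64,           // fraction of probe questions answered correctly
    failures: Vec<ProbeQa>, // failed pairs, for the WARN log
}

/// Score probe answers against the expected answers stored before compaction.
/// A trivial case-insensitive containment check stands in for LLM grading.
fn score_probe(qas: Vec<ProbeQa>) -> CompactionScore {
    let total = qas.len() as f64;
    let mut failures = Vec::new();
    let mut correct = 0.0;
    for qa in qas {
        if qa.answered.to_lowercase().contains(&qa.expected.to_lowercase()) {
            correct += 1.0;
        } else {
            failures.push(qa);
        }
    }
    let score = if total == 0.0 { 1.0 } else { correct / total };
    CompactionScore { score, failures }
}

fn main() {
    let qas = vec![
        ProbeQa {
            question: "What file was modified?".into(),
            expected: "src/main.rs".into(),
            answered: "The summary says src/main.rs was edited".into(),
        },
        ProbeQa {
            question: "What was the user's goal?".into(),
            expected: "add retry logic".into(),
            answered: "unclear from summary".into(),
        },
    ];
    let result = score_probe(qas);
    println!("score={}", result.score); // prints "score=0.5"
    // With probe_threshold = 0.7 this would log WARN with the failed
    // question/answer pairs and optionally keep the original turns.
    println!("below_threshold={}", result.score < 0.7);
}
```

On a probe failure, the failed ProbeQa entries give the WARN log exactly the question/answer pairs the design calls for.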
Integration Points
- crates/zeph-memory: validate_compaction(before_messages, summary) -> CompactionScore
- [memory.compression] config: probe_enabled = false, probe_model (defaults to the summary model), probe_threshold = 0.7
- Debug dump: include a compaction_probe section with questions, answers, and score
- CLI: no user-facing change (background validation)
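One way to picture the [memory.compression] probe settings is as a config struct with the defaults listed above. The struct and helper names here are assumptions for illustration; only the keys and default values come from the design.

```rust
// Hypothetical shape of the probe settings under [memory.compression].
// Field names mirror the config keys; the struct itself is an assumption.

#[derive(Debug, Clone)]
struct ProbeConfig {
    probe_enabled: bool,
    /// None means "fall back to the summary model".
    probe_model: Option<String>,
    probe_threshold: f64,
}

impl Default for ProbeConfig {
    fn default() -> Self {
        Self {
            probe_enabled: false, // validation is opt-in
            probe_model: None,    // defaults to the summary model
            probe_threshold: 0.7, // below this, log WARN / fall back
        }
    }
}

/// Resolve which model the probe should use, given the summary model id.
fn effective_model(cfg: &ProbeConfig, summary_model: &str) -> String {
    cfg.probe_model
        .clone()
        .unwrap_or_else(|| summary_model.to_string())
}

fn main() {
    let cfg = ProbeConfig::default();
    // With no probe_model configured, the probe reuses the summary model.
    println!("{}", effective_model(&cfg, "summary-model-id"));
}
```

Keeping probe_enabled = false by default matches the "no user-facing change" constraint: the probe stays a background validation until explicitly switched on.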