Research Finding
Source: Factory.ai evaluation framework (2025), ICLR 2025
Applicability: Medium | Complexity: Simple
Problem
Zeph performs no validation after context compaction. If summarization drops a critical fact (a file path, a decision, the location of an API key), the next LLM call silently operates on incomplete context. This is the root cause of the class of bugs where agents 'forget' state after long sessions.
Proposed Approach
After each summarization event, run a lightweight 'compaction probe':
- Generate 2-3 factual questions from the original turns that are about to be summarized (e.g., 'What file was modified?', 'What was the user's goal?')
- Inject the new summary as context and ask the questions
- Score answers against expected (stored before compaction)
- If probe score < threshold: log WARN with question/answer pairs, optionally fall back to keeping original turns
The probe adds one extra LLM call per compaction event; the cost is mitigated by routing it to a cheap, fast model via the existing orchestrator.
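The probe flow above can be sketched in Rust. Everything here is illustrative: ProbeQa, CompactionScore, and the containment-based grader are hypothetical names standing in for Zeph's real types, and a real implementation would grade answers with an LLM call rather than a string check.

```rust
// Hypothetical sketch of the compaction probe. ProbeQa/CompactionScore and
// the scoring heuristic are assumptions, not actual Zeph APIs.

struct ProbeQa {
    question: String, // e.g. "What file was modified?"
    expected: String, // answer captured from the original turns, pre-compaction
    answered: String, // answer produced with only the new summary as context
}

struct CompactionScore {
    score: f64,           // fraction of probe questions answered correctly
    failures: Vec<ProbeQa>, // failed pairs, for the WARN log
}

/// Score probe answers against the expected answers stored before compaction.
/// A trivial case-insensitive containment check stands in for LLM grading.
fn score_probe(qas: Vec<ProbeQa>) -> CompactionScore {
    let total = qas.len() as f64;
    let mut failures = Vec::new();
    let mut correct = 0.0;
    for qa in qas {
        if qa.answered.to_lowercase().contains(&qa.expected.to_lowercase()) {
            correct += 1.0;
        } else {
            failures.push(qa);
        }
    }
    let score = if total == 0.0 { 1.0 } else { correct / total };
    CompactionScore { score, failures }
}

fn main() {
    let qas = vec![
        ProbeQa {
            question: "What file was modified?".into(),
            expected: "src/main.rs".into(),
            answered: "The summary says src/main.rs was edited".into(),
        },
        ProbeQa {
            question: "What was the user's goal?".into(),
            expected: "add retry logic".into(),
            answered: "unclear from summary".into(),
        },
    ];
    let result = score_probe(qas);
    println!("score={}", result.score); // prints "score=0.5"
    // With probe_threshold = 0.7 this would log WARN with the failed
    // question/answer pairs and optionally keep the original turns.
    println!("below_threshold={}", result.score < 0.7);
}
```

On a probe failure, the failed ProbeQa entries give the WARN log exactly the question/answer pairs the design calls for.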
Integration Points
- crates/zeph-memory: validate_compaction(before_messages, summary) -> CompactionScore
- [memory.compression] config: probe_enabled = false, probe_model (defaults to the summary model), probe_threshold = 0.7
- Debug dump: include a compaction_probe section with questions, answers, and score
- CLI: no user-facing change (background validation)
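One way to picture the [memory.compression] probe settings is as a config struct with the defaults listed above. The struct and helper names here are assumptions for illustration; only the keys and default values come from the design.

```rust
// Hypothetical shape of the probe settings under [memory.compression].
// Field names mirror the config keys; the struct itself is an assumption.

#[derive(Debug, Clone)]
struct ProbeConfig {
    probe_enabled: bool,
    /// None means "fall back to the summary model".
    probe_model: Option<String>,
    probe_threshold: f64,
}

impl Default for ProbeConfig {
    fn default() -> Self {
        Self {
            probe_enabled: false, // validation is opt-in
            probe_model: None,    // defaults to the summary model
            probe_threshold: 0.7, // below this, log WARN / fall back
        }
    }
}

/// Resolve which model the probe should use, given the summary model id.
fn effective_model(cfg: &ProbeConfig, summary_model: &str) -> String {
    cfg.probe_model
        .clone()
        .unwrap_or_else(|| summary_model.to_string())
}

fn main() {
    let cfg = ProbeConfig::default();
    // With no probe_model configured, the probe reuses the summary model.
    println!("{}", effective_model(&cfg, "summary-model-id"));
}
```

Keeping probe_enabled = false by default matches the "no user-facing change" constraint: the probe stays a background validation until explicitly switched on.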