Feature Idea: Preserve-on-Failure Principle for Context Compression
Problem
When context compression fails (LLM unavailable, summary generation error, network timeout, etc.), the system currently loses conversation history — the middle turns are deleted and replaced with an error marker. This is a destructive failure mode.
Proposed Principle: Preserve-on-Failure
Any compression subsystem should implement "preserve-on-failure": when compression cannot complete, keep the original data intact rather than deleting it. The contract is:
"Compression is an optimization. If the optimization fails, the original remains unchanged."
Concrete Changes Needed
-
_generate_summary() returning None: Return all original messages unchanged immediately — do not delete anything.
-
_generate_summary() returning empty/fallback string: Keep original middle messages as-is, do not replace them with a static error message.
-
Context Engine plugin ABC (agent/context_engine.py): The compress() signature should document that it either returns a compressed list OR the original list if compression fails — callers must handle both cases.
-
Compression counter: Increment compression_count even on failure (for observability), but log clearly that it was a failure.
-
User notification: The warning message should clearly state "all X original messages preserved" so operators know the safety net worked.
Why This Matters
- Long conversations are high-value: The more messages in a session, the more valuable the context, and the more painful its loss.
- Failure is often transient: Network blips or API rate limits cause brief failures that could succeed on retry — but the current code commits data loss immediately.
- User trust: Users who notice missing context assume the system is broken, even if only compression failed.
Implementation Sketch
# In compress():
summary = self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
# PRESERVE-ON-FAILURE: explicit None from _generate_summary
if summary is None:
logger.warning("Compression unavailable — preserving all %d original messages", n_messages)
return messages # <— nothing deleted
if not summary: # fallback was generated
logger.warning("Summary empty — preserving original middle %d messages", compress_end - compress_start)
for i in range(compress_start, compress_end):
compressed.append(messages[i].copy())
# ... tail + sanitization
return compressed
Labels
enhancement, context-engine, reliability, user-experience
Feature Idea: Preserve-on-Failure Principle for Context Compression
Problem
When context compression fails (LLM unavailable, summary generation error, network timeout, etc.), the system currently loses conversation history — the middle turns are deleted and replaced with an error marker. This is a destructive failure mode.
Proposed Principle: Preserve-on-Failure
Any compression subsystem should implement "preserve-on-failure": when compression cannot complete, keep the original data intact rather than deleting it. The contract is:
Concrete Changes Needed
_generate_summary()returningNone: Return all original messages unchanged immediately — do not delete anything._generate_summary()returning empty/fallback string: Keep original middle messages as-is, do not replace them with a static error message.Context Engine plugin ABC (
agent/context_engine.py): Thecompress()signature should document that it either returns a compressed list OR the original list if compression fails — callers must handle both cases.Compression counter: Increment
compression_counteven on failure (for observability), but log clearly that it was a failure.User notification: The warning message should clearly state "all X original messages preserved" so operators know the safety net worked.
Why This Matters
Implementation Sketch
Labels
enhancement, context-engine, reliability, user-experience