Compaction failure leaves session in permanent failed state with no automatic recovery #69202

@littlebest

Bug: Compaction failure leaves session in permanent failed state with no automatic recovery

Summary

When session compaction fails (for any reason: timeout, API error, quota exceeded, a model that does not support reasoning, etc.), the session enters status=failed permanently. There is no watchdog or automatic recovery mechanism: the channel becomes completely unresponsive (messages are read but never answered), and the only workaround is manual intervention.

Steps to reproduce

  1. Have a session that grows large enough to trigger compaction (~500KB+ JSONL)
  2. Compaction fails (e.g., model returns error, or times out)
  3. Session status becomes "failed"
  4. All subsequent messages to that channel receive no response
  5. No automatic escalation, no fallback session, no user notification

Observed log

Session e1bc0eb2 at 04:11-04:14:

  • Compaction triggered on large session
  • Fallback model returned 400 (reasoning required but disabled in system prompt)
  • Multiple retry attempts → 429 rate limit
  • Session grew even larger from error messages
  • Compaction failed → status=failed → no further responses
  • Auto-recovery via external heartbeat watchdog was the only way out

Expected behavior

Compaction failure should have automatic escalation:

  1. Try alternative compaction model
  2. If all models fail, create a new session automatically and notify the user
  3. Never leave a session permanently dead with no response
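
A minimal Python sketch of the escalation order above. Every name in it (`compact_with_escalation`, `CompactionError`, `Outcome`, the `compactors` list, `start_new_session`, `notify_user`) is hypothetical and not taken from the gateway's code; it only illustrates trying each compaction model in turn and falling back to a fresh session with a user notification.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class CompactionError(Exception):
    """Hypothetical error raised when a single compaction attempt fails."""


@dataclass
class Outcome:
    status: str  # "compacted" or "rotated"
    detail: str  # name of the model that succeeded, or the new session id


def compact_with_escalation(
    session_id: str,
    compactors: Sequence[tuple[str, Callable[[str], None]]],
    start_new_session: Callable[[str], str],
    notify_user: Callable[[str], None],
) -> Outcome:
    """Try each compaction model in order; start a new session if all fail."""
    errors: list[str] = []
    for model_name, compact in compactors:
        try:
            compact(session_id)
            return Outcome("compacted", model_name)
        except CompactionError as exc:
            # Record the failure and escalate to the next model.
            errors.append(f"{model_name}: {exc}")

    # Every model failed: never leave the session dead, rotate instead.
    new_id = start_new_session(session_id)
    notify_user(
        "Compaction failed on all models (" + "; ".join(errors) + "). "
        f"A new session {new_id} was started."
    )
    return Outcome("rotated", new_id)
```

The callbacks are injected so the same control flow can be exercised without a real model or channel; how the gateway actually creates sessions or delivers notifications is not specified in this report.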

Impact

  • High: Users' messages are read but never answered, with no explanation
  • Perceived data loss: users don't know that history was preserved in .bak.recovered.* files
  • Recovery depends on an external heartbeat watchdog, which is not a proper fix

Suggested fix

Add a compaction-failure watchdog in the gateway that:

  1. Detects when compaction has failed
  2. Automatically creates a new session for the channel
  3. Optionally preserves a backup of the old session file
  4. Notifies the user that a new session was started
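
The four steps above could be sketched roughly as follows, in Python. This is an illustration under assumptions, not the gateway's implementation: `recover_failed_session`, `create_session`, `notify_user`, and the `.bak.<timestamp>` backup suffix are all hypothetical names chosen for the example.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Optional


def recover_failed_session(
    status: str,
    session_file: Path,
    create_session: Callable[[], str],
    notify_user: Callable[[str], None],
    keep_backup: bool = True,
) -> Optional[str]:
    """Return the new session id if recovery ran, otherwise None."""
    # Step 1: detect the failed state.
    if status != "failed":
        return None

    # Step 3 (optional): preserve a copy of the old session file.
    # The backup suffix here is an assumption made for this sketch.
    if keep_backup and session_file.exists():
        backup = session_file.with_name(
            f"{session_file.name}.bak.{int(time.time())}"
        )
        shutil.copy2(session_file, backup)

    # Step 2: create a replacement session for the channel.
    new_id = create_session()

    # Step 4: tell the user what happened.
    notify_user(
        f"The previous session failed during compaction; "
        f"new session {new_id} started."
    )
    return new_id
```

A watchdog loop in the gateway would call a function like this periodically for each channel; the polling interval and how session status is read are left open here.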
