Skip to content

feat: compaction circuit breaker with auth failure detection#230

Merged
jalehman merged 4 commits into
Martian-Engineering:mainfrom
liu51115:feat/compaction-circuit-breaker
Apr 1, 2026
Merged

feat: compaction circuit breaker with auth failure detection#230
jalehman merged 4 commits into
Martian-Engineering:mainfrom
liu51115:feat/compaction-circuit-breaker

Conversation

@liu51115

@liu51115 liu51115 commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a circuit breaker to prevent compaction retry storms when the LLM provider returns auth errors (e.g. expired API key, revoked token). Without this, a single auth failure can trigger an unbounded cascade of compaction retries that all fail identically.

Key changes

  • Circuit breaker gates compaction after N consecutive auth failures (configurable threshold, default 5, 30-minute cooldown)
  • requireStructuralSignal option on extractProviderAuthFailure() — only HTTP 401 status codes or error.kind === "provider_auth" count as auth failures, preventing false positives from LLM summary text that happens to mention auth errors
  • All 4 call sites in summarize.ts patched — 3 use requireStructuralSignal: true (parsing LLM output where false positives are possible), 1 intentionally omits it (handling real thrown exceptions where structural signals are always present)
  • 6 new tests covering circuit breaker behavior and false-positive prevention

Test plan

  • 6 new unit tests pass
  • Verify circuit breaker trips after configured threshold of consecutive auth failures
  • Verify circuit breaker resets after cooldown period
  • Verify requireStructuralSignal: true rejects false positives from LLM text mentioning auth errors
  • Verify real auth exceptions (without requireStructuralSignal) still trip the breaker

🤖 Generated with Claude Code

Claw Liu and others added 3 commits April 1, 2026 19:10
…orms

- Track consecutive auth failures in LcmContextEngine
- Trip after configurable threshold (default 5) consecutive failures
- Auto-reset after configurable cooldown (default 30 min)
- Reset on successful compaction
- Gate both compact() and compactLeafAsync() entry points
- Propagate authFailure signal through CompactionResult and compactUntilUnder
- Config: circuitBreakerThreshold, circuitBreakerCooldownMs (env + plugin config)
- Fix: remove duplicate summaryModel/summaryProvider fields in LcmConfig type
- Add 5 behavioral tests (trip, block, leaf-block, auto-reset, success-reset)
…Auth

The false-positive circuit breaker fix (fcaccfc) only applied
requireStructuralSignal to the primary success path in
attemptSummarizerCall. The retry path in retryWithoutModelAuth still
called extractProviderAuthFailure() without it, so valid summaries
containing auth-error phrases (e.g. summaries of sessions discussing
auth fixes) would false-trigger the circuit breaker on retry.

Apply requireStructuralSignal: true to both the success and catch
paths inside retryWithoutModelAuth (lines ~876 and ~896 in source).
The primary catch path (line ~994 in dist) correctly remains unguarded
since thrown exceptions carry structural signals (HTTP 401, error.kind).

460 tests pass. Audited all 4 call sites.
@jalehman jalehman self-assigned this Apr 1, 2026
Scope compaction auth circuit-breaker state to the resolved summarizer identity instead of the shared engine instance, propagate auth failures out of full-sweep compaction even after partial progress, and add regression coverage plus a patch changeset.

Regeneration-Prompt: |
  User asked to implement the review findings directly on PR Martian-Engineering#230 in the existing
  review worktree, then add a changeset and push back to the contributor branch.
  The two concrete bugs were: the compaction auth circuit breaker lived on the
  singleton LcmContextEngine even though summarizer/provider selection is
  resolved per request, and compactFullSweep hid provider auth failures when an
  earlier sweep pass had already made some progress.

  Preserve the feature and keep the change additive. Key the breaker by the
  effective summarizer identity used for compaction so one broken provider or
  auth profile does not block unrelated sessions, providers, or models. For
  custom injected summarize functions, keep breaker scope local to the session
  queue. Make full-sweep compaction surface authFailure even after partial
  compaction so callers do not treat it as a clean success and reset the
  breaker. Update engine result handling and add focused regression tests for
  provider scoping and the partial-progress full-sweep path. Add a patch
  changeset because this changes runtime compaction behavior.
@jalehman jalehman merged commit ca51445 into Martian-Engineering:main Apr 1, 2026
1 check passed
@jalehman

jalehman commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants