
fix: move proactive threshold compaction off the reply path#363

Closed
100yenadmin wants to merge 2 commits into Martian-Engineering:main from electricsheephq:codex/defer-threshold-compaction

Conversation

@100yenadmin
Contributor

Summary

This addresses #313 by moving proactive threshold compaction out of the synchronous afterTurn() reply path.

Today the reply can already be complete, yet afterTurn() still awaits the proactive threshold sweep before returning. On conversations where compaction is slow or frequent, that turns compaction latency into reply latency.

What changed

  • keep ingest and compaction telemetry in afterTurn()
  • keep best-effort incremental leaf compaction behavior unchanged
  • stop awaiting proactive threshold compaction inline
  • schedule threshold compaction onto a tracked background slot keyed by session
  • coalesce duplicate background threshold requests so repeated turns do not pile up redundant sweeps
  • keep true overflow recovery synchronous and unchanged; this only changes the post-turn maintenance path
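
The scheduling and coalescing described above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual code: ThresholdScheduler, sweep, and the method names are hypothetical stand-ins for the tracked per-session background slot.

```typescript
// Hypothetical sketch of per-session background scheduling with
// coalescing. Names are illustrative, not this codebase's real API.
type SweepFn = (sessionId: string) => Promise<void>;

class ThresholdScheduler {
  private inFlight = new Map<string, Promise<void>>();
  private rerunRequested = new Set<string>();

  constructor(private readonly sweep: SweepFn) {}

  // Request a background sweep. If one is already running for this
  // session, remember that a follow-up is needed instead of stacking
  // another task; repeated requests collapse into at most one rerun.
  schedule(sessionId: string): void {
    if (this.inFlight.has(sessionId)) {
      this.rerunRequested.add(sessionId);
      return;
    }
    this.inFlight.set(sessionId, this.run(sessionId));
  }

  // Test hook: resolves once this session's background work settles.
  async idle(sessionId: string): Promise<void> {
    await this.inFlight.get(sessionId);
  }

  private async run(sessionId: string): Promise<void> {
    try {
      do {
        this.rerunRequested.delete(sessionId);
        // Swallow failures so they never surface on the reply path.
        await this.sweep(sessionId).catch(() => {});
      } while (this.rerunRequested.has(sessionId));
    } finally {
      this.inFlight.delete(sessionId);
    }
  }
}
```

With this shape, three rapid schedule() calls for the same session produce at most two sweeps: the one in flight plus a single coalesced follow-up.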

Why this is safe

The proactive threshold sweep operates on persisted conversation state. It does not need to block reply delivery to preserve correctness. The next turn will still see whatever compaction accomplished because the actual mutation still runs through the existing per-session queue.
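
The per-session queue this argument leans on can be sketched as below. sessionQueues and enqueue are illustrative names under assumed semantics, not the codebase's real API: each session chains its operations onto one promise, so a background sweep and the next turn's mutation cannot interleave.

```typescript
// Minimal per-session operation queue sketch (hypothetical names).
const sessionQueues = new Map<string, Promise<unknown>>();

function enqueue<T>(sessionId: string, op: () => Promise<T>): Promise<T> {
  const prev = sessionQueues.get(sessionId) ?? Promise.resolve();
  // Chain behind whatever is already queued; a failed earlier op must
  // not poison later ones, hence the catch before then.
  const next = prev.catch(() => {}).then(op);
  sessionQueues.set(sessionId, next.catch(() => {}));
  return next;
}
```

Because every mutation flows through the same chain, whatever the background sweep persists is visible to the next turn in order, even though the caller no longer awaits the sweep.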

Tests

  • pnpm exec vitest run test/engine.test.ts test/config.test.ts test/session-operation-queues.test.ts test/circuit-breaker.test.ts test/expansion.test.ts

Adds focused coverage for:

  • afterTurn() resolving before proactive threshold compaction finishes
  • duplicate background threshold requests coalescing per session
  • background threshold failures not surfacing to the user path
  • existing force/overflow recovery behavior remaining intact

Stack note

This branch is cut on top of #362. Until #362 merges, GitHub will show the lower branch commits here too.

Closes #313.

Copilot AI review requested due to automatic review settings April 10, 2026 13:44

Copilot AI left a comment


Pull request overview

This PR reduces user-visible reply latency by decoupling proactive threshold compaction from the synchronous afterTurn() lifecycle, while also carrying forward the cache-aware compaction hysteresis improvements from the stacked branch (#362).

Changes:

  • Move proactive threshold compaction off the afterTurn() reply path by scheduling it as a coalesced, per-session background task.
  • Extend compaction telemetry and policy with a cold observation streak (consecutiveColdObservations) and a new config knob cacheAwareCompaction.coldCacheObservationThreshold.
  • Update DB migrations, plugin config schema/UI hints, docs, and tests to cover the new behavior and settings.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Summary per file

  • src/engine.ts: Schedules threshold compaction in the background (coalesced per session) and adds cold-cache observation threshold logic.
  • src/store/compaction-telemetry-store.ts: Persists and reads consecutiveColdObservations in compaction telemetry.
  • src/db/migration.ts: Adds the new telemetry column for existing DBs and includes it in the new schema.
  • src/db/config.ts: Resolves cacheAwareCompaction.coldCacheObservationThreshold from env/config with validation and defaulting.
  • openclaw.plugin.json: Exposes the new config option in UI hints and the JSON schema.
  • docs/configuration.md: Documents the new config option and env var.
  • skills/lossless-claw/references/config.md: Adds reference documentation for the new cache-aware setting.
  • test/engine.test.ts: Adds coverage for background threshold compaction timing, coalescing, and failure isolation.
  • test/config.test.ts: Verifies config resolution and that the schema includes the new option.
  • test/session-operation-queues.test.ts: Updates test config defaults for the new cache-aware field.
  • test/expansion.test.ts: Updates test config defaults for the new cache-aware field.
  • test/circuit-breaker.test.ts: Updates test config defaults for the new cache-aware field.


Comment thread test/engine.test.ts
Comment on lines +5198 to +5201
const winner = await Promise.race([
  afterTurnPromise.then(() => "afterTurn-finished"),
  new Promise<string>((resolve) => setTimeout(() => resolve("timeout"), 50)),
]);

Copilot AI Apr 10, 2026


This test uses a hard 50ms wall-clock timeout to assert afterTurn() resolves before the background compaction finishes. On slower CI runners (or under load) afterTurn() may legitimately take >50ms due to DB/migration work, making this assertion flaky. Consider using fake timers or a more deterministic assertion (e.g., vi.waitFor with a larger timeout, or asserting afterTurnPromise resolves while compactPromise is still pending and then cleaning up).
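
One deterministic shape for that assertion, sketched here without vitest so the idea stands alone (deferred, the sweep, and afterTurn below are stand-ins, not the engine's real hooks): gate the background sweep on a promise the test controls, assert afterTurn() resolves while the gate is still closed, then release the gate and let the sweep finish.

```typescript
// A manually resolved "gate" replaces wall-clock timeouts, so the
// ordering assertion cannot flake under CI load. Illustrative only.
function deferred() {
  let resolve!: () => void;
  const promise = new Promise<void>((r) => { resolve = r; });
  return { promise, resolve };
}

(async () => {
  const gate = deferred();
  let sweepDone = false;
  // Hypothetical background sweep: blocked until the test opens the gate.
  const sweepPromise = gate.promise.then(() => { sweepDone = true; });

  // Stand-in for afterTurn(): schedules the sweep without awaiting it.
  const afterTurn = async (): Promise<void> => { void sweepPromise; };

  await afterTurn();
  if (sweepDone) throw new Error("sweep finished before afterTurn resolved");

  gate.resolve(); // now let the background work complete
  await sweepPromise;
  if (!sweepDone) throw new Error("sweep never completed");
})();
```

Because nothing here depends on elapsed time, the test passes identically on a fast laptop and an overloaded CI runner.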

@jalehman
Contributor

Closing as obsolete. The original concern was valid, but current main already moved deferred proactive compaction onto the persisted maintenance-debt path, so this PR’s in-memory background scheduler is no longer the right shape for the codebase. Keeping this open would reintroduce an older approach, and this branch also has a coalescing bug that can skip newer turns.

@jalehman jalehman closed this Apr 13, 2026


Development

Successfully merging this pull request may close these issues.

Compaction blocks reply delivery — should run async after agent_end

3 participants