fix(agents): defer bootstrap context-engine maintenance to background #90199

Open
dripsmvcp wants to merge 1 commit into openclaw:main from dripsmvcp:fix/67716-bootstrap-deferred-compaction

@dripsmvcp dripsmvcp commented Jun 4, 2026

Summary

  • Bootstrap/reconcile context-engine maintenance runs in the foreground, where deferred compaction debt cannot execute (allowDeferredCompactionExecution is background-only) and no background follow-up is scheduled (only turns defer). Debt created when bootstrap imports tail messages past the leaf trigger is therefore stranded, leaving sessions repeating "deferred compaction still needed" (issue #67716, Case 1).
  • Lets reason="bootstrap" schedule the same background debt consumer that turns already use, for engines that opt into background maintenance (turnMaintenanceMode === "background").
  • Foreground bootstrap is unchanged for engines without background maintenance; the plugin-owned (Lossless-Claw) hot-cache sticky-debt path (Case 2) is intentionally out of scope.
  • Reviewers should focus on: deferring bootstrap is safe because turns already defer for these engines, the deferred run dedups/coalesces per session, and it runs in background mode (so it can actually pay the debt).
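
A minimal sketch of the gate change described above, not the actual diff. The function names, the `EngineInfo` shape, the `"foreground"` mode value, and the `"compaction"` reason are assumptions made for illustration; only `"bootstrap"`, `"turn"`, and `turnMaintenanceMode === "background"` come from this description.

```typescript
// Hypothetical types: the real definitions live in the repository and may differ.
type MaintenanceReason = "bootstrap" | "turn" | "compaction";
type TurnMaintenanceMode = "foreground" | "background";

interface EngineInfo {
  // Engines opt into background maintenance by setting this to "background".
  turnMaintenanceMode?: TurnMaintenanceMode;
}

// Before the change: only turns are eligible for the deferred background worker.
function shouldDeferBefore(reason: MaintenanceReason, engine: EngineInfo): boolean {
  return reason === "turn" && engine.turnMaintenanceMode === "background";
}

// After the change: bootstrap joins turns in the same gate. Engines that did
// not opt into background maintenance still run everything in the foreground.
function shouldDeferAfter(reason: MaintenanceReason, engine: EngineInfo): boolean {
  const deferrableReason = reason === "turn" || reason === "bootstrap";
  return deferrableReason && engine.turnMaintenanceMode === "background";
}
```

Under these assumptions, the only input whose result changes is `reason === "bootstrap"` on a background-mode engine, which matches the scope claimed in the bullets above.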

Linked context

Closes #67716

Related #66820 — deferred-maintenance token budget (a different aspect of the same subsystem; not this scheduling gap).

Not maintainer-requested; selected from the clawsweeper:queueable-fix backlog. ClawSweeper's review of #67716 recommended exactly this OpenClaw-scoped fix: "extend the existing deferred-maintenance lifecycle so bootstrap/reconcile can schedule a valid background debt consumer," keeping plugin hot-cache/dedup policy out.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: bootstrap/reconcile created deferred compaction debt that could not execute because bootstrap maintenance runs in the foreground (issue #67716, Case 1).
  • Real environment tested: Linux / Node 24, exercising the real production runContextEngineMaintenance lifecycle — the deferred background worker, the task registry/queue, and the session lane — not mocked.
  • Exact steps or command run after this patch: node scripts/run-vitest.mjs src/agents/embedded-agent-runner/context-engine-maintenance.test.ts
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): copied terminal output. The added regression fails on unfixed main (bootstrap ran foreground and returned a maintenance result) and passes after the fix (bootstrap defers and the background worker runs):
unfixed main: × defers bootstrap maintenance to a background debt consumer for background-mode engines
              AssertionError: expected { Object (changed, bytesFreed, ...) } to be undefined

with the fix: Test Files  1 passed (1)
              Tests       23 passed (23)
  • Observed result after fix: for a background-mode engine, bootstrap maintenance schedules a background context_engine_turn_maintenance task instead of running inline, and the deferred worker runs maintain() with allowDeferredCompactionExecution: true (the debt can now be paid).
  • What was not tested: a live gateway bootstrap/reconcile overflow on a running OpenClaw deployment.
  • Proof limitations or environment constraints: this is unit/integration-level proof against the real lifecycle functions, which CONTRIBUTING.md treats as supplemental rather than a live-setup capture. I could not drive a live gateway into a bootstrap overflow in this environment (needs a full running OpenClaw + provider + forced overflow; the sandbox network also blocked a clean install and the crabbox check:changed gate). A maintainer with a live setup can confirm end-to-end, or apply proof: override for this logic-scoped scheduling fix.
  • Before evidence (optional but encouraged): the failing assertion above (expected { ... } to be undefined) is the before-state from unfixed main, where bootstrap ran foreground.
How to capture live behavior proof on a real setup (for a maintainer or the reporter)

This needs the background-maintenance context engine (the Lossless-Claw / "LCM" engine from the issue, which sets turnMaintenanceMode: "background" and emits the LCM compaction leaf pass lines) plus a real provider — neither of which I can run in CI/sandbox. Steps:

  1. Build this branch: pnpm install && pnpm build; run the gateway with the LCM-configured agent and tail logs (./scripts/clawlog.sh).
  2. Restart the gateway on a session whose reconcile tail-import pushes rawTokensOutsideTail past the leaf trigger (issue #67716, Case 1).
  3. Grep the bootstrap window (redact session keys/content):
grep -E "reconcileSessionTail|deferred turn maintenance (queued|resuming)|deferred compaction (debt pending|skipped|completed)|allowDeferredCompactionExecution|reason=compacted|compactLeafAsync" <gateway-log>

Before this fix the bootstrap window shows deferred compaction debt pending ... allowDeferredCompactionExecution is disabled -> deferred compaction skipped ... reason=deferred compaction still needed (stranded). After this fix it shows the core line [context-engine] deferred turn maintenance queued ... lane=context-engine-turn-maintenance:<key> during bootstrap (emitted at context-engine-maintenance.ts:619; previously only on turns), followed by compactLeafAsync start -> LCM compaction leaf pass -> deferred compaction completed ... reason=compacted (paid). I will paste the redacted excerpt here once it is captured.
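
As a rough aid for whoever captures the excerpt, the before/after signatures could be checked mechanically. This is a sketch under assumptions: `classifyBootstrapWindow` and its verdict names are hypothetical, and it matches only the log fragments quoted in this description, so the real log lines may carry extra fields or different wording.

```typescript
// Hypothetical helper: classifies a redacted bootstrap-window excerpt using
// only the log fragments quoted in the PR description.
type BootstrapVerdict = "paid" | "stranded" | "inconclusive";

function classifyBootstrapWindow(lines: readonly string[]): BootstrapVerdict {
  // True when a single line contains every fragment in `needles`.
  const hasLine = (...needles: string[]): boolean =>
    lines.some((line) => needles.every((needle) => line.includes(needle)));

  const queued = hasLine("deferred turn maintenance queued");
  const completed = hasLine("deferred compaction completed", "reason=compacted");
  const skipped = hasLine(
    "deferred compaction skipped",
    "reason=deferred compaction still needed",
  );

  if (queued && completed) return "paid"; // debt was queued and consumed
  if (skipped && !queued) return "stranded"; // debt seen, nothing scheduled
  return "inconclusive"; // window is incomplete or ambiguous
}
```

A window that shows the queued line without a completed line comes back as "inconclusive", since the worker may simply not have run yet when the log was cut.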

Tests and validation

  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/context-engine-maintenance.test.ts -> 23 passed (1 new).
  • Acceptance set (from the issue review, paths updated for the embedded-agent-runner rename): attempt.spawn-workspace.context-engine.test.ts -> 56 passed; extensions/codex/src/app-server/run-attempt.context-engine.test.ts -> 24 passed; src/agents/harness/context-engine-lifecycle.test.ts -> 9 passed.
  • tsgo -p tsconfig.core.json and core test types: no errors in the changed files (two unrelated pre-existing errors in src/config/io.ts and src/secrets/config-io.ts are present on main).
  • Regression coverage added: "defers bootstrap maintenance to a background debt consumer for background-mode engines" (fails first, passes after).

Risk checklist

  • Did user-visible behavior change? No — internal startup maintenance scheduling.
  • Did config, environment, or migration behavior change? No.
  • Did security, auth, secrets, network, or tool execution behavior change? No.
  • Highest-risk area: deferring bootstrap maintenance to background (session-state).
  • How is that risk mitigated? It only defers for engines that already opt into background turn maintenance (foreground is unchanged for all others); it reuses the existing per-session, dedup/coalesced deferred-maintenance lane that turns use; and it is covered by the new regression plus the existing 22 maintenance tests and the spawn-workspace / codex / lifecycle suites.

Current review state

  • Next action: maintainer / ClawSweeper review. The code itself is accepted by ClawSweeper ("no narrow code repair is needed"); the only open item is the real-behavior-proof gate (status: 📣 needs proof).
  • Scope: this addresses the OpenClaw-owned Case 1 only. Case 2 (hot-cache sticky debt / duplicate ledger) is plugin-owned (Lossless-Claw) per ClawSweeper's review and is intentionally excluded; the broader duplicate-transcript umbrella can stay open if maintainers prefer.
  • On proof: the change is verified at the unit/integration level against the real lifecycle functions, but the live-overflow capture requires the external Lossless-Claw engine + a provider, which I can't run here. The exact live-capture steps are in the "How to capture live behavior proof" block above; I will add the redacted excerpt as soon as anyone with a live LCM setup runs it. Requesting a maintainer proof: override for this logic-scoped core scheduling fix in the meantime — ClawSweeper offered that path for exactly this case.

Bootstrap/reconcile context-engine maintenance runs foreground, where
deferred compaction debt cannot execute (allowDeferredCompactionExecution is
background-only) and no background follow-up is scheduled — only turns defer.
So debt created when bootstrap imports tail messages past the leaf trigger is
stranded, leaving sessions repeating "deferred compaction still needed"
(issue openclaw#67716, Case 1).

Extend the deferred-maintenance gate so reason="bootstrap" also schedules the
existing background debt consumer for engines that opt into background
maintenance (turnMaintenanceMode === "background"). Foreground bootstrap is
unchanged for engines without background maintenance, and the plugin-owned
hot-cache sticky-debt path (Case 2) is intentionally left out of scope.

Closes openclaw#67716
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S proof: supplied External PR includes structured after-fix real behavior proof. labels Jun 4, 2026
clawsweeper Bot commented Jun 4, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 4, 2026, 8:00 AM ET / 12:00 UTC.

Summary
This PR changes runContextEngineMaintenance so background-mode engines defer bootstrap maintenance into the existing background maintenance worker and adds a regression test for that scheduling path.

PR surface: Source +4, Tests +61. Total +65 across 2 files.

Reproducibility: yes, from source. Current main runs bootstrap maintenance in foreground while only background execution sets allowDeferredCompactionExecution, and bootstrap is not currently eligible for the deferred background worker. I did not reproduce the full live LCM overflow; the remaining proof gap is live behavior, not source traceability.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
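
The "weaker of the two" rule can be sketched as a minimum over an ordered rank list. This assumes the rank legend in this comment is a strict ordering from strongest to weakest and leaves out "off-meta tidepool", which the legend describes as not applicable; `RANKS` and `overallRank` are hypothetical names, not ClawSweeper's implementation.

```typescript
// Ranks from weakest to strongest, assuming the legend order is a strict ordering.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is the weaker (lower-index) of the proof and patch ranks.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality) ? proof : patchQuality;
}
```

With the values reported here, `overallRank("silver shellfish", "platinum hermit")` gives "silver shellfish", which is why the proof gap caps the overall rating.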

Rank-up moves:

  • [P1] Add redacted live gateway/bootstrap logs or a linked artifact showing bootstrap-created debt queued and consumed after the patch, or obtain an explicit maintainer proof override.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body provides copied Vitest output only; before merge it needs redacted live logs/terminal output or a linked artifact showing the real bootstrap/reconcile background-maintenance path, or an explicit proof override. Redact private details before posting. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] The supplied after-fix proof is copied Vitest output, not a live OpenClaw bootstrap/reconcile overflow or redacted runtime log showing bootstrap-created debt being consumed.
  • [P1] Merging changes session-state timing for background-mode engines: bootstrap maintenance that used to run inline now runs on the deferred session-lane consumer.

Maintainer options:

  1. Require live bootstrap proof (recommended)
    Ask for redacted live logs, terminal output, or a linked artifact showing bootstrap-created deferred debt queued and consumed after this patch before merge.
  2. Accept a maintainer proof override
    A maintainer with enough subsystem confidence can explicitly override the proof gate and own the session-state timing risk for this small scheduling change.
  3. Pause until LCM can be exercised
    If no one can run the background-maintenance engine in a real setup, keep the PR open rather than merging a session-state timing change on test proof alone.

Next step before merge

  • [P1] Manual review remains because the only blocker is contributor live proof or a maintainer proof override; I found no narrow code repair for ClawSweeper to apply.

Security
Cleared: The diff only changes internal TypeScript scheduling logic and a focused Vitest test; it does not touch dependencies, CI, secrets, permissions, network calls, or package metadata.

Review details

Best possible solution:

Land the narrow bootstrap deferral only after redacted live bootstrap/reconcile logs or an explicit maintainer proof override confirms the background worker consumes the stranded debt; keep the plugin-owned hot-cache policy case out of this PR.

Do we have a high-confidence way to reproduce the issue?

Yes from source: current main runs bootstrap maintenance in foreground while only background execution sets allowDeferredCompactionExecution, and bootstrap is not currently eligible for the deferred background worker. I did not reproduce the full live LCM overflow; the remaining proof gap is live behavior, not source traceability.

Is this the best way to solve the issue?

Yes for the code shape: reusing the existing deferred background maintenance lane for engines that already opt into background maintenance is the narrowest core fix I found. The merge-readiness gap is proof of the real bootstrap/reconcile scenario, not an alternate code repair.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5ab430fa11ee.

Label changes

Label justifications:

  • P1: The linked bug affects session context maintenance in real agent runs and can leave deferred compaction debt stranded across turns.
  • merge-risk: 🚨 session-state: The PR intentionally changes when bootstrap maintenance side effects occur for background-mode context engines, which can affect transcript/session-state ordering.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. The PR body provides copied Vitest output only; before merge it needs redacted live logs/terminal output or a linked artifact showing the real bootstrap/reconcile background-maintenance path, or an explicit proof override. Redact private details before posting. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +4, Tests +61. Total +65 across 2 files.

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 1 | 5 | 1 | +4 |
| Tests | 1 | 61 | 0 | +61 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 2 | 66 | 1 | +65 |

What I checked:

  • Repository policy read: Root AGENTS.md and scoped agent runner guidance were read; agent/session-state review policy required checking callers, callees, sibling maintenance behavior, tests, and history before verdict. (AGENTS.md:1, 5ab430fa11ee)
  • Current main source behavior: Current main only defers context-engine maintenance when params.reason === "turn"; executeContextEngineMaintenance only sets allowDeferredCompactionExecution for background execution, so foreground bootstrap maintenance cannot consume deferred compaction debt. (src/agents/embedded-agent-runner/context-engine-maintenance.ts:713, 5ab430fa11ee)
  • Bootstrap caller path: The harness bootstrap path calls maintenance with reason: "bootstrap" after engine bootstrap, and the embedded attempt caller passes session manager, runtime context, config, and agent id into the same maintenance helper. (src/agents/harness/context-engine-lifecycle.ts:47, 5ab430fa11ee)
  • Existing background worker behavior: The reusable deferred worker dedups by session key, waits for the session lane to go idle, and invokes maintenance with executionMode: "background", which is the mode that grants deferred compaction execution. (src/agents/embedded-agent-runner/context-engine-maintenance.ts:481, 5ab430fa11ee)
  • Context-engine contract: The type contract says maintenance runs after bootstrap, turns, or compaction; turnMaintenanceMode is an opt-in background mode, and allowDeferredCompactionExecution is the host flag engines use to consume deferred compaction debt. (src/context-engine/types.ts:108, 5ab430fa11ee)
  • PR implementation: The PR adds bootstrap to the same defer gate used by turns and adds a regression asserting bootstrap maintenance queues a task and later calls maintain() with allowDeferredCompactionExecution: true. (src/agents/embedded-agent-runner/context-engine-maintenance.ts:718, b2dadefb5336)
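
The worker behavior checked above (dedup by session key, wait for the session lane to go idle, then run in background mode) can be illustrated with a small synchronous sketch. The class, method names, and option shape are assumptions for illustration only; the real worker is asynchronous and integrates with the task registry, which this sketch omits.

```typescript
// Hypothetical option shape; only the two field names and the "background"
// value are taken from the review text.
interface MaintainOptions {
  executionMode: "foreground" | "background";
  allowDeferredCompactionExecution: boolean;
}

type MaintainFn = (sessionKey: string, options: MaintainOptions) => void;

class DeferredMaintenanceQueue {
  // At most one pending maintenance request per session key.
  private readonly pending = new Set<string>();
  private readonly maintain: MaintainFn;

  constructor(maintain: MaintainFn) {
    this.maintain = maintain;
  }

  // Called for both turn and bootstrap reasons; repeat requests coalesce.
  schedule(sessionKey: string): "queued" | "coalesced" {
    if (this.pending.has(sessionKey)) return "coalesced";
    this.pending.add(sessionKey);
    return "queued";
  }

  // Called when the session lane goes idle; runs the pending request, if any,
  // in background mode so deferred compaction debt is allowed to execute.
  onLaneIdle(sessionKey: string): boolean {
    if (!this.pending.delete(sessionKey)) return false;
    this.maintain(sessionKey, {
      executionMode: "background",
      allowDeferredCompactionExecution: true,
    });
    return true;
  }
}
```

In this model a bootstrap request and a turn request for the same session collapse into one background run, which is the dedup/coalescing property the PR relies on.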

Likely related people:

  • EVA: Authored the squash commit that introduced idle-aware background context-engine turn maintenance and the underlying background-maintenance contract this PR extends. (role: introduced behavior; confidence: high; commits: c15b295a8564; files: src/context-engine/types.ts, src/agents/pi-embedded-runner/context-engine-maintenance.ts)
  • @jalehman: Reviewed the original background-maintenance PR and authored the later deferred-maintenance token-budget fix in the same subsystem. (role: reviewer and recent adjacent owner; confidence: high; commits: c15b295a8564, 75e7fc97f804; files: src/agents/pi-embedded-runner/run/attempt.ts, src/context-engine/types.ts)
  • Peter Steinberger: Current-main blame/log show recent broad maintenance of the renamed embedded context-engine maintenance and harness files that now own this code path. (role: recent area contributor; confidence: medium; commits: 045145c70082, 5ab430fa11ee; files: src/agents/embedded-agent-runner/context-engine-maintenance.ts, src/agents/harness/context-engine-lifecycle.ts, src/context-engine/types.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels Jun 4, 2026
@dripsmvcp

Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

Labels

agents Agent runtime and tooling merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Development

Successfully merging this pull request may close these issues.

[Bug]: bootstrap/reconcile and hot-cache policy can leave deferred compaction debt stranded

1 participant