Skip to content

[codex] isolate heartbeat context-engine session keys#84248

Open
NianJiuZst wants to merge 4 commits into
openclaw:mainfrom
NianJiuZst:codex/fix-84218-heartbeat-context
Open

[codex] isolate heartbeat context-engine session keys#84248
NianJiuZst wants to merge 4 commits into
openclaw:mainfrom
NianJiuZst:codex/fix-84218-heartbeat-context

Conversation

@NianJiuZst

@NianJiuZst NianJiuZst commented May 19, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: isolated heartbeat runs kept a stable :heartbeat routing key everywhere, so context-engine lifecycle hooks could still associate later ticks with prior heartbeat state.
  • Solution: preserve the stable routing SessionKey, but forward a fresh per-tick contextEngineSessionKey only through context-engine bootstrap, assemble, ingest, and compaction paths.
  • What changed: heartbeat isolated runs now mint :heartbeat-run:<sessionId> keys for context-engine state, and the embedded runner/auto-reply flow threads that override through lifecycle and recovery paths with regression coverage.
  • What did NOT change (scope boundary): normal session routing, wake re-entry convergence on the stable :heartbeat key, and non-heartbeat runs still default to the existing sessionKey behavior.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Motivation

  • Isolated heartbeat is already documented as starting from fresh context, so replaying accumulated heartbeat context into later ticks can grow prompts until the heartbeat loop compacts and restarts instead of doing scheduled work.

Real behavior proof (required for external PRs)

Behavior addressed: isolatedSession=true heartbeat runs should keep the stable delivery SessionKey while using a fresh context-engine-only session identity on each isolated tick.
Real environment tested: Local OpenClaw source checkout in a Codex worktree, using a temporary workspace and session store, with the production runHeartbeatOnce() path invoked twice under isolatedSession=true and heartbeat.target=none to avoid outbound delivery.
Exact steps or command run after this patch:

node --import tsx --input-type=module <<'EOF'
import fs from 'node:fs/promises';
import os from 'node:os';
import path from 'node:path';
import { runHeartbeatOnce } from './src/infra/heartbeat-runner.ts';
import { resolveMainSessionKey } from './src/config/sessions.ts';

const tmpDir = await fs.mkdtemp(path.join(os.tmpdir(), 'openclaw-hb-proof-'));
const storePath = path.join(tmpDir, 'sessions.json');
await fs.writeFile(path.join(tmpDir, 'HEARTBEAT.md'), '- Check status\n', 'utf8');
const cfg = {
  agents: { defaults: { workspace: tmpDir, heartbeat: { every: '5m', target: 'none', isolatedSession: true } } },
  session: { store: storePath },
};
const sessionKey = resolveMainSessionKey(cfg);
await fs.writeFile(storePath, JSON.stringify({
  [sessionKey]: { sessionId: 'seed-session', updatedAt: Date.now(), lastChannel: 'none', lastProvider: 'heartbeat', lastTo: 'self' }
}), 'utf8');
const observed = [];
const getReplyFromConfig = async (ctx, opts) => {
  observed.push({ SessionKey: ctx.SessionKey, contextEngineSessionKey: opts.contextEngineSessionKey });
  return { text: 'HEARTBEAT_OK' };
};
for (const nowMs of [1, 2]) {
  await runHeartbeatOnce({ cfg, sessionKey, deps: { nowMs: () => nowMs, getQueueSize: () => 0, getReplyFromConfig } });
}
console.log(JSON.stringify(observed, null, 2));
await fs.rm(tmpDir, { recursive: true, force: true });
EOF

Evidence after fix (console output):

[sessions/store] pruned stale session entries
[sessions/store] pruned stale session entries
[
  {
    "SessionKey": "agent:main:main:heartbeat",
    "contextEngineSessionKey": "agent:main:main:heartbeat-run:a5bcc111-0825-4313-bc6e-f2c0ab3809bd"
  },
  {
    "SessionKey": "agent:main:main:heartbeat",
    "contextEngineSessionKey": "agent:main:main:heartbeat-run:ad2bf62e-2a5e-4437-836b-19e54b72f069"
  }
]

Observed result after fix: Both isolated runs used the stable routing key agent:main:main:heartbeat, while the context-engine-only key changed between runs, proving the heartbeat tick kept stable delivery routing without reusing the prior tick's context-engine identity.
What was not tested: Live gateway/channel delivery, end-to-end overflow looping against a real provider, and non-PI harnesses.

Root Cause (if applicable)

  • Root cause: runHeartbeatOnce() created a fresh isolated session id, but still passed the stable <base>:heartbeat key into the auto-reply and embedded-runner context-engine lifecycle, so any context engine keyed by sessionKey could replay earlier heartbeat state.
  • Missing detection / guardrail: isolated-heartbeat tests only covered stable :heartbeat key convergence and did not assert a separate fresh identity for context-engine lifecycle/compaction paths.
  • Contributing context (if known): the same stable key was reused across bootstrap, assemble, afterTurn, ingest, and compaction maintenance paths.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/infra/heartbeat-runner.isolated-key-stability.test.ts
    • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts
    • src/agents/pi-embedded-runner/tool-result-context-guard.test.ts
    • src/agents/pi-embedded-runner/run.overflow-compaction.test.ts
    • src/auto-reply/reply/agent-runner-execution.test.ts
  • Scenario the test should lock in: isolated heartbeat ticks keep the stable routing key for delivery, but all context-engine-owned lifecycle and compaction calls use a fresh per-run identity.
  • Why this is the smallest reliable guardrail: the bug is in the seam between heartbeat session setup and embedded context-engine lifecycle forwarding, so seam tests can prove the identity split without needing a full live heartbeat deployment.
  • Existing test that already covers this (if any): the pre-existing isolated heartbeat stability test covered stable :heartbeat routing convergence only.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

  • Heartbeat runs with isolatedSession=true now honor the documented fresh-session contract for context-engine state while keeping the existing stable heartbeat routing key.

Diagram (if applicable)

N/A.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local source checkout via node scripts/run-vitest.mjs
  • Model/provider: N/A
  • Integration/channel (if any): none
  • Relevant config (redacted): heartbeat isolated-session config covered by targeted tests

Steps

  1. Run isolated heartbeat regression coverage.
  2. Run embedded context-engine forwarding and overflow-compaction coverage.
  3. Confirm the new context-engine-only key stays fresh per isolated tick while the outward routing key remains stable.

Expected

  • Isolated heartbeat delivery still routes through <base>:heartbeat, but context-engine lifecycle/compaction paths do not reuse prior heartbeat state.

Actual

  • The targeted tests pass with distinct contextEngineSessionKey values across isolated ticks and stable outbound routing keys.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: inspected the heartbeat runner, embedded runner lifecycle forwarding, compaction paths, loop-hook forwarding, and targeted regression tests for the new session-key split.
  • Edge cases checked: isolated heartbeat wake re-entry still converges on the stable :heartbeat routing key; non-heartbeat paths still default to the existing sessionKey contract.
  • What you did not verify: a live gateway/channel heartbeat repro on current main.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 19, 2026
@NianJiuZst NianJiuZst marked this pull request as ready for review May 19, 2026 17:52
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds a heartbeat-only contextEngineSessionKey and threads it through auto-reply, PI embedded runner lifecycle/compaction, Codex app-server context-engine paths, and regression tests.

Reproducibility: yes. at source level: current main mints a fresh isolated heartbeat session id but passes the stable :heartbeat key into context-engine-facing lifecycle paths. I did not execute the full production overflow loop in this read-only review.

PR rating
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Summary: The patch direction looks sound and the prior typecheck issue is fixed, but real behavior proof is still too narrow for the changed lifecycle and compaction surface.

Rank-up moves:

  • Add redacted terminal or log proof from a real context-engine lifecycle/compaction run showing the per-tick key.
  • Wait for the remaining CI checks on the latest head to complete.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Needs stronger real behavior proof before merge: The PR body includes terminal output for heartbeat option forwarding, but it does not show the newly changed PI/Codex context-engine lifecycle or compaction paths in a real run; add redacted terminal/log proof and then trigger a fresh review if needed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • The supplied terminal proof does not exercise a real context engine through PI/Codex bootstrap, assemble, afterTurn/ingest, maintenance, or compaction with the override.
  • The PR intentionally changes context-engine session identity for heartbeat runs; any missed path could keep replaying old heartbeat state or split lifecycle state inconsistently.
  • Relevant CI was still running during review, so merge should wait for the remaining checks to complete on 912768892e93325f4584207bfca262c24274df4f.

Maintainer options:

  1. Require lifecycle proof and green checks (recommended)
    Ask for redacted terminal or log proof that a real PI or Codex context engine sees the per-tick key through lifecycle and compaction, then wait for the still-running CI checks.
  2. Accept targeted proof as enough
    Maintainers can choose to accept the existing option-forwarding script plus focused tests, while explicitly owning the remaining real-runtime proof gap.
  3. Pause if proof cannot be produced
    If a real context-engine run cannot be demonstrated, pause the PR and keep the linked issue as the canonical tracking item for the heartbeat overflow bug.

Next step before merge
The remaining action is contributor/maintainer proof and CI judgment, not a narrow automated code repair.

Security
Cleared: The diff adds typed session-key plumbing and tests; I found no new dependency, workflow, permission, credential, network, or supply-chain surface.

Review details

Best possible solution:

Land the identity split after redacted lifecycle/compaction proof shows a real context engine receiving the fresh key and the remaining CI finishes cleanly.

Do we have a high-confidence way to reproduce the issue?

Yes at source level: current main mints a fresh isolated heartbeat session id but passes the stable :heartbeat key into context-engine-facing lifecycle paths. I did not execute the full production overflow loop in this read-only review.

Is this the best way to solve the issue?

Yes for the code direction: a context-engine-only key is the narrow fix that preserves stable routing while isolating context state. It is not merge-ready until proof covers the changed lifecycle/compaction path and remaining CI completes.

Label changes:

  • add rating: 🦪 silver shellfish: Current PR rating is 🦪 silver shellfish because proof is 🦪 silver shellfish, patch quality is 🐚 platinum hermit, and The patch direction looks sound and the prior typecheck issue is fixed, but real behavior proof is still too narrow for the changed lifecycle and compaction surface.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Acceptance criteria:

  • node scripts/run-vitest.mjs src/infra/heartbeat-runner.isolated-key-stability.test.ts
  • node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/pi-embedded-runner/tool-result-context-guard.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.test.ts
  • node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.context-engine.test.ts extensions/codex/src/app-server/compact.test.ts
  • CI check-test-types and changed checks on the PR head

What I checked:

Likely related people:

  • sahilsatralkar: Recent GitHub path history shows fix(cli): finalize context engine turns as the most relevant merged work on src/agents/harness/context-engine-lifecycle.ts, the generic lifecycle helper changed by this PR. (role: context-engine lifecycle contributor; confidence: medium; commits: 6d25ae5e0c0a; files: src/agents/harness/context-engine-lifecycle.ts)
  • jalehman: Codex app-server context-engine lifecycle and projection history includes the original app-server lifecycle work and later review on plugin-owned compaction safety. (role: Codex context-engine feature contributor and reviewer; confidence: medium; commits: 51186d272543, a059309a9f9a; files: extensions/codex/src/app-server/run-attempt.ts, extensions/codex/src/app-server/compact.ts)
  • 100yenadmin: Recent history on the Codex compaction path includes plugin-owned context-engine compaction safety work touching the same app-server compaction surface. (role: recent context-engine compaction contributor; confidence: medium; commits: a059309a9f9a; files: extensions/codex/src/app-server/compact.ts)
  • dutifulbob: Recent history on src/agents/pi-embedded-runner/run/attempt.ts includes adjacent embedded runner model-policy work on the path this PR threads through. (role: recent embedded runner contributor; confidence: low; commits: 13c97c5a8d04; files: src/agents/pi-embedded-runner/run/attempt.ts)
  • Patrick-Erichsen: Recent GitHub path history for src/infra/heartbeat-runner.ts shows adjacent heartbeat work before this PR, making them a useful routing candidate for heartbeat behavior review. (role: recent heartbeat-area contributor; confidence: low; commits: d60ab485114a; files: src/infra/heartbeat-runner.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a002c416c7af.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 19, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot added extensions: codex and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 19, 2026
@clawsweeper clawsweeper Bot added status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 19, 2026
@NianJiuZst NianJiuZst force-pushed the codex/fix-84218-heartbeat-context branch from dfc8c99 to 882d88b Compare May 20, 2026 02:14
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: codex merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: M status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Heartbeat isolatedSession=true replays prior heartbeat context, causing deterministic overflow/restart loop

1 participant