Skip to content

feat(events): add audience field for hidden runtime-context system events#72201

Closed
lukeboyett wants to merge 1 commit into
openclaw:mainfrom
lukeboyett:feat/system-event-audience-internal-routing
Closed

feat(events): add audience field for hidden runtime-context system events#72201
lukeboyett wants to merge 1 commit into
openclaw:mainfrom
lukeboyett:feat/system-event-audience-internal-routing

Conversation

@lukeboyett

@lukeboyett lukeboyett commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Addresses part of #69492 (system-event-shape consumer-path leakage; under umbrella #69208 Track B).

Adds an optional audience: 'internal' | 'user-facing' field to SystemEvent, defaulting to 'user-facing' (full backward-compat for every existing caller). Internal-audience events are wrapped at the consumer integration point in the existing INTERNAL_RUNTIME_CONTEXT_BEGIN/END delimiters with the canonical header lines from formatAgentInternalEventsForPrompt. The agent runtime sees the content as runtime context; user-facing surfaces strip it via the already-installed stripInternalRuntimeContext consumers (sanitize-user-facing-text.ts, memory-host-sdk/host/session-files.ts, agents/internal-events.ts, history readers, Control UI extract).

Mirrors the producer-side runtime-context delimiter pattern landed in:

No new persistence shape, no consumer-side filter (avoids the audit-bypass concern flagged on #69217), no display-layer per-surface patch (avoids the rejection pattern from #69366).

Why this scope is tight

This primitive is the hidden runtime-context lane only. It is not a delivery-routing primitive. Events with a positive user delivery contract (exec completion via notifyOnExit, cron payloads, heartbeat acks) are NOT migrated — those have heartbeat-driven explicit-relay paths (buildExecEventPrompt / buildCronEventPrompt) plus tactical producer-side skips like bd60df3e53 (fix: silence cron exec completion noise). Marking them internal would suppress delivery on regular reply turns where the model is told to keep wrapped content private.

The user-deliverable noise question stays open; the maintainer's recent pattern is case-by-case (bd60df3e53, 3f63ba8fd808), and that lane should remain.

Changes

  • src/infra/system-events.ts — add SystemEventAudience type and optional audience field on SystemEvent + SystemEventOptions. Default "user-facing". JSDoc explicitly documents the scope (hidden runtime-context, not delivery routing) and the operator-surface caveat (peekSystemEvents/peekSystemEventEntries keep showing internal events for debugging). areSystemEventsEqual honors the field so duplicate-suppression and consumeSystemEventEntries prefix-match keep working.
  • src/auto-reply/reply/session-system-events.ts — partition the drain output into user-facing lines and internal lines. User-facing lines emit as plain System: ... (unchanged). Internal lines are wrapped in INTERNAL_RUNTIME_CONTEXT_BEGIN/END with the canonical header (matching formatAgentInternalEventsForPrompt). escapeInternalRuntimeContextDelimiters handles adversarial inner tokens.
  • src/cron/isolated-agent/delivery-dispatch.ts — migrate queueCronAwarenessSystemEvent to audience: "internal". This event is queued only after the isolated agent run has already delivered to the user via its own channel (gated on delivered === true and shouldQueueCronAwareness); the main session needs hidden context, not a fake-user System (untrusted): ... bubble.

Explicitly NOT migrated

  • maybeNotifyOnExit (exec completion) — notifyOnExit delivery contract.
  • executeMainSessionCronJob (cron payload) — user-scheduled reminder content.
  • Cron auto-disable warning, cron failure alert — user-relevant.
  • Cron wake() API — exposes arbitrary user-supplied text via gateway cron.wake.
  • Gateway presence (server-methods/system.ts), node events (server-node-events.ts).
  • Task registry, session-maintenance-warning, plugin runtime — caller-dependent / operator-facing / third-party.

JSDoc on SystemEventAudience explicitly excludes these from the lane and points at the existing delivery paths.

Follow-up plan (out of scope for this PR)

This PR is intentionally one-step. It introduces the primitive and migrates exactly one producer (queueCronAwarenessSystemEvent) so the wrap-on-drain integration point can be reviewed against a real call site. No additional audience-PR is pre-committed; future migrations should be evaluated case-by-case as concrete leakage reports surface.

Candidates that may earn audience: "internal" later if a real need is demonstrated: debug- or awareness-style producers that are queued after the user has already received delivery on a different channel (the same shape as queueCronAwarenessSystemEvent here). Each would need its own justification — no batch migration.

Explicitly NOT in any audience-PR sequence (these stay in the maintainer's case-by-case tactical-patch lane, e.g. bd60df3e53, 3f63ba8fd808):

  • exec completion (maybeNotifyOnExit)
  • cron payload delivery (executeMainSessionCronJob)
  • heartbeat acks
  • any other producer with a positive user delivery contract

A larger abstraction was floated during peer review — a 3-state shape (prompt-visible | hidden-context | dedicated-delivery) for events that have both an internal-awareness and a user-delivery contract. Not on the roadmap. That redesign collides with the heartbeat-runner explicit-relay paths and the producer-side skips the maintainer has been adding directly; pursuing it would re-open the rejection patterns from #69366. Flagging here only so future readers know it was considered and deliberately not pursued.

If maintainer direction differs, happy to follow guidance on #69492 before landing further migrations.

Real behavior proof

Refreshed 2026-05-26 after the force-rebase onto current upstream/main (d0dd8b8a41) + decoupling from forceSenderIsOwnerFalse / trusted / QueuedSystemEvent / resolveEventOwnerDowngrade (all removed upstream in 02182d5a30, refactor: remove sender owner tool gating, 2026-05-21). The 9-commit codex-iteration ladder was squashed to one clean commit (d5fc9d9609) atop the post-refactor base. Lab evidence below is from a fresh run against that exact commit; prod-runtime evidence (Mercury's deployed gateway) is currently from the pre-rebase 9-commit build (same observable behavior, just additional sender-owner code Mercury was carrying that is now gone), and will be regenerated on the next mercury/local-patches deploy.

  • Behavior or issue addressed: cron-awareness system events (queueCronAwarenessSystemEvent, queued onto the main session after the isolated cron has already delivered the result on its own channel) were appearing in the main session as user-visible System: cron run … bubbles, even though their purpose is internal awareness for the agent rather than a user-deliverable message. The fix migrates that producer to audience: "internal" so the autoreply integration site wraps the event in <<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>><<<END_OPENCLAW_INTERNAL_CONTEXT>>> and the existing stripInternalRuntimeContext consumers remove it from every user-facing surface.
  • Real environment tested: local sandbox on macOS Sonoma (Darwin 23.6.0), Node 25.6.0, pnpm v11.2.2, against branch feat/system-event-audience-internal-routing HEAD d5fc9d9609. Mercury's production OpenClaw gateway daemon (LaunchDaemon, 11-account Matrix deployment) has been running an earlier 2-commit local-patch variant of this PR continuously and shows the wrap-delimiter primitive working in real transcripts (excerpts retained below); the rebased squashed commit shares the same observable behavior and will be re-attested on the next prod reload.
  • Exact steps or command run after the patch: (1) git fetch upstream && git checkout d5fc9d9609; (2) CI=true pnpm install (no native rebuild required); (3) pnpm tsgo:core:test (type-check); (4) pnpm vitest run src/infra/system-events.test.ts src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts src/auto-reply/reply/queue.collect.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts (touched-test surfaces); (5) pnpm tsx scripts/proof-pr72201.ts (end-to-end production-code-path harness against the actual enqueueSystemEventdrainFormattedSystemEventsstripInternalRuntimeContext pipeline, no mocks).
  • Evidence after fix (fresh harness output against rebased HEAD):
    • pnpm tsgo:core:test — clean exit 0.
    • pnpm vitest run …Test Files: 4 passed (4); Tests: 171 passed (171); Duration ~38s (53 in system-events.test.ts covering default audience, round-trip, equality, wrap shape, ordering, user-facing-only, internal-only, escape, end-to-end strip, exec-shape filter bypass, heartbeat-noise filter bypass; 118 across delivery-dispatch.double-announce.test.ts + queue.collect.test.ts + heartbeat-runner.ghost-reminder.test.ts covering producer migration, drain semantics, heartbeat-path queueing).
    • scripts/proof-pr72201.ts against HEAD d5fc9d9609 — drives the actual production code paths on a throwaway sessionKey, no network / matrix / cron involvement. Live terminal output (run 2026-05-26T15:49:37Z):
      === (A) post-enqueue queue snapshot ===
      [
        {
          "audience": "internal",
          "contextKey": "cron:proof-pr72201",
          "text": "Cron landed: daily cost report delivered to user. Hidden runtime context only."
        }
      ]
      
      === (B) rendered prompt body (model-visible) ===
      <<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>>
      OpenClaw runtime context (internal):
      This context is runtime-generated, not user-authored. Keep internal details private.
      
      System: [2026-05-26T15:49:37Z] Cron landed: daily cost report delivered to user. Hidden runtime context only.
      <<<END_OPENCLAW_INTERNAL_CONTEXT>>>
      
      --- wrap checks ---
        BEGIN delimiter present: true
        END delimiter present  : true
        END after BEGIN        : true
        canonical header line  : true
        event body present     : true
      
      === (C) user-facing view (after stripInternalRuntimeContext) ===
      ""
      
      --- strip checks ---
        no BEGIN in user view  : true
        no event body leaked   : true
      
      === RESULT ===
      PASS
      
      All three claims verified end-to-end on the rebased HEAD: (A) the event is queued with audience: "internal" (operator-side peekSystemEventEntries exposes it for debugging); (B) the prompt body the model sees is wrapped in <<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>><<<END_OPENCLAW_INTERNAL_CONTEXT>>> with the canonical header lines; (C) stripInternalRuntimeContext reduces the user-facing view to an empty string — nothing from the internal event leaks to the transcript / history reader / channel surface. The harness lives at scripts/proof-pr72201.ts (local artifact, not pushed in this PR — output above is the captured terminal log).
    • Pre-rebase prod evidence (retained — wrap primitive in the running gateway): the wrap delimiters are already a proven primitive in Mercury's prod daemon. Live transcript excerpt from ~/.openclaw/agents/devin/sessions/3998fa7e-…03d.jsonl showing the <<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>> / OpenClaw runtime context (internal): block landing in the user-message context (auto-stripped from user-visible surfaces by stripInternalRuntimeContext):
      [Sun 2026-05-03 21:14 EDT] <<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>>
      OpenClaw runtime context (internal):
      This context is runtime-generated, not user-authored. Keep internal details private.
      
      [Internal task completion event]
      source: subagent
      session_key: agent:devin:subagent:9d4f2234-…
      
    • Pre-rebase prod evidence (retained — cron isolated runs on the patched daemon): redacted runtime log excerpt from ~/.openclaw/cron/runs/4309f6e7-f09d-4011-93ff-21726e3aa76d.jsonl:
      {"action":"finished","status":"ok","summary":"Top hit: openclaw-docs/automation/cron-jobs … [mercury | claude-sonnet-4-6]","sessionKey":"agent:mercury:cron:4309f6e7-…","durationMs":27268,"model":"gpt-5-mini","provider":"openai"}
      These are the exact runs whose post-delivery queueCronAwarenessSystemEvent calls produce the events this PR migrates to audience: "internal". Cron + main session have been continuously stable on the pre-rebase Mercury build; reload onto the squashed-rebase build is pending and will be re-attested.
    • Local codex peer review: codex review --base upstream/main against the rebased HEAD reports "no discrete, actionable regressions in the diff against the merge base. The new internal-audience routing is carried through enqueue, drain, heartbeat filtering, and collect-mode transcript handling with matching tests for the touched behavior."
  • Observed result after fix: cron-awareness events queued after isolated cron delivery no longer reach user-facing chat surfaces. The isolated cron continues to deliver its own result on its configured channel (unchanged), and the main session sees the awareness event only as wrapped runtime context for agent reasoning — not as a user-visible bubble. Local tsgo:core:test is clean; all 171 touched-surface vitest tests pass; harness PASS end-to-end against d5fc9d9609.
  • What was not tested:
    • No side-by-side pre-patch vs post-patch screenshot of a Matrix client showing the absence of the user-visible System: cron run … bubble — the negative-evidence form is hard to capture cleanly because the bubble simply stops appearing. The unit test in src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts asserts the audience: "internal" is set at the producer site as the structural regression.
    • No mantis: telegram-visible-proof capture — Mercury's prod channel is Matrix, not Telegram. Happy to capture Telegram visible proof if a maintainer can point at a usable test account, but cannot self-generate it without channel access.
    • The broader audience-routing primitive (additional producers beyond queueCronAwarenessSystemEvent) was intentionally not migrated in this PR; that scope is deferred to follow-up case-by-case work per system-event-shape consumer-path leakage; propose per-event audience classification #69492.
    • The rebased squashed commit has not yet been reloaded into Mercury's prod gateway daemon — pending the next mercury/local-patches deploy, after which the cron-awareness path will be re-exercised on the new code. The pre-rebase prod evidence above stands in for this gap; the new code path is byte-equivalent in observable behavior (only the internal type-level coupling to the removed sender-owner machinery was dropped).

Test plan

  • pnpm tsgo:core:test passes (clean exit; required for test-touching PRs per feedback_openclaw_check_gaps.md-style validation gap closure).
  • pnpm vitest run src/infra/system-events.test.ts passes (53/53).
  • pnpm vitest run src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts src/auto-reply/reply/queue.collect.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts passes (118/118).
  • pnpm tsx scripts/proof-pr72201.ts PASS end-to-end against the rebased HEAD d5fc9d9609 (see proof section).
  • codex review --base upstream/main against the rebased HEAD reports no actionable regressions.

Pre-PR peer review

This proposal was peer-reviewed by codex exec -m gpt-5.5 -c reasoning_effort=high before implementation. The peer review caught a contract-confusion issue between two earlier drafts (selective-drain vs wrap-on-drain) and pushed for the wrap-on-drain design landed here. Round-2 verdict: approve with PR-body wording tweak (avoid GitHub auto-close keywords for a partial fix; "Addresses part of #69492" is the framing applied here).

Degree of testing

Lightly tested. Unit tests cover the primitive default, round-trip, equality, the wrap shape, ordering, edge cases, end-to-end strip via stripInternalRuntimeContext, and adversarial delimiter-token escaping. The single migrated producer has a regression assertion in the existing test. No e2e or live-channel coverage added.

🤖 Generated with Claude Code

@greptile-apps

greptile-apps Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds an optional audience: 'internal' | 'user-facing' field to SystemEvent, defaulting to 'user-facing' for full backward compatibility. Internal-audience events are wrapped in INTERNAL_RUNTIME_CONTEXT_BEGIN/END delimiters at drain time so the model sees hidden runtime context while all existing stripInternalRuntimeContext consumers continue to scrub it from user-facing surfaces. The single migrated producer is queueCronAwarenessSystemEvent in delivery-dispatch.ts.

Confidence Score: 4/5

Safe to merge — no correctness issues found; the primitive is well-tested and fully backward-compatible.

The implementation is logically sound, the escape/strip round-trip is correct, and the nine new tests cover the default, round-trip, equality, wrap shape, ordering, edge cases, and adversarial delimiter injection. The only findings are P2 style/maintainability observations.

No files require special attention.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/session-system-events.ts
Line: 119-123

Comment:
**Header text duplicated from `LEGACY_INTERNAL_CONTEXT_HEADER`**

The three header lines pushed here are an exact copy of the private `LEGACY_INTERNAL_CONTEXT_HEADER` constant in `src/agents/internal-runtime-context.ts`. They aren't used by the strip logic (which relies on the BEGIN/END delimiters), so a mismatch doesn't cause a runtime bug today — but if the canonical wording ever changes in `internal-runtime-context.ts`, this copy will silently drift. Exporting a shared `INTERNAL_RUNTIME_CONTEXT_HEADER` constant (or a helper that pushes the standard framing) would keep the two sites in sync.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(events): key consecutive-duplicate s..." | Re-trigger Greptile

Comment thread src/auto-reply/reply/session-system-events.ts
@clawsweeper

clawsweeper Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 27, 2026, 7:02 PM ET / 23:02 UTC.

Summary
The PR adds an optional SystemEvent.audience field, wraps internal-audience system events in the existing internal runtime-context delimiters, preserves heartbeat/collect-mode behavior, and migrates cron awareness events to the internal lane.

PR surface: Source +207, Tests +477. Total +684 across 11 files.

Reproducibility: yes. at source/proof level: the PR body's terminal proof drives the production enqueue, drain, and strip path on the PR head, and the merge-result source shows the same path. I did not run the harness locally because this review was read-only.

Review metrics: 1 noteworthy metric.

  • Event audience surface: 1 optional field added; 1 producer migrated. The new field changes event routing semantics, while the branch intentionally limits behavior migration to cron awareness for reviewability.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Keep required checks green on the current merge ref before landing.
  • Have a maintainer explicitly accept the new audience primitive and the one-producer migration scope.

Mantis proof suggestion
A native Telegram proof would directly show that the cron delivery reaches chat while the follow-up awareness event does not appear as a visible system bubble. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram desktop proof: verify that an isolated cron delivery reaches Telegram while the queued cron-awareness system event stays hidden from the visible transcript.

Risk before merge

  • This introduces a new core event-routing bit; a future producer marked internal incorrectly could suppress a message that has a positive user-delivery contract.
  • The hidden lane is user-invisible but still model-visible, so delimiter escaping, system-tag sanitization, and every strip consumer remain security-boundary-sensitive.
  • The fix intentionally leaves broader system-event leakage work to system-event-shape consumer-path leakage; propose per-event audience classification #69492, so this should land only as the bounded primitive plus the one migrated cron-awareness producer.

Maintainer options:

  1. Gate on merge-ref checks and routing review (recommended)
    Before merge, confirm required checks are green on the current merge ref and that the one migrated cron-awareness producer is the only behavior migration in scope.
  2. Accept the bounded primitive
    Maintainers may accept the new audience field as the core seam while explicitly keeping future producer migrations under separate review.
  3. Pause for canonical event taxonomy
    If maintainers want a broader prompt-visible/hidden/dedicated-delivery taxonomy instead, pause this branch and resolve that product direction in the canonical issue first.

Next step before merge
This is not a narrow repair lane item; it needs maintainer judgment on the new core event-routing primitive and its message/session/security boundary risks.

Security
Cleared: No supply-chain change or concrete security defect was found; the merge result preserves system-tag sanitization and delimiter escaping, while the hidden-context boundary remains a maintainer merge-risk consideration.

Review details

Best possible solution:

Land after maintainer acceptance of the audience primitive and green required checks on the merge ref, then keep additional producer migrations case-by-case under #69492.

Do we have a high-confidence way to reproduce the issue?

Yes, at source/proof level: the PR body's terminal proof drives the production enqueue, drain, and strip path on the PR head, and the merge-result source shows the same path. I did not run the harness locally because this review was read-only.

Is this the best way to solve the issue?

Yes, with maintainer approval: the patch keeps existing events user-facing by default, reuses the existing internal runtime-context strip contract, and migrates only the post-delivery cron-awareness producer. The safer long-term path is to keep additional migrations case-by-case rather than batch-converting user-deliverable events.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 550a9b459a35.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal proof on the rebased head showing the internal event queued, wrapped for the model, and stripped from the user-facing view.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal proof on the rebased head showing the internal event queued, wrapped for the model, and stripped from the user-facing view.
  • add mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The concrete migrated cron-awareness behavior can be demonstrated as a user-visible Telegram transcript no-leak check even though the implementation is channel-agnostic.
  • remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.

Label justifications:

  • P2: The PR is a normal-priority core runtime improvement with meaningful but bounded session/message visibility impact.
  • merge-risk: 🚨 message-delivery: Misclassifying events as internal can hide messages that should be delivered to users, and this PR changes heartbeat/cron relay selection.
  • merge-risk: 🚨 session-state: The patch changes when queued system events are consumed, preserved, or batched into follow-up transcripts.
  • merge-risk: 🚨 security-boundary: The new lane relies on delimiters, escaping, sanitization, and strip consumers to keep model-visible runtime context out of user-facing surfaces.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal proof on the rebased head showing the internal event queued, wrapped for the model, and stripped from the user-facing view.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal proof on the rebased head showing the internal event queued, wrapped for the model, and stripped from the user-facing view.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The concrete migrated cron-awareness behavior can be demonstrated as a user-visible Telegram transcript no-leak check even though the implementation is channel-agnostic.
Evidence reviewed

PR surface:

Source +207, Tests +477. Total +684 across 11 files.

View PR surface stats
Area Files Added Removed Net
Source 7 236 29 +207
Tests 4 477 0 +477
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 11 713 29 +684

What I checked:

  • Repository policy read: Root AGENTS.md was fully read and applied; its guidance is relevant because this PR changes core runtime routing, session-state/message-delivery behavior, and hidden context security boundaries. (AGENTS.md:1)
  • Scoped policy read: src/agents/AGENTS.md was read because the PR uses internal runtime-context helpers; it added review context but no separate line-level finding. (src/agents/AGENTS.md:1)
  • Merge-ref provenance: The available GitHub merge ref combines the PR head with base 77fe36b and is the best local read-only merge-result evidence for the patch shape. (6350ffe38aaf)
  • System event contract: The merge result adds SystemEventAudience, stores defaulted audience on queued events, includes audience in dedupe/equality, and preserves current main's sanitizeInboundSystemTags queue-boundary sanitizer. (src/infra/system-events.ts:18, 6350ffe38aaf)
  • Wrapped drain behavior: The merge result leaves internal events queued during heartbeat drains, drains them on normal turns, wraps them in INTERNAL_RUNTIME_CONTEXT_BEGIN/END, and escapes inner delimiter tokens. (src/auto-reply/reply/session-system-events.ts:44, 6350ffe38aaf)
  • Heartbeat routing guard: Heartbeat preflight, cron/exec prompt selection, and heartbeat consumption all skip audience: "internal" events so hidden awareness is neither relayed visibly nor consumed before the wrapped drain path. (src/infra/heartbeat-runner.ts:979, 6350ffe38aaf)

Likely related people:

  • steipete: Current-main blame/log for the central system-event queue path points to recent session/runtime work in commit 15b1e99. (role: recent area contributor; confidence: medium; commits: 15b1e99df310; files: src/infra/system-events.ts, src/auto-reply/reply/session-system-events.ts, src/infra/heartbeat-runner.ts)
  • statxc: Merged the recent system-event dedupe identity work that this PR extends by adding audience to dedupe/equality semantics. (role: recent system-event contributor; confidence: medium; commits: 6943b5b4a6d3; files: src/infra/system-events.ts, src/infra/system-events.test.ts)
  • jalehman: Merged the runtime prompt context refactor that defines the hidden runtime-context boundary reused by this PR. (role: adjacent runtime-context owner; confidence: medium; commits: 6dae3c273de6; files: src/agents/internal-runtime-context.ts, src/agents/internal-events.ts)
  • vincentkoc: Recent allocation-trim work rewrote the session-system-events drain shape that this PR composes with. (role: adjacent session-event contributor; confidence: medium; commits: 468c6a0101e9; files: src/auto-reply/reply/session-system-events.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@lukeboyett lukeboyett force-pushed the feat/system-event-audience-internal-routing branch from c2ae546 to 1ddb5ec Compare April 27, 2026 15:16
@lukeboyett lukeboyett force-pushed the feat/system-event-audience-internal-routing branch from 1ddb5ec to ad6b10b Compare April 27, 2026 16:46
@lukeboyett lukeboyett force-pushed the feat/system-event-audience-internal-routing branch from ad6b10b to 6137343 Compare April 27, 2026 17:48
@lukeboyett lukeboyett force-pushed the feat/system-event-audience-internal-routing branch from 6137343 to 4a6f1dd Compare April 29, 2026 11:02
@openclaw-barnacle openclaw-barnacle Bot removed the agents Agent runtime and tooling label Apr 29, 2026
@lukeboyett lukeboyett force-pushed the feat/system-event-audience-internal-routing branch 2 times, most recently from 6e7c5bd to 5aa64a9 Compare April 29, 2026 12:39
@lukeboyett lukeboyett force-pushed the feat/system-event-audience-internal-routing branch from 5aa64a9 to e68ce93 Compare May 6, 2026 00:28
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 6, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e68ce9353c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +116 to +117
if ((event.audience ?? "user-facing") === "internal") {
internalLines.push(...eventLines);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid downgrading owner auth for hidden events

When an internal event is also trusted: false (the new cron awareness producer does exactly this in src/cron/isolated-agent/delivery-dispatch.ts), these added lines place a System (untrusted): ... line inside the hidden runtime-context block. I checked the consumer path in src/auto-reply/reply/get-reply-run.ts: it tests the entire drained block with UNTRUSTED_SYSTEM_EVENT_LINE_RE before passing senderIsOwner, so even hidden internal context now forces senderIsOwner to false on the next owner reply. That disables owner-only directives/tools after a successful isolated cron delivery even though the untrusted text is not user-facing input; internal events should either not use the visible untrusted prefix for this auth scan or the scan should ignore the wrapped block.

Useful? React with 👍 / 👎.

@lukeboyett

Copy link
Copy Markdown
Contributor Author

Status update — rebased onto current main

Rebased onto upstream/main@7af1a87830 (post-v2026.5.4).

  • New HEAD: e68ce9353c
  • Branch: lukeboyett:feat/system-event-audience-internal-routing
  • 2 commits in the stack (one earlier export-refactor commit dropped — see below):
    1. 90e4572e77 feat(events): add audience field for hidden runtime-context system events
    2. e68ce9353c fix(events): key consecutive-duplicate suppression on (text, audience)
  • pnpm test -- src/infra/system-events.test.ts: 39 passed
  • pnpm test -- src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts: 50 passed
  • pnpm tsgo:core:test: clean

Forensic on adjacent upstream activity

Commit Effect on this PR
9180173f9a fix: preserve exec event routing and sanitize tool XML Introduced selectGenericSystemEvents + consumeSelectedSystemEventEntries to keep exec-completion events queued for the heartbeat path. Accommodated in the rebase: this PR's audience-routing now layers on top of consumeSelectedSystemEventEntries(...) instead of drainSystemEventEntries(...), so exec-completion events stay on the heartbeat lane while the audience split applies to the remaining generic events.
6dae3c273d fix: keep runtime prompt context out of system prompt (#77521) Refactored internal-runtime-context.ts and added OPENCLAW_RUNTIME_CONTEXT_NOTICE. Earlier prior version of this PR carried a refactor(events): export shared INTERNAL_RUNTIME_CONTEXT_HEADER_LINES commit to dedupe the canonical header text against LEGACY_INTERNAL_CONTEXT_HEADER. That commit was dropped during the rebase since OPENCLAW_RUNTIME_CONTEXT_NOTICE now covers the dedup. Functionally equivalent.
resetQueueState helper extraction Upstream consolidated the post-splice queue state reset into a resetQueueState(key, entry) helper. The audience-tracking commit (e68ce9353c) was rebased onto that helper, so lastAudience is reset alongside lastText/lastContextKey inside the helper rather than inline.

Upstream's INTERNAL_RUNTIME_CONTEXT_BEGIN/END delimiters and OPENCLAW_RUNTIME_CONTEXT_NOTICE are present, but no upstream commit currently provides a system-event audience-routing primitive — the internal vs user-facing lane split this PR adds remains an additive primitive over upstream. The wrapping site at drainFormattedSystemEvents is still the single integration point.

Re-review

@codex please re-review the rebased commits. Specific concerns to focus on:

  1. The merge with consumeSelectedSystemEventEntries — does the audience split correctly compose with the exec-completion exclusion (i.e., are exec-completion events with audience: 'internal' correctly left in the queue for the heartbeat path, not consumed here)?
  2. The lastAudience tracking inside resetQueueState — any edge cases where queue-state reset diverges from the (text, audience) suppression key invariant?
  3. Whether this PR's internal lane should adopt OPENCLAW_RUNTIME_CONTEXT_NOTICE as a single source of truth for the inserted header lines, replacing the inline strings in drainFormattedSystemEvents.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 6, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 9, 2026
Addresses the fourth codex P2 on openclaw#72201 (against PR head
d6f03b8403). Counterpart to the relay-selector filter in 761872d088.

selectSystemEventsConsumedByHeartbeat picked up audience: "internal"
events whenever isCronWake or contextKey: cron: matched, then
consumeSelectedSystemEventEntries removed them from the queue. With the
relay-selector skip already in place, the cron prompt template no longer
fires on the internal event — but the consume path still drained it
silently, so drainFormattedSystemEvents never saw the event and the
INTERNAL_RUNTIME_CONTEXT wrap never fired. Net effect: hidden
cron-awareness events were getting dropped on the floor on cron-wake
heartbeats — exactly the hidden agent-awareness this audience field is
meant to preserve.

Filter audience: "internal" out of all three branches of
selectSystemEventsConsumedByHeartbeat (exec selector, cron selector,
fallthrough "all pending events"). Internal events stay queued for the
next normal drain, which routes them through the wrap-on-drain
integration in session-system-events.ts.

Tests: 23 passed (was 22) — added a regression in
heartbeat-runner.ghost-reminder.test.ts asserting that an
audience: "internal" event with contextKey: "cron:..." queued before an
interval wake survives the heartbeat run (peekSystemEventEntries returns
it after runHeartbeatOnce). Existing drain-after-run behavior for
user-facing cron events is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 9, 2026
Addresses the fifth codex P2 on openclaw#72201 (against PR head
f39342979f).

resolveHeartbeatPreflight computes hasTaggedCronEvents from any pending
event whose contextKey starts with "cron:" — without filtering audience.
Combined with the queue-survival behavior in 1debc560ca (which
intentionally leaves audience: "internal" cron-tagged events queued for
the wrap-on-drain path), the preflight cron-tag signal would persist
across every subsequent interval heartbeat. That bypasses the normal
file gates and runs another generic heartbeat each interval until a
normal reply finally drains the queue — turning hidden cron-awareness
into a persistent heartbeat loop.

Apply the same audience-internal filter to the preflight cron-tag
detection. Internal events still stay queued; they just no longer
trigger the cron-tagged heartbeat preflight signal.

Tests: extended the existing "leaves audience: 'internal' events queued"
regression in heartbeat-runner.ghost-reminder.test.ts to do a second
runHeartbeatOnce on the same queue and assert:
- Second run still resolves as Provider: "heartbeat" (not cron-event).
- Second heartbeat ctx Body does NOT contain the awareness text.
- Internal event is still queued after the second run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 9, 2026
Addresses the sixth codex P2 on openclaw#72201 (against PR head
45e0157f30, still applicable on the rebased 5999ece).

selectGenericSystemEvents in session-system-events.ts excludes any event
whose text matches isExecCompletionEvent so the heartbeat exec relay
path can consume them. After the audience-aware fixes in 998dd9afd5
(prompt selectors) and 8d72e6e (consume selector) the heartbeat
path also skips audience: "internal" events, which combined creates a
no-consumer hole: an audience: "internal" event whose text happens to
match exec-completion syntax (e.g. cron output literally starting with
"Exec finished (...)" or "Exec failed (...)") gets stranded — the
generic drain filters it out as exec-shaped, the heartbeat path skips
it as internal, and it sits in peekSystemEventEntries indefinitely.

Make audience the source of truth: audience: "internal" events ALWAYS
go through the generic drain (which routes them to the
INTERNAL_RUNTIME_CONTEXT wrap) regardless of text shape. Text-shape
only matters for user-facing events, where it correctly keeps
exec-completion on the heartbeat lane.

Tests: 40 passed (was 39) — added a regression in
src/infra/system-events.test.ts asserting that an audience: "internal"
event with exec-completion-shaped text drains through the wrap (result
contains BEGIN/END delimiters and the event text, queue is empty after
drain). Existing "leaves exec completion events queued" test for
user-facing exec events unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 9, 2026
Addresses the seventh codex P2 on openclaw#72201 (against PR
head 08bbd85).

compactSystemEvent in session-system-events.ts strips heartbeat-noise
text (reason periodic, Read HEARTBEAT.md, heartbeat poll, heartbeat
wake) from user-facing relays so it does not appear in chat. After
consumeSelectedSystemEventEntries removes the event from the queue, a
null return from compactSystemEvent silently drops the event with
continue. For audience: "internal" events whose text happens to
include any of those phrases (e.g. cron output that mentions
"reason periodic" or "heartbeat poll" in its summary), this is the
same no-consumer hole class as the exec-shape filter from 08bbd85:
the queue entry is gone and the wrap-on-drain path never runs.

Pass the audience into compactSystemEvent and skip the noise filters
when audience === "internal". The Node: line sanitizer (which removes
the "last input <path>" suffix) still runs for both audiences — that
is a transformer, not a filter, and the safety value applies to both
relay surfaces. User-facing events keep being filtered exactly as
before.

Tests: 42 passed (was 40) — added two regressions in
src/infra/system-events.test.ts:
- audience: "internal" events containing each of the four heartbeat-
  noise phrases drain through the wrap (each phrase asserted
  individually so a regression in any single filter is caught).
- user-facing events containing the same phrases still get filtered
  (confirms the audience-bypass is precisely scoped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 9, 2026
Addresses two new P1 codex findings on openclaw#72201 (against
PR head 5277cf7).

Codex flipped on the security model. Earlier P2 reviews argued that
audience: "internal" events should be exempt from the owner-auth
downgrade scan because the wrap hides them from user-facing transcripts.
The new P1 reviews correctly observe that the wrapped block still flows
into the model prompt, so attacker-influenceable content (e.g. a cron
payload relayed via queueCronAwarenessSystemEvent with trusted: false)
must NOT have owner-only tool access just because the line is later
hidden from the user. The trust signal is the source of truth, not
visibility.

Revert the two owner-auth carve-outs:
- src/auto-reply/reply/get-reply-run.ts: drop the
  stripInternalRuntimeContext call before the UNTRUSTED_SYSTEM_EVENT_LINE_RE
  scan; scan the full drained block again.
- src/infra/heartbeat-runner.ts: drop the audience !== "internal" predicate
  from hasUntrustedInspectedEvents and hasUntrustedActiveSessionEvents;
  scan all trusted=false events again.

Routing carve-outs (relay selectors, consume selector, preflight cron-tag,
generic drain exec-shape exemption, noise-filter exemption) are unchanged
— those are about which surface drains/wraps the event, not security.

Tests: existing regression cases inverted to assert the new (correct)
behavior — audience: "internal" + trusted: false events DO trigger owner
downgrade in both the get-reply-run scan and the heartbeat scan. Total
test counts unchanged. pnpm tsgo:core:test clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lukeboyett

Copy link
Copy Markdown
Contributor Author

Rebased onto current upstream/main (post-a4b17d65a8).

  • New HEAD: c0010a7dcd
  • All 11 commits preserved.

Conflicts resolved (one beyond the usual CHANGELOG.md Unreleased churn):

src/auto-reply/reply/session-system-events.ts conflicted with @vincentkoc's 468c6a0101 "perf(core): trim reply and agent allocation churn" (merged 2026-05-03), which rewrote selectGenericSystemEvents from events.filter(...) to a for-loop and replaced the flatMap-based line formatter in drainFormattedSystemEvents with explicit for-loops, plus split out summaryLines from systemLines.

Resolution preserves @vincentkoc's allocation-trim intent while keeping this PR's audience routing:

  • selectGenericSystemEvents is the for-loop form, with the audience carve-out merged into the predicate (event.audience === "internal" || !isExecCompletionEvent(event.text)).
  • The drain loop now declares three arrays — summaryLines, userFacingLines, internalLines — and routes each event to the right bucket via a target ternary inside the for-loop, then assembles systemLines (with the INTERNAL_RUNTIME_CONTEXT_BEGIN/END wrap when internal events were present) before the summary block.
  • The summary handling and final summaryLines + systemLines join are unchanged from @vincentkoc's version.

src/auto-reply/reply/get-reply-run.ts conflicts were import-only (a transient stripInternalRuntimeContext import lived in commit 3 and was removed by the revert in commit 10; net: no stripInternalRuntimeContext import in final state).

Verification:

  • pnpm test -- src/infra/system-events.test.ts src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts — 163/163 passing.
  • pnpm tsgo:core:test — clean.

Branch is now MERGEABLE. Ready for further review.

@lukeboyett lukeboyett force-pushed the feat/system-event-audience-internal-routing branch from c0010a7 to 2f6518a Compare May 11, 2026 15:56
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 11, 2026
Addresses codex P2 on openclaw#72201 + the missing-changelog P3.

Internal-runtime-context blocks are runtime-generated and stripped from
every user-facing surface, so System (untrusted): lines inside the wrap
cannot have come from user input. Without this carve-out, an
audience: "internal" event whose payload happens to carry an untrusted
text fragment (e.g. cron output relayed for agent awareness via
queueCronAwarenessSystemEvent) would silently downgrade senderIsOwner on
the next reply turn — stripping owner-only tools from a turn the user
never saw the event for.

Strip the wrapped block via stripInternalRuntimeContext before applying
UNTRUSTED_SYSTEM_EVENT_LINE_RE so the scan only inspects the
user-facing portion of the drained system events. Owner-auth downgrade
still triggers on untrusted lines outside the wrap; that path is the
existing user-impersonation guard and is unchanged.

Tests: 48 passed (was 47) — added two regression cases in
get-reply-run.media-only.test.ts:
- Untrusted lines inside an INTERNAL_RUNTIME_CONTEXT block do NOT
  downgrade senderIsOwner.
- Untrusted lines outside the wrap still DO downgrade (carve-out
  scoped precisely to wrapped content).

CHANGELOG: Unreleased entry covers the audience primitive, the wrap-on-drain
integration, and the owner-auth scan fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 11, 2026
Addresses codex's 2nd P2 on openclaw#72201 (against PR head
2786604681).

Symmetric to the get-reply-run.ts carve-out from cd2719b9fd. The
heartbeat path also has its own owner-auth downgrade scan that fires
on any pending event with trusted=false (forces ForceSenderIsOwnerFalse
on the heartbeat ctx). Without this carve-out, an audience: "internal"
event waiting in the queue (e.g. queueCronAwarenessSystemEvent) drained
on the next heartbeat run would still disable owner-only tools and
directives for that turn — even though the awareness text was already
delivered on the cron's own channel and is now hidden runtime context.

audience: "internal" events are runtime-generated, sit behind the
INTERNAL_RUNTIME_CONTEXT wrap, and are stripped from every user-facing
surface. They cannot have come from user input by construction, so they
must be exempt from the user-impersonation guard. Untrusted events
without audience: "internal" still trigger the downgrade exactly as
before.

Tests: 21 passed (was 20) — added a regression case in
heartbeat-runner.ghost-reminder.test.ts asserting that an
audience: "internal" + trusted: false event in the queue does NOT
force ForceSenderIsOwnerFalse on the heartbeat ctx.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 11, 2026
…c selection

Addresses the third codex P2 on openclaw#72201 (against PR head 25a7a265e3).

resolveHeartbeatRunPrompt selects pending cron-shaped events for
buildCronEventPrompt and pending exec-completion events for the heartbeat
exec relay path. Both surfaces format their selected events for user
delivery — bypassing the wrap-on-drain integration in session-system-events.ts.
Without an audience filter on the selectors, an audience: "internal"
cron-awareness event with a cron: contextKey (or queued during a cron wake)
was getting pulled into cronEvents and relayed/double-announced to the user
as a CRON_EVENT_PROMPT instead of staying hidden runtime context.

Filter both selectors by event.audience !== "internal". The intent is that
audience: "internal" events route exclusively through the
INTERNAL_RUNTIME_CONTEXT_* wrap on the next drain — the cron and exec prompt
surfaces are user-facing relay paths and must not pick them up.

Tests: 22 passed (was 21) — added a regression case in
heartbeat-runner.ghost-reminder.test.ts asserting that an audience: "internal"
cron-shaped event with contextKey: "cron:..." queued before an interval wake
does NOT trigger CRON_EVENT_PROMPT and its text does NOT appear in the
heartbeat ctx Body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 11, 2026
Addresses the fourth codex P2 on openclaw#72201 (against PR head
d6f03b8403). Counterpart to the relay-selector filter in 761872d088.

selectSystemEventsConsumedByHeartbeat picked up audience: "internal"
events whenever isCronWake or contextKey: cron: matched, then
consumeSelectedSystemEventEntries removed them from the queue. With the
relay-selector skip already in place, the cron prompt template no longer
fires on the internal event — but the consume path still drained it
silently, so drainFormattedSystemEvents never saw the event and the
INTERNAL_RUNTIME_CONTEXT wrap never fired. Net effect: hidden
cron-awareness events were getting dropped on the floor on cron-wake
heartbeats — exactly the hidden agent-awareness this audience field is
meant to preserve.

Filter audience: "internal" out of all three branches of
selectSystemEventsConsumedByHeartbeat (exec selector, cron selector,
fallthrough "all pending events"). Internal events stay queued for the
next normal drain, which routes them through the wrap-on-drain
integration in session-system-events.ts.

Tests: 23 passed (was 22) — added a regression in
heartbeat-runner.ghost-reminder.test.ts asserting that an
audience: "internal" event with contextKey: "cron:..." queued before an
interval wake survives the heartbeat run (peekSystemEventEntries returns
it after runHeartbeatOnce). Existing drain-after-run behavior for
user-facing cron events is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lukeboyett added a commit to lukeboyett/openclaw that referenced this pull request May 11, 2026
Addresses the fifth codex P2 on openclaw#72201 (against PR head
f39342979f).

resolveHeartbeatPreflight computes hasTaggedCronEvents from any pending
event whose contextKey starts with "cron:" — without filtering audience.
Combined with the queue-survival behavior in 1debc560ca (which
intentionally leaves audience: "internal" cron-tagged events queued for
the wrap-on-drain path), the preflight cron-tag signal would persist
across every subsequent interval heartbeat. That bypasses the normal
file gates and runs another generic heartbeat each interval until a
normal reply finally drains the queue — turning hidden cron-awareness
into a persistent heartbeat loop.

Apply the same audience-internal filter to the preflight cron-tag
detection. Internal events still stay queued; they just no longer
trigger the cron-tagged heartbeat preflight signal.

Tests: extended the existing "leaves audience: 'internal' events queued"
regression in heartbeat-runner.ghost-reminder.test.ts to do a second
runHeartbeatOnce on the same queue and assert:
- Second run still resolves as Provider: "heartbeat" (not cron-event).
- Second heartbeat ctx Body does NOT contain the awareness text.
- Internal event is still queued after the second run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lukeboyett

Copy link
Copy Markdown
Contributor Author

Rebased onto current upstream/main (post-6943b5b4a6, @statxc's queue dedupe fix merged earlier today).

  • New HEAD: 2f6518a60f
  • All 11 commits preserved.

Conflict resolution. Two real conflicts beyond the usual CHANGELOG churn:

src/infra/system-events.ts — main now has the tightened findDuplicateInQueue / isDuplicateSystemEvent model from #73040 (text, contextKey, trusted, deliveryContext, with keyed = queue-wide and unkeyed = last-only). My branch's older lastText consecutive-only logic is gone in favor of the new shape.

While resolving, I extended both isDuplicateSystemEvent and areSystemEventsEqual to also include audience in the predicate. Rationale: a user-facing event and an internal-wrap event with identical (text, contextKey, trusted, deliveryContext) route to different surfaces (one as a plain System: line, the other inside INTERNAL_RUNTIME_CONTEXT_BEGIN/END), so they are not semantically duplicates — the wrap-on-drain contract requires both to reach the queue. This is a one-line extension per predicate. My PR introduces the audience concept, so it's the natural place to own this decision. Threaded audience through findDuplicateInQueue and the enqueueSystemEvent caller. Dropped the now-orphaned lastAudience / lastText SessionQueue fields that my older consecutive-only commits had been keying on.

src/infra/system-events.test.ts — main added 13 dedup-semantics tests (non-consecutive keyed dup, unkeyed allows non-consecutive dup, route disambiguation, trust disambiguation, post-eviction/consume reuse, etc.). My branch added 9 audience-routing tests. Both sets are preserved as siblings in the outer describe block.

Verification:

  • pnpm test -- src/infra/system-events.test.ts src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts — 175/175 passing.
  • pnpm tsgo:core:test — clean.

PR is MERGEABLE again. Thanks @statxc for the queue cleanup and @steipete for the tightening — the new (text, contextKey, trusted, deliveryContext) shape was a clean foundation to extend with audience. Open to changing my mind on including audience in the predicate if you'd rather keep the dedup identity narrow and let the two-lane contract rely on producer discipline instead.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f6518a60f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// so the heartbeat path can consume and deliver them.
const queued = consumeSelectedSystemEventEntries(
params.sessionKey,
selectGenericSystemEvents(peekSystemEventEntries(params.sessionKey)),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep heartbeat replies from draining internal events

When a non-isolated heartbeat runs before the next user reply, runHeartbeatOnce passes the main SessionKey into getReplyFromConfig, which still calls this generic drain before the heartbeat consume selector runs. Because internal events are selected here, the hidden cron-awareness event is consumed by the heartbeat prompt; prompt-prelude.ts keeps systemEventBlocks out of transcriptCommandBody, so an OK/silent/no-target heartbeat can drop the awareness before the next normal turn. Fresh evidence beyond the earlier heartbeat-consume finding is this live runHeartbeatOnce -> getReplyFromConfig -> drainFormattedSystemEvents path, which the new heartbeat tests stub out.

Useful? React with 👍 / 👎.

@lukeboyett

lukeboyett commented May 15, 2026

Copy link
Copy Markdown
Contributor Author

Fixed in 25e8f2a5d3.

Confirmed the bug exactly as described: drainFormattedSystemEvents is reached from runHeartbeatOnce → getReplyFromConfig → runPreparedReply, and heartbeat replies use buildReplyPromptEnvelopeBase's fixed transcript prompt which doesn't carry systemEventBlocks. So an internal event queued before a heartbeat got consumed from the queue, formatted into a wrap, and then dropped on the floor before any non-heartbeat turn could deliver it. queueCronAwarenessSystemEvent is the concrete victim — its audience: "internal" + trusted: false event silently disappeared whenever a heartbeat fired first.

Fix: threaded isHeartbeat?: boolean into drainFormattedSystemEvents. When true, selectGenericSystemEvents skips audience: "internal" events so they remain queued. The next non-heartbeat reply (which DOES carry systemEventBlocks) drains them through the wrap-on-drain path and delivers the runtime-context to the model. The user-facing exec-completion exclusion is unchanged — heartbeat relay still owns those.

The caller in get-reply-run.ts passes isHeartbeat (already in scope at that function level).

Test: new "leaves audience: internal events queued during a heartbeat drain" case in system-events.test.ts verifies all three properties:

  1. The heartbeat drain emits user-facing events but no INTERNAL_RUNTIME_CONTEXT_BEGIN/END markers and does not include the internal body.
  2. The internal event stays on the queue after the heartbeat drain (verified via peekSystemEvents).
  3. The next non-heartbeat drain consumes it and emits the wrap.

Verification:

  • pnpm test -- src/infra/system-events.test.ts src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts — 176/176 passing.
  • pnpm tsgo:core:test — clean.

On the real-behavior-proof staleness flag: noted, will refresh the PR body proof section against 25e8f2a5d3 once I can capture a live cron-awareness pass through the gateway. The mechanism it'll prove is the same as the earlier capture (cron delivery → main-session enqueue → next non-heartbeat reply wraps the internal event → user-facing surface strips it via stripInternalRuntimeContext), with the bug above now also covered (heartbeat between delivery and reply no longer eats the awareness).

@clawsweeper please re-review.

Re-review progress:

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25e8f2a5d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +178 to +185
systemLines.push(INTERNAL_RUNTIME_CONTEXT_BEGIN);
systemLines.push("OpenClaw runtime context (internal):");
systemLines.push(
"This context is runtime-generated, not user-authored. Keep internal details private.",
);
systemLines.push("");
systemLines.push(escapeInternalRuntimeContextDelimiters(internalLines.join("\n")));
systemLines.push(INTERNAL_RUNTIME_CONTEXT_END);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve hidden events through collect-mode queuing

When an audience: "internal" event is drained while another run is active and the session uses collect-mode queuing, this block is added only to queuedBody (the transcript body intentionally omits systemEventBlocks in src/auto-reply/reply/prompt-prelude.ts). I checked the collect drain path: renderCollectItem uses item.prompt, and the combined follow-up is submitted without a transcriptPrompt (src/auto-reply/reply/queue/drain.ts:133-138,252-258), so the <<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>> block becomes ordinary queued prompt text and can be persisted/displayed instead of staying hidden. The wrapped context needs a transcript-safe counterpart when queued items are collected, or collect rendering should use each item's transcriptPrompt for visible transcript text.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1972afc0e6.

Confirmed the bug exactly as described. renderCollectItem in src/auto-reply/reply/queue/drain.ts used item.prompt verbatim, and the combined follow-up was submitted with { prompt, ... } only — no transcriptPrompt. So when an audience: "internal" event had been drained into queuedBody (e.g. via queueCronAwarenessSystemEvent) and a second message arrived during the active run triggering collect-mode batching, the <<<BEGIN_OPENCLAW_INTERNAL_CONTEXT>>>...<<<END_OPENCLAW_INTERNAL_CONTEXT>>> block became plain queued prompt text and would surface in the user-visible transcript.

Fix: scheduleFollowupDrain now builds both bodies:

  • prompt via renderCollectItem (existing; model-facing, wrap preserved so runtime context still reaches the model).
  • transcriptPrompt via new renderCollectItemTranscript (prefers each item's own transcriptPrompt set at enqueue time in get-reply-run.ts:975, falls back to stripInternalRuntimeContext(item.prompt)).

The transcript variant is only attached when the combined prompt carries a wrap or any grouped item already split prompt vs transcriptPrompt at enqueue — plain collects pay no extra cost.

Test: two new cases in queue.collect.test.ts:

  1. strips internal runtime-context wrap from the transcriptPrompt of a collected batch — proves wrap-bearing item.prompt yields a combined prompt that keeps the wrap, plus a transcriptPrompt that drops the wrap markers AND the wrapped body, and still merges the second queued item's plain text.
  2. omits transcriptPrompt when no items carry a wrap or transcript split — proves plain collects stay cost-free.

Note on parallel paths: the announce-queue collect path in src/agents/subagent-announce-queue.ts:213-235 uses a different AnnounceQueueItem schema (no transcriptPrompt field; internal events flow through a dedicated internalEvents channel), so it isn't reachable by this same shape and was left untouched.

Verification:

  • pnpm test -- src/auto-reply/reply/queue.collect.test.ts src/auto-reply/reply/queue.dedupe.test.ts src/infra/system-events.test.ts src/utils/queue-helpers.test.ts — all passing.
  • pnpm tsgo:core:test — clean.

@clawsweeper please re-review.

@lukeboyett

lukeboyett commented May 16, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto current main in c76b3c19ec.

Migrated to the structured forceSenderIsOwnerFalse contract per the P2 at get-reply-run.ts:715:

  • drainFormattedSystemEventBlock now returns { text, forceSenderIsOwnerFalse } end-to-end. My branch dropped the regex UNTRUSTED_SYSTEM_EVENT_LINE_RE.test(eventsBlock) scan entirely.
  • get-reply-run.ts adopts upstream's pattern: forceSenderIsOwnerFalseFromSystemEvents = eventsBlock.forceSenderIsOwnerFalse. No more text-based owner-auth derivation.
  • SystemEvent.audience composes orthogonally with forceSenderIsOwnerFalse. queueCronAwarenessSystemEvent sets BOTH audience: "internal" (for visibility hiding via INTERNAL_RUNTIME_CONTEXT wrap) AND forceSenderIsOwnerFalse: true (for owner-tool downgrade). The audience lane never exempts events from owner-downgrade — that's the producer's structured signal, not derived from text shape.
  • Per-event dedup keys include audience alongside forceSenderIsOwnerFalse.

Three commits dropped during rebase because they're obsolete under the structured contract:

  • 934ff65fb1 (strip-wrap-before-regex-scan in get-reply-run.ts) — regex scan no longer exists.
  • 370618b43e (heartbeat audience exemption from trusted: false scan) — heartbeat now reads structured forceSenderIsOwnerFalse directly (lines 1551/1554).
  • 5cd6ea26f6 (revert of the two above) — nothing to revert.

Net: 9 commits on top of upstream/main, no carve-outs that bypass owner-downgrade for hidden-context events. Untrusted runtime context still triggers owner-tool downgrade exactly as c0fe7ab34a intended.

Verification on c76b3c19ec:

  • pnpm tsgo:core:test — clean.
  • pnpm test -- src/infra/system-events.test.ts src/auto-reply/reply/queue.collect.test.ts src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply/session.test.ts — 233/233 passing.
  • pnpm check — exit 0 (preflight + typecheck + lint + policy guards all green).

Fresh real-behavior proof against c76b3c19ec still pending — will refresh PR body once I can capture a live cron-awareness pass through the gateway under the new structured contract.

@clawsweeper please re-review.

Re-review progress:

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Clockwork Merge Sprite

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: sleeps inside passing CI.
Image traits: location proof lagoon; accessory green check lantern; palette violet, aqua, and starlight; mood watchful; pose stepping out of a freshly hatched shell; shell translucent glimmer shell; lighting calm overcast light; background small review tokens.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Clockwork Merge Sprite in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

…ents

Adds an optional `audience: 'internal' | 'user-facing'` field to
`SystemEvent`, defaulting to `'user-facing'` (full backward-compat for
every existing caller). Internal-audience events are wrapped at the
consumer integration point (`drainFormattedSystemEvents` in
session-system-events.ts) in the existing
`INTERNAL_RUNTIME_CONTEXT_BEGIN/END` delimiters with the canonical
header lines from `formatAgentInternalEventsForPrompt`. The agent
runtime sees the content as runtime context; user-facing surfaces
strip it via the already-installed `stripInternalRuntimeContext`
consumers (`sanitize-user-facing-text.ts`,
`memory-host-sdk/host/session-files.ts`, `agents/internal-events.ts`,
history readers, Control UI extract).

Mirrors the producer-side runtime-context delimiter pattern landed in
`e918e5f75c` (openclaw#71761) and `6e985a421d` (webchat runtime context). No
new persistence shape, no consumer-side filter (avoids the
audit-bypass concern flagged on openclaw#69217), no display-layer per-surface
patch (avoids the rejection pattern from openclaw#69366).

Heartbeat-path additions: `selectGenericSystemEvents` keeps
audience: "internal" events queued when the drain is invoked from a
heartbeat reply (whose prompt envelope discards
`systemEventBlocks`), so the wrapped runtime-context block is not
silently consumed before the next regular reply turn that actually
carries it through to the model. The heartbeat exec/cron-prompt
selectors also skip internal events so they don't get classified as
user-facing relay payloads.

`compactSystemEvent` bypasses heartbeat-noise filters
("reason periodic", "Read HEARTBEAT.md", "heartbeat poll/wake") for
audience: "internal" events — those go through the wrap-on-drain path
and never reach a user-facing surface, so filtering them after
consumption would silently drop the event (same no-consumer hole
class as the exec-shape filter).

Migrated one producer (the canonical justification for the lane):
- `queueCronAwarenessSystemEvent` in
  `src/cron/isolated-agent/delivery-dispatch.ts`. This event is
  queued onto the main session only after the isolated agent run has
  already delivered to the user via its own channel (gated on
  `delivered === true` and `shouldQueueCronAwareness`). The main
  session needs hidden runtime context so the next turn knows the
  cron landed, not a user-visible `System: ...` bubble.

Scope is intentionally tight. This primitive is the hidden
runtime-context lane only; it is NOT a delivery-routing primitive.
Events with a positive user delivery contract (exec completion via
`notifyOnExit`, cron payloads, heartbeat acks) are not migrated —
those have heartbeat-driven explicit-relay paths
(`buildExecEventPrompt` / `buildCronEventPrompt`) plus tactical
producer-side skips like `bd60df3e53`. Marking them internal would
suppress delivery on regular reply turns where the model is
instructed to keep wrapped content private.

Tests cover: default audience, round-trip preservation, audience
equality (consumeSystemEventEntries prefix-match respects audience),
wrap-on-drain shape with canonical header, mixed
user-facing+internal ordering, the user-facing-only and
internal-only edge cases, end-to-end strip via
stripInternalRuntimeContext, adversarial delimiter-token escape,
heartbeat-path audience: "internal" queueing semantics, and
exec-shaped internal events bypassing the text-shape filter.

Addresses part of openclaw#69492 (system-event-shape
consumer leakage; umbrella openclaw#69208 Track B). Leaves the
user-deliverable noise question (exec completion, cron payloads,
heartbeat acks) open for case-by-case follow-ups matching the
maintainer's revealed pattern (`bd60df3e53`, `3f63ba8fd808`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lukeboyett

Copy link
Copy Markdown
Contributor Author

Rebase note — 2026-05-26

Force-pushed onto current openclaw/openclaw main (d0dd8b8a41). The branch was ~1,400 commits behind and mergeable: CONFLICTING; the prior merge-base (a13468320c, 2026-05-20) is now stale.

Decoupled from forceSenderIsOwnerFalse / trusted / QueuedSystemEvent / resolveEventOwnerDowngrade. Those symbols were removed upstream in 02182d5a30 (steipete, refactor: remove sender owner tool gating, 2026-05-21). The previous PR ladder layered audience on top of the removed sender-owner machinery; that coupling is gone — audience now lands cleanly on the post-refactor SystemEvent shape.

Squashed the 9-commit codex-iteration ladder into 1 commit. The original chain's incremental design was intra-PR refinement against the pre-refactor base — after the upstream pivot, those commits' individual diffs were a mix of conflict resolution and forceSender decoupling, not standalone units worth preserving. The single commit d5fc9d9609 carries the same intent (audience field + drain wrap + heartbeat path semantics + cron-awareness migration) with the same test coverage.

Validation:

  • pnpm tsgo:core:test — clean
  • pnpm vitest run src/infra/system-events.test.ts — 53/53 pass
  • pnpm vitest run src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts src/auto-reply/reply/queue.collect.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts — 118/118 pass
  • codex review --base upstream/main — no actionable regressions

Diff stat is essentially unchanged (713/29 vs prior 718/27; the 5-line net delta is the dropped forceSenderIsOwnerFalse coupling in system-events.ts).

The "Real behavior proof" section of the PR description remains accurate; the underlying behavior (wrap delimiters + isolated-cron awareness migration) is unchanged.

@lukeboyett

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

No contributor-facing work is open: force-rebased onto current openclaw/openclaw main (squashed to a single clean commit, decoupled from the removed sender-owner machinery), proof: sufficient is restored, and the PR is MERGEABLE. The status: ⏳ waiting on author was auto-applied after the 2026-05-26 rebase re-triage — there is nothing left on the author side, so this should return to the maintainer-look lane.

@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@lukeboyett

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

The prior re-review run failed mid-pass (clawsweeper/actions/runs/26507462206) and left labels half-stripped (rating flipped to off-meta tidepool, proof: sufficient dropped to supplied, status/mantis removed). Requesting a clean re-review of the current head — no contributor-facing work is open; the PR is MERGEABLE with the audience-field change + real-behavior proof intact.

@clawsweeper

clawsweeper Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@lukeboyett

Copy link
Copy Markdown
Contributor Author

Closing this PR. After thorough investigation we can't reproduce the underlying leak it addresses, and main's shape has moved on:

  • The system-event-shape internal-context handling has converged on the structured internal-events model (internal-events.ts / internal-event-contract.tstask_completion events with a cron source) introduced since this PR opened. An audience field on SystemEvent is now a less natural fit than emitting structured internal events through that pipeline (formatAgentInternalEventsForPrompt, already stripped at user-facing surfaces).
  • We couldn't reproduce the cron-awareness double-presentation in practice: the awareness path (queueCronAwarenessSystemEvent) only fires under sessionTarget:isolated + an explicit-resolved delivery, and real isolated cron runs fall back from native framework delivery, so the awareness branch isn't reached.

A refactor to fit the structured internal-events shape may still be warranted, but we'd want to see the leak actually present on current main in the wild before chasing it further. Happy to revisit/reopen if it's observed. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mantis: telegram-visible-proof Mantis should capture Telegram visible proof. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: L status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant