Skip to content

fix(agents): correlate compaction attribution events (#825)#485

Closed
karmafeast wants to merge 1 commit intocael/325-canonical2from
frond-scribe/swim39-825-compaction-event-joint-attribution
Closed

fix(agents): correlate compaction attribution events (#825)#485
karmafeast wants to merge 1 commit intocael/325-canonical2from
frond-scribe/swim39-825-compaction-event-joint-attribution

Conversation

@karmafeast
Copy link
Copy Markdown

Summary

  • Problem: compaction observability emitted independent request_compaction, [compaction-diag], and session-store counter signals without enough shared attribution to join trigger, outcome, and count deltas.
  • Why it matters: SWIM/cohort debugging could see compactionCount move without knowing which terminal compaction event caused it, or whether a volitional enqueue reached the matching runtime compaction attempt.
  • What changed: request_compaction now creates a shared diagId, logs runId/diagId/trigger=volitional at enqueue and terminal resolution, and threads that attribution into compactEmbeddedPiSession; embedded compaction events now expose/log normalized trigger plus count before/after/delta; session-store count reconcile logs the same run/trigger/outcome attribution.
  • What did NOT change (scope boundary): no compaction scheduling, threshold, provider selection, delegate dispatch, or counter increment semantics changed.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: request_compaction logged only session/usage/reason, the runtime compaction diagnostic owned its own diagId, and the session-store compaction count reconcile updated state without logging a joinable terminal attribution record.
  • Missing detection / guardrail: the existing tests proved counter truthfulness and runtime request behavior independently, but did not assert that enqueue attribution is threaded into execution or that terminal compaction events include count deltas.
  • Contributing context (if known): upstream pi-coding-agent compaction events carry a reason but no OpenClaw correlation id, so OpenClaw must add run/trigger/count attribution at the subscribe boundary.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/tools/request-compaction-tool.volitional-threading.test.ts and src/agents/pi-embedded-subscribe.handlers.compaction.test.ts
  • Scenario the test should lock in: volitional compaction request attribution reaches triggerCompaction, result details, and enqueue/terminal logs; compaction-end handling emits trigger/count-before/count-after/count-delta attribution.
  • Why this is the smallest reliable guardrail: it covers the OpenClaw-owned boundaries without running a full embedded agent session.
  • Existing test that already covers this (if any): existing counter truthfulness and session-store reconcile tests were extended with attribution assertions.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

None for ordinary users. Diagnostic logs/events now include correlation fields (runId, diagId, trigger, and compaction count deltas) for compaction attribution.

Diagram (if applicable)

Before:
request_compaction enqueue -> runtime compaction diag -> sessions.json count
(no shared join key/count delta)

After:
request_compaction enqueue(runId, diagId, trigger)
  -> runtime compaction diag(runId, diagId, trigger, outcome)
  -> compaction terminal event/store reconcile(runId, trigger, outcome, count delta)

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node 22 / pnpm
  • Model/provider: N/A
  • Integration/channel (if any): embedded runner compaction diagnostics
  • Relevant config (redacted): N/A

Steps

  1. Trigger a volitional request_compaction with continuation enabled and a live run id.
  2. Observe enqueue and terminal request_compaction logs.
  3. Observe embedded compaction terminal event/session-store count reconcile when a compaction succeeds.

Expected

  • Volitional enqueue, runtime compaction, and terminal result share runId and diagId where applicable.
  • Terminal compaction events/reconcile logs include trigger, outcome, and compaction count before/after/delta.

Actual

  • Before this PR, enqueue had no trigger/run/diag id, runtime diag ids were not threaded from the enqueue, and session-store count updates had no joinable terminal attribution.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Verified after the fix:

  • pnpm test src/agents/tools/request-compaction-tool.volitional-threading.test.ts src/agents/tools/request-compaction-tool.test.ts src/agents/pi-embedded-subscribe.handlers.compaction.test.ts
  • pnpm test src/agents/openclaw-tools.continuation-registration.test.ts src/agents/tools-effective-inventory.test.ts src/auto-reply/reply/commands-system-prompt.test.ts
  • pnpm tsgo:core
  • pnpm tsgo:core:test
  • pnpm lint

Gate notes:

  • pnpm check / pnpm tsgo:extensions fail in untouched extension baselines: extensions/codex duplicate @mariozechner/pi-agent-core type identities and extensions/qqbot zod v3/v4 mismatch.
  • pnpm check:test-types passes core test types, then fails in the same untouched extension type baseline.
  • pnpm check:changed expands to all lanes from canonical-branch baseline surfaces and stops at the same extension type failures.
  • pnpm format:check reports broad pre-existing formatting drift outside this PR; scripts/committer formatted the staged PR files.

Human Verification (required)

  • Verified scenarios: request_compaction attribution payload/logs, compaction-end terminal event data, session-store count reconcile path, tool registration/inventory wiring.
  • Edge cases checked: threshold reason normalizes to budget; willRetry=true successful overflow compaction still records a count delta.
  • What you did not verify: a live SWIM channel run with real provider compaction.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: new diagnostic fields could be interpreted as product API by downstream consumers.
    • Mitigation: fields are additive and confined to diagnostic compaction stream/log payloads; no behavior changes depend on them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f25f9116f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

provider,
model,
authProfileId: compactionAuthProfileId,
trigger: request.trigger,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep volitional compaction on the manual execution path

Passing trigger: request.trigger changes runtime behavior for request_compaction: this call previously omitted trigger, so compactEmbeddedPiSession defaulted to manual semantics, but now trigger="volitional" bypasses all params.trigger === "manual" branches (notably forced compaction target/force selection in compact.queued.ts and manual-boundary hardening in compact.ts). In sessions where context pressure is high enough for the tool guard but below automatic compaction thresholds, volitional requests can now be skipped or produce different post-compaction boundaries, which is a functional regression (the same pattern is also present in followup-runner.ts).

Useful? React with 👍 / 👎.

@elliott-dandelion-cult
Copy link
Copy Markdown

Superseded by #502 — rebased onto canonical2 post-#500 with union resolution against #486 (cooldown-arm collision at request-compaction-tool.ts:223-225). Co-author attribution preserved. 25/25 tests green locally.

Cohort context: 🩸 took (1) "drive-it" call in #sprites-of-thornfield ~13:50Z; #485 not gating SUT cut.

Closing in favor of #502. 🌻🩸🌫🌊🍖🩲

silas-dandelion-cult added a commit that referenced this pull request May 1, 2026
…tion-registration to absorb cold module-load cost

The first test in src/agents/openclaw-tools.continuation-registration.test.ts
("registers no continuation tools when continuation.enabled is unset") pays
the cold module-load cost for createOpenClawTools and its transitive imports
(compaction-attribution, pi-embedded-*, plugins/tools, config/config) under
400+ concurrent test files in the agent project.

Quiet-box first-test duration: ~95s. CI noise pushes it past vitest's 120s
per-test default, producing a flaky timeout that has now been observed across
multiple unrelated PRs:

- #485 head (compaction-attribution scope) — first-test timeout
- #488 (downstream of #485 hypothesis) — first-test timeout
- #468 head (does NOT touch this file) — same first-test, same file, timeout

Test file content is byte-identical between base cael/325-canonical2 and #485
head; the timeout is not a regression introduced by any of those PRs. Tests
2-7 in this file reuse the warm cache (~360ms each) and are unaffected.

Cure: per-test timeout bump to 240s on the first test only, with a comment
documenting the cold-start mechanism so future readers know why this single
test has a non-default timeout.

Standalone fix, deliberately not folded into #485 to keep its compaction-
attribution scope clean. Unblocks #485, #488, #468, and any future PR that
randomly trips the same flake.

Verified by silas-pr485-fixup-v2 subagent (2026-05-01 07:29 UTC):
- Local on base a3dcc2a: first test 95023ms, passed (25s margin to 120s)
- CI on #485 head 9f25f91: first test >120000ms, timed out
- CI on #468 run 25169814732: first test >120000ms, timed out (same file)
ronan-dandelion-cult pushed a commit that referenced this pull request May 1, 2026
…tion-registration to absorb cold module-load cost (#498)

The first test in src/agents/openclaw-tools.continuation-registration.test.ts
("registers no continuation tools when continuation.enabled is unset") pays
the cold module-load cost for createOpenClawTools and its transitive imports
(compaction-attribution, pi-embedded-*, plugins/tools, config/config) under
400+ concurrent test files in the agent project.

Quiet-box first-test duration: ~95s. CI noise pushes it past vitest's 120s
per-test default, producing a flaky timeout that has now been observed across
multiple unrelated PRs:

- #485 head (compaction-attribution scope) — first-test timeout
- #488 (downstream of #485 hypothesis) — first-test timeout
- #468 head (does NOT touch this file) — same first-test, same file, timeout

Test file content is byte-identical between base cael/325-canonical2 and #485
head; the timeout is not a regression introduced by any of those PRs. Tests
2-7 in this file reuse the warm cache (~360ms each) and are unaffected.

Cure: per-test timeout bump to 240s on the first test only, with a comment
documenting the cold-start mechanism so future readers know why this single
test has a non-default timeout.

Standalone fix, deliberately not folded into #485 to keep its compaction-
attribution scope clean. Unblocks #485, #488, #468, and any future PR that
randomly trips the same flake.

Verified by silas-pr485-fixup-v2 subagent (2026-05-01 07:29 UTC):
- Local on base a3dcc2a: first test 95023ms, passed (25s margin to 120s)
- CI on #485 head 9f25f91: first test >120000ms, timed out
- CI on #468 run 25169814732: first test >120000ms, timed out (same file)
ronan-dandelion-cult pushed a commit that referenced this pull request May 1, 2026
…edes #485) (#502)

* fix(agents): correlate compaction attribution events (rebased; supersedes #485)

Adds [request_compaction:resolved-success] log line for volitional compaction-success path so attribution events can be correlated with dispatch trigger (sessionKey + runId + diagId + outcome).

Rebased onto canonical2 post-#500. Adjacent-but-orthogonal collision with #486 (cooldown-arm at request-compaction-tool.ts:223-225) resolved as union: cooldown-arm → log-emit → counter-increment.

Original work in #485 by @karmafeast; superseded by this rebased commit because canonical2 now contains #486 cooldown-arm that #485 did not anticipate.

Co-authored-by: karmafeast <karmafeast@gmail.com>

* test(plugins): use objectContaining for compaction emit assertions

compaction-attribution work in this PR added trigger/sessionKey/compactionCount{Before,After,Delta} fields to the compaction emitAgentEvent payloads in pi-embedded-subscribe.handlers.compaction.ts. The wired-hooks-compaction.test.ts assertions used exact-match shapes that did not include the new fields, causing 3 tests to fail in agentic-plugins shard.

Switching to expect.objectContaining preserves the original behavioral assertion (phase + willRetry + completed) while tolerating the additive attribution fields.

---------

Co-authored-by: karmafeast <karmafeast@gmail.com>
karmafeast pushed a commit that referenced this pull request May 1, 2026
…tion-registration to absorb cold module-load cost (#498)

The first test in src/agents/openclaw-tools.continuation-registration.test.ts
("registers no continuation tools when continuation.enabled is unset") pays
the cold module-load cost for createOpenClawTools and its transitive imports
(compaction-attribution, pi-embedded-*, plugins/tools, config/config) under
400+ concurrent test files in the agent project.

Quiet-box first-test duration: ~95s. CI noise pushes it past vitest's 120s
per-test default, producing a flaky timeout that has now been observed across
multiple unrelated PRs:

- #485 head (compaction-attribution scope) — first-test timeout
- #488 (downstream of #485 hypothesis) — first-test timeout
- #468 head (does NOT touch this file) — same first-test, same file, timeout

Test file content is byte-identical between base cael/325-canonical2 and #485
head; the timeout is not a regression introduced by any of those PRs. Tests
2-7 in this file reuse the warm cache (~360ms each) and are unaffected.

Cure: per-test timeout bump to 240s on the first test only, with a comment
documenting the cold-start mechanism so future readers know why this single
test has a non-default timeout.

Standalone fix, deliberately not folded into #485 to keep its compaction-
attribution scope clean. Unblocks #485, #488, #468, and any future PR that
randomly trips the same flake.

Verified by silas-pr485-fixup-v2 subagent (2026-05-01 07:29 UTC):
- Local on base a3dcc2a: first test 95023ms, passed (25s margin to 120s)
- CI on #485 head 9f25f91: first test >120000ms, timed out
- CI on #468 run 25169814732: first test >120000ms, timed out (same file)
karmafeast added a commit that referenced this pull request May 1, 2026
…edes #485) (#502)

* fix(agents): correlate compaction attribution events (rebased; supersedes #485)

Adds [request_compaction:resolved-success] log line for volitional compaction-success path so attribution events can be correlated with dispatch trigger (sessionKey + runId + diagId + outcome).

Rebased onto canonical2 post-#500. Adjacent-but-orthogonal collision with #486 (cooldown-arm at request-compaction-tool.ts:223-225) resolved as union: cooldown-arm → log-emit → counter-increment.

Original work in #485 by @karmafeast; superseded by this rebased commit because canonical2 now contains #486 cooldown-arm that #485 did not anticipate.

Co-authored-by: karmafeast <karmafeast@gmail.com>

* test(plugins): use objectContaining for compaction emit assertions

compaction-attribution work in this PR added trigger/sessionKey/compactionCount{Before,After,Delta} fields to the compaction emitAgentEvent payloads in pi-embedded-subscribe.handlers.compaction.ts. The wired-hooks-compaction.test.ts assertions used exact-match shapes that did not include the new fields, causing 3 tests to fail in agentic-plugins shard.

Switching to expect.objectContaining preserves the original behavioral assertion (phase + willRetry + completed) while tolerating the additive attribution fields.

---------

Co-authored-by: karmafeast <karmafeast@gmail.com>
@ronan-dandelion-cult ronan-dandelion-cult deleted the frond-scribe/swim39-825-compaction-event-joint-attribution branch May 2, 2026 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants