
fix: Add runner label to /status #70595

Merged
Takhoffman merged 4 commits into main from codex/status-runner-label
Apr 23, 2026

Conversation

@Takhoffman
Contributor

Summary

This change adds an explicit Runner: field to /status so the status output tells you which execution path currently owns the session, not just which model is selected.

Examples of what /status can now show:

⚙️ Runtime: direct · Runner: pi (embedded) · Think: medium · elevated
⚙️ Runtime: direct · Runner: claude-cli (cli) · Think: off · elevated
⚙️ Runtime: direct · Runner: codex (acp/acpx) · Think: off · elevated
⚙️ Runtime: direct · Runner: gemini (acp/acpx) · Think: off · elevated

How it works

The status formatter now resolves the runner label from the session in this order:

  1. If the session has ACP metadata (sessionEntry.acp), /status shows the ACP harness agent and backend as <agent> (acp/<backend>).
    This is the exact case that distinguishes codex from gemini, claude, or any other ACP agent.
  2. Otherwise, if the active provider is a CLI-backed provider, /status shows <provider> (cli).
  3. Otherwise, /status falls back to pi (embedded).

That keeps the runner identity separate from the selected model line. The model still tells you what model/provider is being used, while Runner: tells you which runtime family is actually executing the session.
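The resolution order above can be sketched as a small pure function. This is a minimal illustration, not the merged `resolveRunnerLabel`: the type names, the `cliProviders` list (standing in for the real CLI-provider detection), and the `<agent> (acp)` label used when no backend is stored are all assumptions, and terminal sanitization is left out.

```typescript
// Hypothetical, trimmed-down shapes; the real session types are richer.
interface AcpMetaSketch {
  agent?: string;
  backend?: string;
}

interface SessionEntrySketch {
  acp?: AcpMetaSketch;
}

interface RunnerLabelArgs {
  sessionEntry?: SessionEntrySketch;
  provider: string;
  // Assumption: CLI detection is reduced to a plain list lookup here.
  cliProviders: readonly string[];
}

// Trim a value and treat blank strings as missing.
function normalizeOptional(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function resolveRunnerLabelSketch(args: RunnerLabelArgs): string {
  // 1. ACP metadata wins: "<agent> (acp/<backend>)".
  const agent = normalizeOptional(args.sessionEntry?.acp?.agent);
  if (agent) {
    const backend = normalizeOptional(args.sessionEntry?.acp?.backend);
    // Assumption: with no stored backend, fall back to a bare "(acp)" suffix.
    return backend ? `${agent} (acp/${backend})` : `${agent} (acp)`;
  }
  // 2. CLI-backed provider: "<provider> (cli)".
  if (args.cliProviders.indexOf(args.provider) !== -1) {
    return `${args.provider} (cli)`;
  }
  // 3. Fallback: the embedded runner.
  return "pi (embedded)";
}
```

With `cliProviders: ["claude-cli"]`, a session carrying `acp: { agent: "codex", backend: "acpx" }` resolves to `codex (acp/acpx)`, a `claude-cli` provider without ACP metadata resolves to `claude-cli (cli)`, and everything else resolves to `pi (embedded)`.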

Verification

Focused verification passed with:

pnpm vitest --config test/vitest/vitest.auto-reply.config.ts run src/auto-reply/status.test.ts
pnpm vitest --config test/vitest/vitest.auto-reply.config.ts run src/auto-reply/reply/commands-status.test.ts

Note: the repo's default staged vitest hook currently routes this changed test file into a config that excludes src/auto-reply/**, which causes an unrelated "No test files found" failure. The targeted /status suites above are green.

@openclaw-barnacle openclaw-barnacle Bot added the size: S and maintainer (Maintainer-authored PR) labels Apr 23, 2026
@Takhoffman Takhoffman changed the title from "Add runner label to /status" to "fix: Add runner label to /status" Apr 23, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Apr 23, 2026

Greptile Summary

This PR adds a Runner: field to the /status output, resolved from session metadata in priority order: ACP agent/backend → CLI-backed provider → fallback pi (embedded). The implementation in resolveRunnerLabel is clean, null-safe, and correctly isolated from the existing Runtime: label.

Confidence Score: 5/5

Safe to merge — only one minor test-clarity P2 concern with no production impact.

All findings are P2 (test config readability). The production logic is correct and well-structured; no runtime bugs identified.

No files require special attention beyond the minor test config clarity note in src/auto-reply/status.test.ts.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/auto-reply/status.test.ts
Line: 120-132

Comment:
**Test config doesn't drive CLI detection**

The `config` passed here has `models.providers["claude-cli"]` set, but `isCliProvider` never looks at `config.models.providers`. It checks `cfg?.agents?.defaults?.cliBackends`, then `resolveRuntimeCliBackends()`, then `resolvePluginSetupCliBackendRuntime()`. If the test is green, CLI detection is happening via one of those runtime paths, not via the config object above. That makes the `config` property misleading: a reader might reasonably believe it's what makes `"claude-cli"` a CLI provider in this test, but it isn't. Consider either removing the spurious `models.providers` key or adding an `agents.defaults.cliBackends: { "claude-cli": {} }` entry that actually exercises the config-driven branch.
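To make the difference concrete, here is a minimal sketch of the config-driven branch only. `StatusConfigSketch` and `isCliProviderFromConfig` are hypothetical stand-ins; the runtime and plugin lookups named in the comment are deliberately omitted, so this shows nothing about how the real `isCliProvider` behaves beyond the one path quoted above.

```typescript
// Hypothetical, trimmed-down config shape covering only the two keys discussed.
interface StatusConfigSketch {
  agents?: { defaults?: { cliBackends?: Record<string, unknown> } };
  models?: { providers?: Record<string, unknown> };
}

// Stand-in for the config-driven branch: only agents.defaults.cliBackends counts.
function isCliProviderFromConfig(provider: string, cfg?: StatusConfigSketch): boolean {
  const backends = cfg?.agents?.defaults?.cliBackends ?? {};
  return Object.prototype.hasOwnProperty.call(backends, provider);
}

// The shape the review calls misleading: models.providers is never consulted.
const misleadingConfig: StatusConfigSketch = {
  models: { providers: { "claude-cli": {} } },
};

// The shape the review suggests: it exercises the config-driven branch.
const configDrivenConfig: StatusConfigSketch = {
  agents: { defaults: { cliBackends: { "claude-cli": {} } } },
};
```

Under these assumptions, `isCliProviderFromConfig("claude-cli", misleadingConfig)` is `false` and `isCliProviderFromConfig("claude-cli", configDrivenConfig)` is `true`.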

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Add runner label to status output"

Comment thread src/auto-reply/status.test.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e6ffbabaa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/status/status-message.ts Outdated
@Takhoffman Takhoffman merged commit 03477cc into main Apr 23, 2026
61 checks passed
@Takhoffman Takhoffman deleted the codex/status-runner-label branch April 23, 2026 12:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52905d76cc


Comment on lines +201 to +204
const acpAgentRaw = normalizeOptionalString(args.sessionEntry?.acp?.agent);
const acpAgent = acpAgentRaw ? sanitizeTerminalText(acpAgentRaw) : undefined;
if (acpAgent) {
  const backendRaw = normalizeOptionalString(args.sessionEntry?.acp?.backend);

P2: Preserve ACP runner label when agent is absent

resolveRunnerLabel only treats a session as ACP-backed when sessionEntry.acp.agent sanitizes to a non-empty value, so ACP entries with missing/blank agent metadata fall through to provider detection and can be mislabeled as pi (embedded) or CLI. That case is realistic because ACP runtime code already guards for missing meta.agent by deriving it from the session key (src/acp/control-plane/manager.core.ts:1377). The runner label should stay in the ACP path whenever ACP metadata exists, even if the stored agent string is empty.
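One way to read this suggestion, as a sketch only: branch on the presence of ACP metadata rather than on a non-empty agent string, and let the caller supply a fallback agent name. `resolveAcpRunnerLabel`, `AcpMetadataSketch`, and the `fallbackAgent` parameter are hypothetical; how the fallback is derived from the session key is not shown in this thread and is not reproduced here.

```typescript
// Hypothetical shape of the stored ACP metadata.
interface AcpMetadataSketch {
  agent?: string;
  backend?: string;
}

// Returns an ACP label whenever ACP metadata exists, even with a blank agent.
// Returns undefined for non-ACP sessions so the caller can continue with
// CLI-provider detection and the embedded fallback.
function resolveAcpRunnerLabel(
  acp: AcpMetadataSketch | undefined,
  fallbackAgent: string, // assumption: derived by the caller, e.g. from the session key
): string | undefined {
  if (!acp) {
    return undefined;
  }
  const agent = acp.agent?.trim() || fallbackAgent;
  const backend = acp.backend?.trim();
  // Assumption: a bare "(acp)" suffix when no backend is stored.
  return backend ? `${agent} (acp/${backend})` : `${agent} (acp)`;
}
```

With this shape, `{ agent: "", backend: "acpx" }` plus a fallback of `"codex"` yields `codex (acp/acpx)` instead of falling through to `pi (embedded)`.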


steipete pushed a commit that referenced this pull request Apr 23, 2026
* Add runner label to status output

* Add changelog entry for status runner label

* Fix status runner detection and sanitization

(cherry picked from commit 03477cc)
medikoo pushed a commit to medikoo/openclaw that referenced this pull request Apr 24, 2026
* Add runner label to status output

* Add changelog entry for status runner label

* Fix status runner detection and sanitization

(cherry picked from commit 03477cc)
cael-dandelion-cult pushed a commit to karmaterminal/openclaw that referenced this pull request Apr 25, 2026
…e trap-class

Three confirmed instances from #325 phase-2 rebase:
- e515ea1 (gateway live test hardening; base has richer probes via f07b00d+a53fea3905+5f702b464b)
- aa1908b (docker live backend; same parallel-evolution pattern)
- 7ee46a3 (PR openclaw#70595 already in base CHANGELOG; base evolved naming + lookup table)

Discovery channels: git cherry (insufficient), CHANGELOG-byte-grep (high precision), conflict-content rubric.

Tracking: #331
Eats dogfood for: openclaw-bootstrap#702
ronan-dandelion-cult pushed a commit to karmaterminal/openclaw that referenced this pull request Apr 28, 2026
…y primitives)

Wires the §1 entry-point it.todo placeholder
  swim-runner.test.ts L80: "rebase bot classifies synthetic squash-rebased
  commit as DROP (not PICK)"
onto runnable ground by composing the two §1 discovery primitives
(changelog-grep + cherry-pick-provenance) into a single DROP/PICK/REVIEW
verdict with channel attribution and structured evidence for downstream
span emission.

API:
- classifyRebasePick({subject, commitBody, baseChangelog, isAncestorOf})
  returns {verdict: DROP|PICK|REVIEW, channel, evidence,
           needsConflictContentInspection?}

Channel ordering (memo-aligned, deterministic):
  1. cherry-pick-provenance (highest precision: exact source SHA,
     ancestor-of-base check unambiguous)
  2. changelog-grep:pr (high precision: PR tokens rarely collide)
  3. changelog-grep:subject (medium precision: substring, can collide)

Load-bearing invariant: when both positive-signal channels miss, the
verdict is REVIEW with needsConflictContentInspection=true, NOT a
silent default to PICK. The memo's third channel (conflict-content
classification rubric) needs file-path + diff inspection, not pure
string→struct, so it's NOT covered here; classifier surfaces the gap
explicitly so the caller can hand off rather than misclassify the
test-harness divergence cases.

Evidence completeness: even when one channel is load-bearing for the
verdict, ALL channels' evidence is recorded in the result so the
downstream span emission can carry full audit data. Non-ancestor
cherry-pick footers go to a separate
slot for cross-branch provenance audit.

Boundary: pure function. Caller drives:
  - subject from `git log -1 --format='%s' <pick>`
  - body from `git show -s --format=%B <pick>`
  - base CHANGELOG from `git show <base>:CHANGELOG.md`
  - isAncestorOf wrapping `git merge-base --is-ancestor`
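The channel ordering and the REVIEW invariant described above can be sketched as follows. This is an illustration under assumptions, not the code in `rebase-classifier.ts`: the evidence field names, the `"none"` channel value, and the footer regex are invented for the example, and the sketch never returns PICK because the commit message does not say when PICK is produced.

```typescript
type Verdict = "DROP" | "PICK" | "REVIEW";
type Channel =
  | "cherry-pick-provenance"
  | "changelog-grep:pr"
  | "changelog-grep:subject"
  | "none"; // assumption: placeholder for "no channel fired"

interface ClassifyInput {
  subject: string;
  commitBody: string;
  baseChangelog: string;
  isAncestorOf: (sha: string) => boolean;
}

// Hypothetical evidence layout; every channel's findings are always recorded.
interface EvidenceSketch {
  ancestorShas: string[];
  nonAncestorShas: string[]; // kept apart for cross-branch provenance audit
  prToken?: string;
  prHit: boolean;
  subjectHit: boolean;
}

interface ClassifyResult {
  verdict: Verdict;
  channel: Channel;
  evidence: EvidenceSketch;
  needsConflictContentInspection?: boolean;
}

function classifyRebasePickSketch(input: ClassifyInput): ClassifyResult {
  // Collect cherry-pick footers and split them by ancestry.
  const ancestorShas: string[] = [];
  const nonAncestorShas: string[] = [];
  const footer = /^\s*\(cherry picked from commit ([0-9a-f]{7,40})\)\.?\s*$/;
  input.commitBody.split(/\r?\n/).forEach((line) => {
    const found = footer.exec(line);
    if (!found) return;
    (input.isAncestorOf(found[1]) ? ancestorShas : nonAncestorShas).push(found[1]);
  });

  // Last "(#N)" token in the subject, then literal substring checks.
  const prPattern = /\(#\d+\)/g;
  let prToken: string | undefined;
  let prMatch: RegExpExecArray | null;
  while ((prMatch = prPattern.exec(input.subject)) !== null) prToken = prMatch[0];
  const prHit = prToken !== undefined && input.baseChangelog.indexOf(prToken) !== -1;
  const subject = input.subject.trim();
  const subjectHit = subject !== "" && input.baseChangelog.indexOf(subject) !== -1;

  const evidence: EvidenceSketch = { ancestorShas, nonAncestorShas, prToken, prHit, subjectHit };

  // Deterministic ordering: provenance, then PR token, then subject.
  if (ancestorShas.length > 0) return { verdict: "DROP", channel: "cherry-pick-provenance", evidence };
  if (prHit) return { verdict: "DROP", channel: "changelog-grep:pr", evidence };
  if (subjectHit) return { verdict: "DROP", channel: "changelog-grep:subject", evidence };
  // No positive signal: hand off for inspection instead of defaulting to PICK.
  return { verdict: "REVIEW", channel: "none", evidence, needsConflictContentInspection: true };
}
```

Gathering evidence before choosing the verdict is what lets a provenance-driven DROP still report that the CHANGELOG channel also fired.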

Coverage (10 tests):
- Instance 1 (openclaw#70595): DROP via changelog-grep:pr; no
  needsConflictContentInspection on DROP
- Instance 3 (cherry-pick): DROP via cherry-pick-provenance;
  precedence over CHANGELOG when both fire; non-ancestor footer
  doesn't trigger DROP via that channel
- Instance 2 (test-harness): REVIEW with
  needsConflictContentInspection=true; classifier never defaults to
  PICK on absence-of-signal
- subject-channel: DROP via changelog-grep:subject
- evidence completeness: both channels' evidence recorded when both
  fire; non-ancestor footers separated from load-bearing slot

Result: 10/10 passing this file. swim-37 vitest project: 79 passed |
18 todo (6 files). pnpm lint:core: 0/0.

Closes part of #324; third of three §1 todos wired to runnable ground.
ronan-dandelion-cult added a commit to karmaterminal/openclaw that referenced this pull request Apr 28, 2026
…el (#408)

* feat(swim-37): #324 §1 trap-class CHANGELOG-byte-grep discovery channel

Wires swim-37 trap-class §1 (parallel-evolution / cherry-false-negative)
discovery channel onto runnable ground. Pure function, byte-pinned to
studies/swim-37/traps/parallel-evolution-class.md §2 byte-walk against
v2026.4.24 CHANGELOG L164 (instance `7ee46a3ab9 fix: Add runner label
to /status (openclaw#70595)`).

API:
- grepChangelog(subject, content) — literal grep -F semantics, returns
  ALL hits (memo flags subject-line collisions as known false-positive
  mode; channel returns all so caller can decide)
- extractPrNumberToken(subject) — pulls trailing `(#N)` PR-token, last
  match wins (commits sometimes cite earlier PRs in body but subject's
  trailing PR is the landing PR)
- discoverChangelogHit(subject, content) — composite: PR-token first,
  fall through to full-subject; needle field tells caller which channel
  found the hit (or which was attempted last on miss)
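A minimal sketch of the three helpers listed above, assuming line-based literal matching. The function names follow the commit message, but the return shapes (`ChangelogHit`, `DiscoverySketch`) and the choice to return the PR token with its parentheses are assumptions made for illustration.

```typescript
// Hypothetical hit shape: 1-based line number plus the matching line.
interface ChangelogHit {
  line: number;
  text: string;
}

// Literal substring search (grep -F style); returns every matching line.
function grepChangelog(needle: string, content: string): ChangelogHit[] {
  const trimmed = needle.trim();
  if (trimmed === "") return []; // an empty needle must not match everything
  const hits: ChangelogHit[] = [];
  content.split("\n").forEach((text, index) => {
    if (text.indexOf(trimmed) !== -1) hits.push({ line: index + 1, text });
  });
  return hits;
}

// Pulls the last "(#N)" token from a subject; bare "#N" and "()" do not count.
function extractPrNumberToken(subject: string): string | undefined {
  const pattern = /\(#\d+\)/g;
  let last: string | undefined;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(subject)) !== null) last = match[0];
  return last;
}

interface DiscoverySketch {
  needle: string; // the needle that hit, or the last one attempted on a miss
  hits: ChangelogHit[];
}

// Composite: try the PR token first, then fall through to the full subject.
function discoverChangelogHit(subject: string, content: string): DiscoverySketch {
  const token = extractPrNumberToken(subject);
  if (token !== undefined) {
    const prHits = grepChangelog(token, content);
    if (prHits.length > 0) return { needle: token, hits: prHits };
  }
  return { needle: subject, hits: grepChangelog(subject, content) };
}
```

Because the search uses `indexOf` rather than a regular expression, characters such as `(`, `#`, and `/` in a subject line are matched literally, which is the `grep -F` behavior the commit message calls for.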

Wires trap §1 it.todo:
  swim-runner.test.ts L81: "CHANGELOG-byte-grep discovery channel
  emits drop-with-reason span"
…onto runnable ground without depending on captureSwim() shim. The
span-emission piece is downstream (callsite needs to import this and
call emitContinuationDisabledSpan/equivalent with discovery.channel
attribute) but the discovery primitive is now testable in isolation.

Coverage (18 tests):
- subject-path: full subject, collision (multi-hit), empty subject
  (defense-in-depth: empty must NOT match-everything), whitespace,
  empty changelog, literal grep -F (no regex interpretation)
- PR-token extract: trailing #N, last-of-multiple, none, bare #N (must
  miss: only `(#N)` form), empty parens
- composite: load-bearing PR-channel hit on openclaw#70595, fallback to subject
  for harness-divergence instances, NO-hit for harness-divergence
  (memo says CHANGELOG silent for these)
- channel attribution: needle=PR-token vs needle=subject distinguishes
  channels for downstream span attrs

Boundary: pure string-in, structured-out. No git, no fs, no network.
Caller drives `git show <base>:CHANGELOG.md` and
`git log -1 --format='%s' <commit>`. Discipline matches the helper-
tier contract pattern from PR #406 (94fc8d1).

Result: 18/18 passing. swim-37 vitest project: 52 passed | 18 todo (4).
pnpm lint:core: 0/0.

Closes part of #324; wires §1 it.todo placeholder.

* feat(swim-37): #324 §1 cross-cut cherry-pick provenance parser

Adds the SECOND §1 discovery channel from the trap-class memo cross-cut
(L168 of studies/swim-37/traps/parallel-evolution-class.md).

Where CHANGELOG-byte-grep catches the rebase-source's WORK already in
base, cherry-pick-provenance catches the rebase-source's COMMIT already
in base — by parsing the `(cherry picked from commit <sha>)` footer
that git appends. The trap-class memo notes Instance 3 (`aa1908bf38`)
carries this footer pointing to `9dd097a7a5...`, which would have been
caught deterministically before conflict triage if the parser existed.

API:
- parseCherryPickProvenance(commitBody) — returns ALL footers in body
  order; multi-pick provenance accumulates when commits cross multiple
  branches; caller decides which is load-bearing
- lastCherryPickSourceSha(commitBody) — convenience for the common
  case (most recent pick in chain)

Boundary: pure parser. Caller drives `git show -s --format=%B` to
get body and `git merge-base --is-ancestor <src> <base>` to make the
ancestor decision.

Coverage (17 tests):
- byte-pinned aa1908b footer (40-hex)
- multi-footer in body order (cross-branch picks)
- trailing-period acceptance
- short-SHA acceptance (7+ hex, git's unique-floor)
- non-hex rejection, too-short rejection
- anchored-line discipline (sentence with phrase ≠ footer)
- leading-whitespace tolerance (indented footer blocks)
- empty body, no-footer body
- revert-footer NOT confused with pick-footer
- evidence preservation (line field for downstream span attrs)
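The parser behavior listed above can be sketched with one anchored regular expression. The function names come from the commit message; the `CherryPickFooter` shape and the exact regex are assumptions, including the guess that an accepted trailing period sits after the closing parenthesis.

```typescript
// Hypothetical footer record: the source SHA plus the raw line as evidence.
interface CherryPickFooter {
  sha: string;
  line: string;
}

// Anchored to a whole line: optional indentation, the phrase git writes with
// `cherry-pick -x`, a 7 to 40 character hex SHA, and an optional trailing period.
const CHERRY_PICK_FOOTER = /^\s*\(cherry picked from commit ([0-9a-f]{7,40})\)\.?\s*$/;

// Returns every footer in body order; the caller decides which one matters.
function parseCherryPickProvenance(commitBody: string): CherryPickFooter[] {
  const footers: CherryPickFooter[] = [];
  commitBody.split(/\r?\n/).forEach((line) => {
    const match = CHERRY_PICK_FOOTER.exec(line);
    if (match) footers.push({ sha: match[1], line });
  });
  return footers;
}

// Convenience for the common case: the most recent pick in the chain.
function lastCherryPickSourceSha(commitBody: string): string | undefined {
  const footers = parseCherryPickProvenance(commitBody);
  return footers.length > 0 ? footers[footers.length - 1].sha : undefined;
}
```

Anchoring with `^` and `$` is what keeps a sentence that merely mentions the phrase, or a `This reverts commit ...` line, from being read as a footer. The ancestry decision stays with the caller, which would run `git merge-base --is-ancestor` on the returned SHA.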

Companion to chunk-1 of this PR (changelog-grep). Together they cover
the two positive-signal channels from the §1 memo. The conflict-content
classification rubric (third channel) is harder to cover as a pure
function — needs file-path + diff inspection — so it's left for a later
slice with the right shape.

Result: 17/17 passing this file. swim-37 vitest project: 69 passed |
18 todo (5 files). pnpm lint:core: 0/0.

Closes more of #324; second of two §1 discovery primitives.

* feat(swim-37): #324 §1 entry-point classifier (composes both discovery primitives)

Wires the §1 entry-point it.todo placeholder
  swim-runner.test.ts L80: "rebase bot classifies synthetic squash-rebased
  commit as DROP (not PICK)"
onto runnable ground by composing the two §1 discovery primitives
(changelog-grep + cherry-pick-provenance) into a single DROP/PICK/REVIEW
verdict with channel attribution and structured evidence for downstream
span emission.

API:
- classifyRebasePick({subject, commitBody, baseChangelog, isAncestorOf})
  returns {verdict: DROP|PICK|REVIEW, channel, evidence,
           needsConflictContentInspection?}

Channel ordering (memo-aligned, deterministic):
  1. cherry-pick-provenance (highest precision: exact source SHA,
     ancestor-of-base check unambiguous)
  2. changelog-grep:pr (high precision: PR tokens rarely collide)
  3. changelog-grep:subject (medium precision: substring, can collide)

Load-bearing invariant: when both positive-signal channels miss, the
verdict is REVIEW with needsConflictContentInspection=true, NOT a
silent default to PICK. The memo's third channel (conflict-content
classification rubric) needs file-path + diff inspection, not pure
string→struct, so it's NOT covered here; classifier surfaces the gap
explicitly so the caller can hand off rather than misclassify the
test-harness divergence cases.

Evidence completeness: even when one channel is load-bearing for the
verdict, ALL channels' evidence is recorded in the result so the
downstream span emission can carry full audit data. Non-ancestor
cherry-pick footers go to a separate
slot for cross-branch provenance audit.

Boundary: pure function. Caller drives:
  - subject from `git log -1 --format='%s' <pick>`
  - body from `git show -s --format=%B <pick>`
  - base CHANGELOG from `git show <base>:CHANGELOG.md`
  - isAncestorOf wrapping `git merge-base --is-ancestor`

Coverage (10 tests):
- Instance 1 (openclaw#70595): DROP via changelog-grep:pr; no
  needsConflictContentInspection on DROP
- Instance 3 (cherry-pick): DROP via cherry-pick-provenance;
  precedence over CHANGELOG when both fire; non-ancestor footer
  doesn't trigger DROP via that channel
- Instance 2 (test-harness): REVIEW with
  needsConflictContentInspection=true; classifier never defaults to
  PICK on absence-of-signal
- subject-channel: DROP via changelog-grep:subject
- evidence completeness: both channels' evidence recorded when both
  fire; non-ancestor footers separated from load-bearing slot

Result: 10/10 passing this file. swim-37 vitest project: 79 passed |
18 todo (6 files). pnpm lint:core: 0/0.

Closes part of #324; third of three §1 todos wired to runnable ground.

---------

Co-authored-by: Ronan 🌊 <ronan@solidor.io>
elliott-dandelion-cult added a commit to karmaterminal/openclaw that referenced this pull request Apr 28, 2026
Now that #408 (PR #408 merged at cb73bc8) landed
`rebase-classifier.ts` with the pure `classifyRebasePick` composer,
the integration-tier §1 entry-point it.todo on swim-runner.test.ts
L80 is wireable without any new shim.

Synthetic commit pinned: subject mentions PR openclaw#70595, base CHANGELOG
contains the same PR token (squash-rebase shape from the memo's
7ee46a3 anchor instance). Discovery channel: changelog-grep:pr.

L81 (CHANGELOG-byte-grep emits drop-with-reason span) stays todo:
it needs the production callsite that pairs the channel with
`emitContinuationDisabledSpan`, per 🌊's #408 note.

Test delta:
  Before: 20 passed | 16 todo (36)
  After: 21 passed | 15 todo (36)
  +1 live | -1 todo
elliott-dandelion-cult added a commit to karmaterminal/openclaw that referenced this pull request Apr 28, 2026
…405)

* test(swim-37): wire captureSwim() for continue_work primitive

Replaces the placeholder `declare function captureSwim` in
`studies/swim-37/harness/swim-runner.test.ts` with a real implementation
in `studies/swim-37/harness/swim-runner.ts`. The wired path drives
`emitContinuationWorkSpan` against `createInMemorySpanRecorder()` and
returns the captured spans + the synthesized `chainId` (uuid v7 via
`generateChainId`).

Flips two prior `it.todo` markers to live tests:
  - emits continuation.work span with chain.id stamped (#366)
  - span carries chain.step.remaining attribute

Plus three new live tests:
  - sentinel: captureSwim() is wired for continue_work
  - refuses continue_delegate / heartbeat / lich primitives with a clear
    error so the spec's remaining `it.todo` markers stay honest about
    what is implemented
  - repeated calls don't leak capture state between invocations

STDOUT-only discipline preserved: no BasicTracerProvider, no OTLP
exporter, no @opentelemetry/sdk-trace-base machinery — capture flows
through `setContinuationTracer(recorder.tracer)` as documented in
README.md.

Other primitives (`continue_delegate`, `heartbeat`, lich-shape) remain
`it.todo` until the corresponding dispatch / heartbeat /
compaction-release seams have a comparable single-helper entry point we
can drive synthetically.

Test results:
  - swim-37 vitest project: 20 passed, 16 todo, 0 failed
  - in-memory-span-recorder.test.ts unchanged (12/12 still pass)
  - tsgo errors visible elsewhere in tree are pre-existing
    (src/plugin-sdk/provider-tools.ts), not introduced by this change

Refs #324 (swim-37 harness): `it.todo` count drops from 17 → 16,
first concrete primitive driven through the live tracer registry.

* docs(swim-37): continue_delegate wiring memo (memo-before-wire)

Pre-PR design memo for the next swim-37 primitive after #405
(continue_work). Decides shape before the wire so the PR lands clean.

Resolved by reading production code:
  Q1 — real setTimeout vs fake timers
       N/A for dispatch-accept (synchronous, pre-timer); banked for
       a separate continue_delegate_fire swim.

Cohort sign-off requested on:
  Q2 — recipient fan-out shape: N spans sharing chain.id (matches
       chunk-3 cohort design pin + #355 Stage-2 budget semantics).
  Q3 — delegate.mode x delegate.delivery: 8-cell it.each matrix
       + omission-contract row.

Banked for follow-up:
  Q4 — continue_delegate_fire as separate primitive, not sub-case;
       informs SwimPrimitive type-union shape.

Standard applied (per Cael 2026-04-27 #1498505918, figs
#1498505870): would skipping this memo cost a chunk's worth of
rework? Yes — three open Qs would have changed file shape.

Refs:
  - #405 (continue_work wired)
  - #324 (swim-37 harness)
  - docs/design/334-slice2-chunk5b-delegate-fire-memo.md (pattern)

* test(swim-37): wire captureSwim() for continue_delegate primitive

Implements the design from
`docs/design/swim-37-continue-delegate-wiring-memo.md` (commit
3bb086c762, same branch).

Wiring (swim-runner.ts):
- New continue_delegate case in the switch.
- Adds CaptureSwimOptions axes: recipients, delivery, delegateMode.
- Drives emitContinuationDelegateSpan once per recipient with shared
  chain.id (per chunk-3 cohort design pin: N recipients = N spans
  sharing chain.id, NOT one span with a recipients list).
- Validates recipients is a positive integer.
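The fan-out rule (N recipients produce N spans that share one chain.id) and the recipients validation can be sketched without any tracing machinery. Everything here is a hypothetical stand-in: `captureDelegateFanOut`, the `DelegateSpanSketch` shape, and the span name are invented for the example, and the real harness drives `emitContinuationDelegateSpan` rather than building plain objects.

```typescript
// Hypothetical span record; the real recorder captures richer span data.
interface DelegateSpanSketch {
  name: string;
  attributes: Record<string, string>;
}

interface FanOutOptions {
  chainId: string;
  recipients: number;
  delivery: "immediate" | "timer";
  delegateMode?: string; // omitted means the attribute is absent, not empty
}

function captureDelegateFanOut(options: FanOutOptions): DelegateSpanSketch[] {
  const { chainId, recipients, delivery, delegateMode } = options;
  // Reject zero, negatives, fractions, NaN, and Infinity.
  if (recipients % 1 !== 0 || recipients < 1) {
    throw new RangeError(`recipients must be a positive integer, got ${recipients}`);
  }
  const spans: DelegateSpanSketch[] = [];
  for (let i = 0; i < recipients; i += 1) {
    // One span per recipient, all stamped with the same chain.id.
    const attributes: Record<string, string> = {
      "chain.id": chainId,
      "delegate.delivery": delivery,
    };
    if (delegateMode !== undefined) {
      attributes["delegate.mode"] = delegateMode;
    }
    spans.push({ name: "continuation.delegate", attributes }); // span name is an assumption
  }
  return spans;
}
```

Calling it with `recipients: 3` returns three spans whose `chain.id` values are identical, while `recipients: 0` or `recipients: 1.5` throws a `RangeError`.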

Tests (swim-runner.test.ts):
- 8-cell it.each matrix: delegate.mode (normal | silent | silent-wake
  | post-compaction) x delegate.delivery (immediate | timer).
- Omission contract: delegate.mode attribute absent when caller
  passes undefined.
- Fan-out: 3 recipients emit 3 spans with shared chain.id.
- Validation: recipients=0 and recipients=1.5 reject.
- Removes the prior 'refuses continue_delegate' test row (now wired).

Two it.todo markers preserved for downstream concerns:
- chain-budget decrement / ChainBudget.declineToCarry observable
  (needs ChainBudget integration, not in scope here).
- fan-out budget arithmetic per #355 Stage-2 (1 step not N),
  separate from span cardinality.

Test results delta:
  Before: 8 passed | 16 todo (24 tests)
  After: 19 passed | 15 todo (34 tests)
  +11 live | -1 todo

Refs:
- #324 swim-37 harness
- #405 captureSwim continue_work (parent of this branch)
- docs/design/swim-37-continue-delegate-wiring-memo.md

* test(swim-37): pin recipient.index gap on continue_delegate fan-out

Per Cael's caution on the wiring memo (Discord msg
1498507232185286849 sign-off): N spans sharing chain.id risks
collapsing into analytic mush at scale unless per-recipient
distinction is visible in attrs. Currently
emitContinuationDelegateSpan exposes no recipient.index axis:
all N spans in a fan-out are byte-identical except spanId.

Adds:
- One live GAP-PIN test asserting the current (deficient) reality:
  fan-out spans toEqual on attributes. This will fail loudly when
  the production helper grows the axis, prompting a flip to
  not.toEqual.
- One it.todo marker for the upcoming production-helper change.

Production-helper change is out of scope for this PR; will file
follow-up issue. The gap-pin lives in the integration tier so it
shows up in any code-review touching delegate-dispatch span shape.

Refs:
- #324 swim-37 harness
- #405 captureSwim continue_work + continue_delegate (this PR)
- docs/design/swim-37-continue-delegate-wiring-memo.md (Q2)
- Cael Discord sign-off msg 1498507232185286849

* test(swim-37): wire §1 entry-point todo onto classifyRebasePick

Now that #408 (PR #408 merged at cb73bc8) landed
`rebase-classifier.ts` with the pure `classifyRebasePick` composer,
the integration-tier §1 entry-point it.todo on swim-runner.test.ts
L80 is wireable without any new shim.

Synthetic commit pinned: subject mentions PR openclaw#70595, base CHANGELOG
contains the same PR token (squash-rebase shape from the memo's
7ee46a3 anchor instance). Discovery channel: changelog-grep:pr.

L81 (CHANGELOG-byte-grep emits drop-with-reason span) stays todo:
it needs the production callsite that pairs the channel with
`emitContinuationDisabledSpan`, per 🌊's #408 note.

Test delta:
  Before: 20 passed | 16 todo (36)
  After: 21 passed | 15 todo (36)
  +1 live | -1 todo
karmafeast pushed a commit to karmaterminal/openclaw that referenced this pull request May 1, 2026
…el (#408)

karmafeast pushed a commit to karmaterminal/openclaw that referenced this pull request May 1, 2026
…405)

* test(swim-37): wire captureSwim() for continue_work primitive

Replaces the placeholder `declare function captureSwim` in
`studies/swim-37/harness/swim-runner.test.ts` with a real implementation
in `studies/swim-37/harness/swim-runner.ts`. The wired path drives
`emitContinuationWorkSpan` against `createInMemorySpanRecorder()` and
returns the captured spans + the synthesized `chainId` (uuid v7 via
`generateChainId`).

Flips two prior `it.todo` markers to live tests:
  - emits continuation.work span with chain.id stamped (#366)
  - span carries chain.step.remaining attribute

Plus three new live tests:
  - sentinel: captureSwim() is wired for continue_work
  - refuses continue_delegate / heartbeat / lich primitives with a clear
    error so the spec's remaining `it.todo` markers stay honest about
    what is implemented
  - repeated calls don't leak capture state between invocations
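
The behaviors these tests pin (one wired primitive, a clear refusal for the rest, no state shared across calls) could look roughly like this. It is a minimal sketch, not the code in `swim-runner.ts`: `createRecorder` and `nextChainId` are hypothetical stand-ins for `createInMemorySpanRecorder()` and `generateChainId()`, and the `captureSwim` signature and the `stepsRemaining` default are assumed.

```typescript
// Hypothetical sketch; the real helpers live in studies/swim-37/harness and may differ.
type SwimPrimitive = "continue_work" | "continue_delegate" | "heartbeat" | "lich";

interface RecordedSpan {
  name: string;
  attributes: Record<string, string | number>;
}

// Stand-in for createInMemorySpanRecorder(): collects spans in a plain array.
function createRecorder() {
  const spans: RecordedSpan[] = [];
  return {
    spans,
    record(name: string, attributes: Record<string, string | number>): void {
      spans.push({ name, attributes });
    },
  };
}

let chainCounter = 0;
// Stand-in for generateChainId(); the real one is described as returning a UUID v7.
function nextChainId(): string {
  chainCounter += 1;
  return "chain-" + chainCounter;
}

function captureSwim(primitive: SwimPrimitive, stepsRemaining = 3) {
  // A fresh recorder per call keeps capture state from leaking between invocations.
  const recorder = createRecorder();
  const chainId = nextChainId();
  switch (primitive) {
    case "continue_work":
      recorder.record("continuation.work", {
        "chain.id": chainId,
        "chain.step.remaining": stepsRemaining,
      });
      return { chainId, spans: recorder.spans };
    default:
      // Unwired primitives fail loudly so the remaining it.todo markers stay honest.
      throw new Error(`captureSwim: primitive "${primitive}" is not wired yet`);
  }
}
```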

STDOUT-only discipline preserved: no BasicTracerProvider, no OTLP
exporter, no @opentelemetry/sdk-trace-base machinery — capture flows
through `setContinuationTracer(recorder.tracer)` as documented in
README.md.

Other primitives (`continue_delegate`, `heartbeat`, lich-shape) remain
`it.todo` until the corresponding dispatch / heartbeat /
compaction-release seams have a comparable single-helper entry point we
can drive synthetically.

Test results:
  - swim-37 vitest project: 20 passed, 16 todo, 0 failed
  - in-memory-span-recorder.test.ts unchanged (12/12 still pass)
  - tsgo errors visible elsewhere in tree are pre-existing
    (src/plugin-sdk/provider-tools.ts), not introduced by this change

Refs #324 (swim-37 harness): `it.todo` count drops from 17 → 16,
first concrete primitive driven through the live tracer registry.

* docs(swim-37): continue_delegate wiring memo (memo-before-wire)

Pre-PR design memo for the next swim-37 primitive after #405
(continue_work). Decides shape before the wire so the PR lands clean.

Resolved by reading production code:
  Q1 — real setTimeout vs fake timers
       N/A for dispatch-accept (synchronous, pre-timer); banked for
       a separate continue_delegate_fire swim.

Cohort sign-off requested on:
  Q2 — recipient fan-out shape: N spans sharing chain.id (matches
       chunk-3 cohort design pin + #355 Stage-2 budget semantics).
  Q3 — delegate.mode x delegate.delivery: 8-cell it.each matrix
       + omission-contract row.

Banked for follow-up:
  Q4 — continue_delegate_fire as separate primitive, not sub-case;
       informs SwimPrimitive type-union shape.

Standard applied (per Cael 2026-04-27 #1498505918, figs
#1498505870): would skipping this memo cost a chunk's worth of
rework? Yes — three open Qs would have changed file shape.

Refs:
  - #405 (continue_work wired)
  - #324 (swim-37 harness)
  - docs/design/334-slice2-chunk5b-delegate-fire-memo.md (pattern)

* test(swim-37): wire captureSwim() for continue_delegate primitive

Implements the design from
`docs/design/swim-37-continue-delegate-wiring-memo.md` (commit
3bb086c762, same branch).

Wiring (swim-runner.ts):
- New continue_delegate case in the switch.
- Adds CaptureSwimOptions axes: recipients, delivery, delegateMode.
- Drives emitContinuationDelegateSpan once per recipient with shared
  chain.id (per chunk-3 cohort design pin: N recipients = N spans
  sharing chain.id, NOT one span with a recipients list).
- Validates recipients is a positive integer.
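
As a rough illustration of the fan-out shape and the validation listed above: the sketch below emits one span per recipient with a shared `chain.id`, and leaves out `delegate.mode` entirely when no mode is passed. The function name `captureDelegateFanOut`, the span name `continuation.delegate`, and the option types are assumptions made for the example, not the real `emitContinuationDelegateSpan` signature.

```typescript
// Hypothetical sketch of the fan-out shape; span name and option types are assumptions.
type DelegateMode = "normal" | "silent" | "silent-wake" | "post-compaction";
type DelegateDelivery = "immediate" | "timer";

interface DelegateSpan {
  name: string;
  attributes: Record<string, string>;
}

interface DelegateCaptureOptions {
  recipients: number;
  delivery: DelegateDelivery;
  delegateMode?: DelegateMode;
}

function captureDelegateFanOut(chainId: string, opts: DelegateCaptureOptions): DelegateSpan[] {
  const n = opts.recipients;
  // Reject 0, negatives, fractions, NaN, and Infinity.
  if (!isFinite(n) || n < 1 || Math.floor(n) !== n) {
    throw new RangeError("recipients must be a positive integer, got " + n);
  }
  const spans: DelegateSpan[] = [];
  for (let i = 0; i < n; i++) {
    // N recipients produce N spans, each carrying the same chain.id.
    const attributes: Record<string, string> = {
      "chain.id": chainId,
      "delegate.delivery": opts.delivery,
    };
    // Omission contract: no delegate.mode key at all when the caller passes undefined.
    if (opts.delegateMode !== undefined) {
      attributes["delegate.mode"] = opts.delegateMode;
    }
    spans.push({ name: "continuation.delegate", attributes });
  }
  return spans;
}
```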

Tests (swim-runner.test.ts):
- 8-cell it.each matrix: delegate.mode (normal | silent | silent-wake
  | post-compaction) x delegate.delivery (immediate | timer).
- Omission contract: delegate.mode attribute absent when caller
  passes undefined.
- Fan-out: 3 recipients emit 3 spans with shared chain.id.
- Validation: recipients=0 and recipients=1.5 are rejected.
- Removes the prior 'refuses continue_delegate' test row (now wired).
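
For reference, the 8 cells are the cartesian product of the four `delegate.mode` values and the two `delegate.delivery` values. A small sketch of building those rows follows; `buildMatrix` is a hypothetical helper, and rows of this shape are the kind of table a parameterized test such as `it.each` would consume.

```typescript
// Hypothetical helper: enumerates the mode x delivery cells named in the test list.
const MODES = ["normal", "silent", "silent-wake", "post-compaction"] as const;
const DELIVERIES = ["immediate", "timer"] as const;

type Mode = (typeof MODES)[number];
type Delivery = (typeof DELIVERIES)[number];

interface MatrixRow {
  mode: Mode;
  delivery: Delivery;
}

// Cartesian product: 4 modes x 2 deliveries = 8 rows, in a stable order.
function buildMatrix(): MatrixRow[] {
  const rows: MatrixRow[] = [];
  for (const mode of MODES) {
    for (const delivery of DELIVERIES) {
      rows.push({ mode, delivery });
    }
  }
  return rows;
}

const matrix = buildMatrix();
```

The omission-contract row is deliberately not part of this product, since it covers the case where no mode is passed at all.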

Two it.todo markers preserved for downstream concerns:
- chain-budget decrement / ChainBudget.declineToCarry observable
  (needs ChainBudget integration, not in scope here).
- fan-out budget arithmetic per #355 Stage-2 (1 step not N),
  separate from span cardinality.

Test results delta:
  Before: 8 passed | 16 todo (24 tests)
  After: 19 passed | 15 todo (34 tests)
  +11 live | -1 todo

Refs:
- #324 swim-37 harness
- #405 captureSwim continue_work (parent of this branch)
- docs/design/swim-37-continue-delegate-wiring-memo.md

* test(swim-37): pin recipient.index gap on continue_delegate fan-out

Per Cael's caution on the wiring memo (Discord msg
1498507232185286849 sign-off): N spans sharing chain.id risks
collapsing into analytic mush at scale unless per-recipient
distinction is visible in attrs. Currently
emitContinuationDelegateSpan exposes no recipient.index axis, so
all N spans in a fan-out are byte-identical except spanId.

Adds:
- One live GAP-PIN test asserting the current (deficient) reality:
  fan-out spans toEqual on attributes. This will fail loudly when
  the production helper grows the axis, prompting a flip to
  not.toEqual.
- One it.todo marker for the upcoming production-helper change.
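
The gap-pin idea can be sketched without a test framework: assert that fan-out spans are currently indistinguishable, so the check starts failing as soon as a per-recipient attribute appears. The span shape and the helper names `sameAttributes` and `assertGapStillOpen` are hypothetical; the real test uses `toEqual` on the captured attributes.

```typescript
// Hypothetical gap-pin check; span shape and helper names are assumptions.
interface FanOutSpan {
  spanId: string;
  attributes: Record<string, string | number>;
}

// Order-insensitive comparison of two flat attribute bags.
function sameAttributes(
  a: FanOutSpan["attributes"],
  b: FanOutSpan["attributes"]
): boolean {
  const keysA = Object.keys(a).sort();
  const keysB = Object.keys(b).sort();
  if (keysA.length !== keysB.length) return false;
  return keysA.every((k, i) => k === keysB[i] && a[k] === b[k]);
}

// Pins today's deficient behavior: passes while spans are indistinguishable,
// throws once a per-recipient attribute (such as recipient.index) shows up.
function assertGapStillOpen(spans: FanOutSpan[]): void {
  for (let i = 1; i < spans.length; i++) {
    if (!sameAttributes(spans[0].attributes, spans[i].attributes)) {
      throw new Error(
        "gap closed: fan-out spans now differ; flip this pin to assert inequality"
      );
    }
  }
}
```

Pinning the deficiency this way turns a silent gap into a test that must be consciously updated when the production helper changes.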

Production-helper change is out of scope for this PR; will file
follow-up issue. The gap-pin lives in the integration tier so it
shows up in any code-review touching delegate-dispatch span shape.

Refs:
- #324 swim-37 harness
- #405 captureSwim continue_work + continue_delegate (this PR)
- docs/design/swim-37-continue-delegate-wiring-memo.md (Q2)
- Cael Discord sign-off msg 1498507232185286849

* test(swim-37): wire §1 entry-point todo onto classifyRebasePick

Now that PR #408 (merged at cb73bc8) has landed
`rebase-classifier.ts` with the pure `classifyRebasePick` composer,
the integration-tier §1 entry-point it.todo on swim-runner.test.ts
L80 is wireable without any new shim.

Synthetic commit pinned: subject mentions PR openclaw#70595, base CHANGELOG
contains the same PR token (squash-rebase shape from the memo's
7ee46a3 anchor instance). Discovery channel: changelog-grep:pr.

L81 (CHANGELOG-byte-grep emits drop-with-reason span) stays todo;
it needs the production callsite that pairs the channel with
`emitContinuationDisabledSpan`, per 🌊's #408 note.

Test delta:
  Before: 20 passed | 16 todo (36)
  After: 21 passed | 15 todo (36)
  +1 live | -1 todo
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
* Add runner label to status output

* Add changelog entry for status runner label

* Fix status runner detection and sanitization
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* Add runner label to status output

* Add changelog entry for status runner label

* Fix status runner detection and sanitization

Labels

maintainer Maintainer-authored PR size: S
