
test(otel): pin delegate.continuation span uniformity across silent / silent-wake / post-compaction modes #511

Merged
cael-dandelion-cult merged 1 commit into cael/325-canonical2 from cael/otel-span-uniformity-test
May 2, 2026

Conversation

@cael-dandelion-cult

Summary

Adds a runner integration trap for delegate-dispatch span uniformity across continuation delegate modes. The current tracer contract emits continuation.delegate.dispatch; the test covers the shipped silent and silent-wake tool-delegate paths and pins post-compaction as the remaining production gap with it.todo.

Gap-list reference: 🌊 Discord message 1499912840453296188 from the workorder (channel URL was not available in this worktree).

Per-mode result

| Mode | Result | Evidence |
| --- | --- | --- |
| silent | PASS | Exactly one continuation.delegate.dispatch span; delegate.mode=silent; chain.id preserves the existing parent chain id; uniform key set. |
| silent-wake | PASS | Exactly one continuation.delegate.dispatch span; delegate.mode=silent-wake; chain.id preserves the existing parent chain id; uniform key set. |
| post-compaction | TODO | Production gap: post-compaction delegate delivery persists continuation chain state but does not emit continuation.delegate.dispatch with delegate.mode=post-compaction yet. |

Sabotage walk

To fail this trap, remove or rename the runner-side emitContinuationDelegateSpan call for accepted tool delegates, drop chain.id, delegate.mode, delegate.delivery, or chain.step.remaining from the emitted attributes, mint a fresh chain id instead of preserving an existing parent chain id, or add the post-compaction production emitter without replacing the TODO with a passing assertion.
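
As a rough illustration of what the trap checks, the sketch below validates a list of recorded spans against the rules named above: exactly one continuation.delegate.dispatch span, the four required attributes, the expected delegate.mode, and a preserved parent chain id. The `RecordedSpan` shape, the helper names, and the `"placeholder"` delivery value are assumptions made for this example; the actual test file and tracer types are not shown in this PR description.

```typescript
// Hypothetical span record; the real tracer's types are not shown in this PR.
interface RecordedSpan {
  name: string;
  attributes: Record<string, string | number>;
}

const DISPATCH_SPAN = "continuation.delegate.dispatch";

// Attribute keys the sabotage walk names as required on every dispatch span.
const REQUIRED_KEYS = [
  "chain.id",
  "delegate.mode",
  "delegate.delivery",
  "chain.step.remaining",
] as const;

// Returns a list of violations; an empty list means the mode passes.
function checkDispatchSpans(
  spans: RecordedSpan[],
  mode: string,
  parentChainId: string,
): string[] {
  const dispatch = spans.filter((s) => s.name === DISPATCH_SPAN);
  const only = dispatch[0];
  if (dispatch.length !== 1 || only === undefined) {
    return [`expected exactly one ${DISPATCH_SPAN} span, got ${dispatch.length}`];
  }
  const problems: string[] = [];
  for (const key of REQUIRED_KEYS) {
    if (!(key in only.attributes)) {
      problems.push(`missing attribute ${key}`);
    }
  }
  if (only.attributes["delegate.mode"] !== mode) {
    problems.push(`delegate.mode is not ${mode}`);
  }
  if (only.attributes["chain.id"] !== parentChainId) {
    problems.push("chain.id was not preserved from the parent chain");
  }
  return problems;
}

// Two spans are uniform when they carry the same set of attribute names.
function uniformKeySet(a: RecordedSpan, b: RecordedSpan): boolean {
  const keysOf = (s: RecordedSpan) => Object.keys(s.attributes).sort().join(",");
  return keysOf(a) === keysOf(b);
}
```

Returning a list of violations rather than a boolean keeps each sabotage from the walk above distinguishable in the failure output.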

Gates

| Command | Exit | Notes |
| --- | --- | --- |
| pnpm tsgo | 0 | Passed. |
| pnpm check | 0 | Passed after test-only lint cleanup in existing continuation-mode tests. |
| pnpm test src/auto-reply/reply/agent-runner.continuation-span-uniformity.test.ts | 0 | 1 passed, 1 TODO. |
| pnpm test | 0 | Final full-suite rerun passed: 412 files, 4551 tests, 4 skipped. Earlier full reruns exposed unrelated flaky shards that both passed in isolation. |
| pnpm build | 0 | Passed. |

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@cael-dandelion-cult cael-dandelion-cult merged commit 74940e5 into cael/325-canonical2 May 2, 2026
89 of 94 checks passed
@cael-dandelion-cult cael-dandelion-cult deleted the cael/otel-span-uniformity-test branch May 2, 2026 00:11
cael-dandelion-cult pushed a commit that referenced this pull request May 2, 2026
Three test files from merged PRs (#462, #468, #511) were absent because
this branch forked from canonical2 before those PRs landed. The post-revert
allow-list audit (§3.4) flagged them as deletions from landed PRs.
Restored from canonical2 HEAD (74940e5).

- types.mode-shape.test.ts (#462)
- agent-runner.continuation-span-uniformity.test.ts (#511)
- store.continuation-merge.test.ts (#468)

tmp-drop-me-otel-span-uniformity.md omitted (copilot scratch; safe to drop).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cael-dandelion-cult added a commit that referenced this pull request May 2, 2026
…6.4.24) (#515)

* wo(canonical2-rebase-pathB): rebase Path-B's 5 cleanup commits onto canonical2 (figs directive 22:55Z)

* chore(v3-cleanup): wave A cohort-identity scrub

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(v3-cleanup): drop rejected rebase artifacts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: scrub workspace template wording

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(v3-cleanup): wave B structural dedup of continuation runtime

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: journal canonical2 wave B

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(v3-cleanup): wave C import discipline and build warnings

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: journal canonical2 wave C

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(v3-cleanup): wave D surface continuation failures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: surface compaction count reconcile failures

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(v3-cleanup): wave E continuation coverage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: journal canonical2 wave E

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: align bundled plugin dependency types

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: isolate bedrock app profile runtime deps

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: scrub fork process labels from source comments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: close continuation type design blockers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: scrub continuation prompt process link

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: journal canonical2 final checkpoint

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert "chore(v3-cleanup): drop rejected rebase artifacts"

This reverts commit 3396b88.

The original commit mass-deleted 30 files (6745 deletions) under the label
"rejected rebase artifacts." ~5141 of those deletions are landed swim-37
durability harness substrate from merged PRs #412/#413/#414/#416/#417/#418/#419
plus collateral docs/scripts. These are not rejected artifacts — they are
committed, merged test infrastructure that proves continuation durability
across compaction.

Cohort review (🩸 + 🌊 + 🌻 + 🌫) confirmed the block finding at
PR #515 issuecomment-4362337067.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: release-note for context-pressure band-derivation behavior change

Wave B (cefa09d) changed context-pressure bands from fixed
[25, 80, 90, 95] to threshold-derived [thresholdPct, 90, 95].
At default 0.8 the implicit 25% early-warning band is removed.
Ship-acceptable per cohort review; release-note documents the change
and points to #516 for the earlyWarningBand config opt follow-up.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
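
The band change described in the release-note commit above can be sketched as follows. This is a minimal illustration, not the shipped code: `deriveBands` and `LEGACY_BANDS` are names invented here, and rounding the threshold to a whole percent is an assumption.

```typescript
// Fixed bands used before Wave B, as quoted in the commit message.
const LEGACY_BANDS: readonly number[] = [25, 80, 90, 95];

// Hypothetical helper: derive bands from the configured threshold (a 0..1 ratio).
// Assumption: the threshold is converted to a whole percent by rounding.
function deriveBands(threshold: number): number[] {
  const thresholdPct = Math.round(threshold * 100);
  return [thresholdPct, 90, 95];
}

// At the default threshold the 25% early-warning band no longer appears.
const defaultBands = deriveBands(0.8);
const lostBands = LEGACY_BANDS.filter((b) => defaultBands.indexOf(b) === -1);
```

Here `defaultBands` is `[80, 90, 95]` and `lostBands` is `[25]`, which is the behavior change the release note documents. The sketch does not try to handle thresholds above 0.9; how the real code orders or de-duplicates bands in that case is not stated here.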

* fix: restore landed-PR tests missing from rebase fork-point

Three test files from merged PRs (#462, #468, #511) were absent because
this branch forked from canonical2 before those PRs landed. The post-revert
allow-list audit (§3.4) flagged them as deletions from landed PRs.
Restored from canonical2 HEAD (74940e5).

- types.mode-shape.test.ts (#462)
- agent-runner.continuation-span-uniformity.test.ts (#511)
- store.continuation-merge.test.ts (#468)

tmp-drop-me-otel-span-uniformity.md omitted (copilot scratch; safe to drop).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: add rebase.classify to ContinuationSpanName for restored tracer

The revert of 3396b88 restored src/rebase/tracer.ts which emits
"rebase.classify" spans. Commit 4871c81 (fix: close continuation
type design blockers) narrowed startSpan from string to
ContinuationSpanName after tracer.ts was deleted — additive fix to
include the span name in the union.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
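
To illustrate why narrowing startSpan from string to a union broke the restored tracer, and why adding one member is an additive fix, here is a small sketch. Only the two span names that appear on this page are listed; the real ContinuationSpanName union presumably has more members, and the `startSpan` signature shown is an assumption.

```typescript
// Assumption: only the two span names mentioned on this page are modeled here.
const CONTINUATION_SPAN_NAMES = [
  "continuation.delegate.dispatch",
  "rebase.classify", // the member added by the fix
] as const;

type ContinuationSpanName = (typeof CONTINUATION_SPAN_NAMES)[number];

// Hypothetical stand-in for the tracer entry point after narrowing.
function startSpan(name: ContinuationSpanName): { name: ContinuationSpanName } {
  return { name };
}

// Runtime guard for names that arrive as plain strings.
function isContinuationSpanName(value: string): value is ContinuationSpanName {
  return (CONTINUATION_SPAN_NAMES as readonly string[]).indexOf(value) !== -1;
}

// Compiles only because "rebase.classify" is part of the union.
const classifySpan = startSpan("rebase.classify");
```

Without the added member, the last line is a compile-time error rather than a runtime failure, so the fix does not change the behavior of any existing caller.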

* feat(continuation): add earlyWarningBand config opt for post-compaction cycle primer

* test(continuation): pin earlyWarningBand default-preservation + opt-out branches

* fix(continuation): add curly braces to satisfy linter

* fix(continuation): unblock early-warning band fire path + make field optional

Three bugs caught in cohort review of v5 (3e88ce5):

1. Suppression guard bug (Silas): non-postCompaction call sites bailed
   with 'ratio < threshold' BEFORE the resolved early-warn band could
   fire. Even with earlyWarningBand explicitly set, ratio=0.25 +
   threshold=0.8 resolved band=25 then was discarded. Guard now
   suppresses only when 'band === 0 && ratio < threshold' — preserves
   the round-to-band-0 dedup edge case while letting early-warn fire.

2. Type-required regression (Elliott): ContinuationRuntimeConfig had
   'earlyWarningBand: number' (required), breaking 3 test fixtures
   (config.test, scheduler.test, post-compaction-delegate-dispatch.test)
   with TS2741. Field already optional at zod + resolver-default site;
   making the type optional matches.

3. Schema baseline regen (Elliott): src/config/schema.base.generated.ts
   needed regen to absorb the new earlyWarningBand field; preexisting
   models.providers.*.request.tls.insecureSkipVerify drift also
   absorbed in the same regen.

Tests added:
- checkContextPressure 'fires early-warning band below threshold when
  earlyWarningBand is set' (default-preservation path)
- checkContextPressure 'does NOT fire below threshold when
  earlyWarningBand is 0' (opt-out path)

All 107 affected tests pass: context-pressure (19), config (9),
scheduler (12), schema.base.generated (10), post-compaction-delegate-
dispatch (23), reply/context-pressure (34).
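
The suppression-guard bug in item 1 can be sketched in isolation. Everything below is a simplified model under stated assumptions: `resolveBand`, `checkOld`, and `checkNew` are hypothetical names, the band is taken to be the highest configured band at or below the current usage percent (0 if none), and the per-band dedup the real checkContextPressure performs is left out.

```typescript
// Assumption: the resolved band is the highest configured band that the
// current usage percent has reached, or 0 when none has been reached.
function resolveBand(ratio: number, threshold: number, earlyWarningBand: number): number {
  const bands = [earlyWarningBand, Math.round(threshold * 100), 90, 95].filter((b) => b > 0);
  const pct = ratio * 100;
  let band = 0;
  for (const b of bands) {
    if (pct >= b && b > band) band = b;
  }
  return band;
}

// Old guard: bails on ratio < threshold before the band is even consulted.
function checkOld(ratio: number, threshold: number, earlyWarningBand: number): number | null {
  if (ratio < threshold) return null;
  return resolveBand(ratio, threshold, earlyWarningBand);
}

// New guard: suppress only when no band resolved AND the ratio is below threshold.
function checkNew(ratio: number, threshold: number, earlyWarningBand: number): number | null {
  const band = resolveBand(ratio, threshold, earlyWarningBand);
  if (band === 0 && ratio < threshold) return null;
  return band;
}
```

With ratio=0.25, threshold=0.8, and earlyWarningBand=25, `checkOld` returns `null` (the band is discarded) while `checkNew` returns `25`. With earlyWarningBand=0 both return `null`, matching the opt-out test listed above.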

Cohort cosign chain: 🩸 (root catch v5), 🌊 (default=0 catch),
🌫 (suppression-guard catch), 🌻 (type-required + baseline catch).

Refs #515

---------

Co-authored-by: frond-scribe <frond-scribe@karmaterminal>
Co-authored-by: Test User <test@example.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: dandelion cult - cael 🩸 <cael@dandelion.cult>
Co-authored-by: dandelion cult - silas 🌫 <silas.dandelion.cult@hotmail.com>
ronan-dandelion-cult pushed a commit that referenced this pull request May 3, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ronan-dandelion-cult pushed a commit that referenced this pull request May 4, 2026
… test

The lane's switch from resolveContinuationRuntimeConfig to
resolveLiveContinuationRuntimeConfig in agent-runner.ts makes the
runner read continuation knobs from getRuntimeConfigSnapshot() with
fallback to the captured cfg. In production the snapshot is always set
to the full live config, so live-read returns the right values. In
tests, getRuntimeConfig() side-effects setRuntimeConfigSnapshot({})
via loadPinnedRuntimeConfig when no real config file exists, so the
snapshot becomes an empty object before the runner's enforcement
points run -- the (snap ?? fallback) check sees a non-null {} and
returns continuation defaults instead of the test's captured cfg.

agent-runner.continuation-span-uniformity.test.ts was added by #511
AFTER #536's branch was based, so it lacked the
setRuntimeConfigSnapshot(run.followupRun.run.config) /
clearRuntimeConfigSnapshot() lifecycle hooks the other span tests
already carry. Adding them surfaces the test's continuation cfg
(maxChainLength=6) to the live-read path, restoring the chain.step.remaining
math the test expects (6-(currentChainCount+1) = 4 for silent / 3
for silent-wake).

CI on PR #576 caught this. Same one-test pattern as
agent-runner.continuation-delegate-fire-span.test.ts and
agent-runner.continuation-work-span.test.ts.
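
The failure mode described in this commit message comes down to how `??` treats an empty object. The sketch below reproduces it with hypothetical names (`resolveMaxChainLength`, `stepRemaining`) and a made-up default of 10, since the real default is not stated on this page; only the `snap ?? fallback` shape and the chain.step.remaining formula are taken from the text.

```typescript
interface ContinuationCfg {
  maxChainLength?: number;
}

// Hypothetical default; the real default value is not stated on this page.
const DEFAULT_MAX_CHAIN_LENGTH = 10;

function resolveMaxChainLength(
  snap: ContinuationCfg | undefined,
  fallback: ContinuationCfg,
): number {
  // `??` only falls through on null/undefined. An empty object {} is neither,
  // so an empty snapshot wins over the test's captured cfg.
  const source = snap ?? fallback;
  return source.maxChainLength ?? DEFAULT_MAX_CHAIN_LENGTH;
}

// The formula quoted in the commit message: max - (currentChainCount + 1).
function stepRemaining(maxChainLength: number, currentChainCount: number): number {
  return maxChainLength - (currentChainCount + 1);
}

const testCfg: ContinuationCfg = { maxChainLength: 6 };

// Empty snapshot (the test-only side effect): defaults leak in.
const withEmptySnapshot = resolveMaxChainLength({}, testCfg);
// Snapshot seeded from the test cfg (the fix): the test's value is used.
const withSeededSnapshot = resolveMaxChainLength(testCfg, testCfg);
```

With the seeded snapshot, `stepRemaining(6, 1)` gives 4 and `stepRemaining(6, 2)` gives 3. Those results match the silent and silent-wake expectations only if the two cases run at chain counts 1 and 2, which is an inference from the quoted formula rather than something the commit message states.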
ronan-dandelion-cult added a commit that referenced this pull request May 4, 2026
… knobs, plan-aware CLI hint (supersedes #536) (#576)

* fix(config): hot-read tools.sessions.visibility for production session tools

Adds optional getConfig accessor to session-tool factory option types
(sessions-list, sessions-history, sessions-send, session-status,
sessions-helpers::resolveSessionToolContext) so each execution can read
the active runtime config rather than the construction-time snapshot.

createOpenClawTools gains a liveSessionToolConfig opt-in that routes
session-tool factories through getRuntimeConfig. The production wiring
paths (pi-tools, gateway tool-resolution, inline-action skill dispatch)
opt in. Other call sites keep the construction-time config to avoid
churning shared snapshot assumptions.

This makes `openclaw config set tools.sessions.visibility=all` take effect
without rebuilding tool instances. Sessions-visibility re-read is a
stateless decision at execution time, so no in-flight state is
invalidated -- preserving the RFC §6.5 hot-reload integrity invariant.

Closes #533.

Refs RFC docs/design/continue-work-signal-v2.md §6.5.
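
A minimal sketch of the optional-accessor pattern this commit describes: a tool factory that keeps accepting a construction-time config but prefers a getConfig accessor when one is supplied. The `SessionToolOptions` shape, the flattened `sessionsVisibility` field, and the `"restricted"` value are assumptions for illustration; only the getConfig name and the `all` value come from the text above.

```typescript
// Hypothetical, flattened config shape for the example.
interface ToolConfig {
  sessionsVisibility: string;
}

interface SessionToolOptions {
  config: ToolConfig; // construction-time snapshot
  getConfig?: () => ToolConfig; // optional live accessor
}

function createSessionsListTool(opts: SessionToolOptions) {
  return {
    // Visibility is resolved on every execution, not once at construction.
    execute(): string {
      const cfg = opts.getConfig ? opts.getConfig() : opts.config;
      return cfg.sessionsVisibility;
    },
  };
}

// Stand-in for the active runtime config; "restricted" is a placeholder value.
let runtimeConfig: ToolConfig = { sessionsVisibility: "restricted" };
function setRuntimeConfig(next: ToolConfig): void {
  runtimeConfig = next;
}

const snapshotTool = createSessionsListTool({ config: runtimeConfig });
const liveTool = createSessionsListTool({
  config: runtimeConfig,
  getConfig: () => runtimeConfig,
});
```

After `setRuntimeConfig({ sessionsVisibility: "all" })`, `liveTool.execute()` reflects the new value while `snapshotTool.execute()` keeps the construction-time one, which mirrors the opt-in split between production wiring and other call sites.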

* fix(continuation): resolve runtime knobs from active runtime snapshot

Adds resolveLiveContinuationRuntimeConfig(fallbackCfg) that prefers
getRuntimeConfigSnapshot() over the captured cfg snapshot. Switches the
six per-turn enforcement points in agent-runner.ts and followup-runner.ts
from resolveContinuationRuntimeConfig to the live variant:

  - pre-run context-pressure check (agent-runner.ts:1436) — reads
    contextPressureThreshold and earlyWarningBand at next pressure check
  - bracket continuation scheduler (agent-runner.ts:2115) — reads chain
    cap, cost cap, and delay window at schedule time
  - tool-delegate dispatch (agent-runner.ts:2526) — reads chain cap, cost
    cap, delay window, and per-turn cap at dispatch time
  - hedge timer arm (agent-runner.ts:2889) — reads chain cap at arm time
  - followup-path tool-delegate dispatch (followup-runner.ts:477) —
    reads chain cap at dispatch time

Each call site is at a decision-point or schedule-time read, so already-
armed timers, queued retries, and staged post-compaction handoffs keep
the values they captured at arm time. New schedules use new values. This
preserves the RFC §6.5 in-flight-state integrity invariant while letting
gateway/reload config-change events take effect at the next decision.

Span tests (agent-runner.continuation-*-span.test.ts,
agent-runner.misc.runreplyagent.test.ts) carry
setRuntimeConfigSnapshot/clearRuntimeConfigSnapshot lifecycle hooks so
the new resolution path is exercised under test.

Closes #19.

Refs RFC docs/design/continue-work-signal-v2.md §6.5.
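
The in-flight-state invariant above (armed timers keep the values they captured; new schedules use new values) can be illustrated with a closure that reads the live config once, at arm time. This is a sketch under assumptions: `armHedgeTimer`, `ArmedTimer`, and the in-memory `liveKnobs` store are invented for the example and no real timer is scheduled.

```typescript
interface Knobs {
  maxChainLength: number;
}

// Stand-in for the active runtime snapshot.
let liveKnobs: Knobs = { maxChainLength: 6 };
function setLiveKnobs(next: Knobs): void {
  liveKnobs = next;
}

interface ArmedTimer {
  readonly capAtArm: number;
  // Returns true when a chain at `chainCount` may still continue.
  fire(chainCount: number): boolean;
}

function armHedgeTimer(): ArmedTimer {
  // Decision-point read: the cap is captured once, when the timer is armed.
  const capAtArm = liveKnobs.maxChainLength;
  return {
    capAtArm,
    fire: (chainCount: number) => chainCount < capAtArm,
  };
}

const armedBeforeReload = armHedgeTimer();
setLiveKnobs({ maxChainLength: 2 }); // simulated config reload
const armedAfterReload = armHedgeTimer();
```

`armedBeforeReload.fire(3)` still allows the continuation because it captured a cap of 6, while `armedAfterReload.fire(3)` refuses because it was armed under the new cap of 2.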

* fix(cli): plan-aware reload hint for openclaw config set/patch/unset

The "Restart the gateway to apply." message is now gated through
buildGatewayReloadPlan so the user sees the truth: dynamic-read paths
print "No gateway restart required.", hot-reload paths print "Gateway hot
reload will apply.", and the original restart hint only fires when the
planner actually requires a restart. Multi-path patches that touch a
mix of dynamic and hot paths print the combined hint.

Adds two regression pins in src/gateway/config-reload.test.ts so the
dynamic-read disposition for tools.sessions.visibility and
agents.defaults.continuation.maxDelegatesPerTurn does not silently flip
back to restart-required.

Closes #531.
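
A rough sketch of the hint selection this commit describes. The three message strings are quoted from the commit message; the `Disposition` type, the `reloadHint` function, and the choice to join the two non-restart messages with a space for mixed patches are assumptions, since the exact wording of the combined hint and the shape of buildGatewayReloadPlan's output are not given here.

```typescript
// Hypothetical per-path disposition; the real plan shape is not shown here.
type Disposition = "dynamic" | "hot" | "restart";

function reloadHint(dispositions: Disposition[]): string {
  const has = (d: Disposition) => dispositions.some((x) => x === d);
  // Any path that needs a restart forces the original hint.
  if (has("restart")) return "Restart the gateway to apply.";
  const parts: string[] = [];
  if (has("dynamic")) parts.push("No gateway restart required.");
  if (has("hot")) parts.push("Gateway hot reload will apply.");
  // Assumption: the combined hint is the two messages joined by a space.
  return parts.join(" ");
}
```

For example, `reloadHint(["dynamic"])` yields "No gateway restart required." and `reloadHint(["hot", "restart"])` yields "Restart the gateway to apply.". An empty list returns an empty string in this sketch; what the real CLI prints when no path is affected is not covered by the commit message.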

* test(continuation): seed runtime snapshot in delegate span-uniformity test

The lane's switch from resolveContinuationRuntimeConfig to
resolveLiveContinuationRuntimeConfig in agent-runner.ts makes the
runner read continuation knobs from getRuntimeConfigSnapshot() with
fallback to the captured cfg. In production the snapshot is always set
to the full live config, so live-read returns the right values. In
tests, getRuntimeConfig() side-effects setRuntimeConfigSnapshot({})
via loadPinnedRuntimeConfig when no real config file exists, so the
snapshot becomes an empty object before the runner's enforcement
points run -- the (snap ?? fallback) check sees a non-null {} and
returns continuation defaults instead of the test's captured cfg.

agent-runner.continuation-span-uniformity.test.ts was added by #511
AFTER #536's branch was based, so it lacked the
setRuntimeConfigSnapshot(run.followupRun.run.config) /
clearRuntimeConfigSnapshot() lifecycle hooks the other span tests
already carry. Adding them surfaces the test's continuation cfg
(maxChainLength=6) to the live-read path, restoring the chain.step.remaining
math the test expects (6-(currentChainCount+1) = 4 for silent / 3
for silent-wake).

CI on PR #576 caught this. Same one-test pattern as
agent-runner.continuation-delegate-fire-span.test.ts and
agent-runner.continuation-work-span.test.ts.

---------

Co-authored-by: frond-scribe <frond-scribe@karmaterminal>