docs: RFC observability section + lifecycle event traces 🌻 by karmafeast · Pull Request #15 · karmaterminal/openclaw

karmafeast · 2026-03-07T08:45:33Z

RFC polish — Elliott's section. Adds Operator Observability section + lifecycle event trace additions. Scrubbed per upstream guidelines.

Hot-reloaded config now takes effect without tool recreation or gateway restart. Fallback chain: live config → construction-time opts → hardcoded 5. Four new tests verify hot-reload, fallback, and the hardcoded default.

Signed-off-by: Silas 🌫️
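
A minimal sketch of that fallback chain, using hypothetical names (`resolveMaxDelegates`, `ContinuationConfig`, `ToolConstructionOpts`) rather than the actual tool code. The live config is read through a callback on every call, so a hot-reloaded value wins without recreating the tool. Treating non-positive or non-integer values as unset is an assumption of this sketch.

```typescript
// Hypothetical shapes; the real config and tool-option types may differ.
interface ContinuationConfig {
  maxDelegatesPerTurn?: number;
}

interface ToolConstructionOpts {
  maxDelegatesPerTurn?: number;
}

const HARDCODED_MAX_DELEGATES = 5;

// Assumption: only positive integers count as a usable setting.
function isPositiveInt(value: number | undefined): value is number {
  return value !== undefined && Number.isInteger(value) && value > 0;
}

/**
 * Resolve maxDelegatesPerTurn at call time (not at tool construction time).
 * Precedence: live config -> construction-time opts -> hardcoded 5.
 */
function resolveMaxDelegates(
  loadLiveConfig: () => ContinuationConfig | undefined,
  opts: ToolConstructionOpts,
): number {
  const live = loadLiveConfig()?.maxDelegatesPerTurn;
  if (isPositiveInt(live)) return live;
  if (isPositiveInt(opts.maxDelegatesPerTurn)) return opts.maxDelegatesPerTurn;
  return HARDCODED_MAX_DELEGATES;
}
```

Because the callback is invoked per call, swapping the object it returns (as a config reload would) changes the result immediately.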

- WORK timers use strict equality (intentional: stop and listen)
- DELEGATE timers use tolerance-aware comparison (survive chatter)
- maxDelegatesPerTurn hot-reload behavior documented

Signed-off-by: Silas 🌫️
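
The asymmetry can be illustrated with a small predicate. This is a sketch under stated assumptions: generations are per-session integers that only increase, each timer remembers the generation it was scheduled at, and "tolerance-aware" is read as "fire if the generation has advanced by at most `tolerance`". The name `shouldTimerFire` is hypothetical.

```typescript
// Hypothetical helper; illustrates the comparison rule only.
type TimerKind = "work" | "delegate";

/**
 * Decide whether a scheduled continuation timer should still fire.
 * scheduledGen: session generation captured when the timer was set.
 * currentGen: session generation at fire time (advanced by external input).
 */
function shouldTimerFire(
  kind: TimerKind,
  scheduledGen: number,
  currentGen: number,
  tolerance: number,
): boolean {
  if (kind === "work") {
    // WORK: any external input since scheduling cancels self-continuation.
    return currentGen === scheduledGen;
  }
  // DELEGATE: survive up to `tolerance` generation bumps from channel chatter.
  const drift = currentGen - scheduledGen;
  return drift >= 0 && drift <= tolerance;
}
```

With the shipped default tolerance of 0, both kinds behave identically; the difference only appears once an operator raises the tolerance.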

…time used 5

Type comment at types.agent-defaults.ts:296 updated to match the runtime hardcoded default of 5. Swim 6-7b validated at 5.

Signed-off-by: Silas 🌫️

…re time

Extracts generation counter + tolerance logic into a dedicated module (continuation-generation.ts). Timer callbacks now read generationGuardTolerance via loadConfig() at fire time instead of capturing it in a closure at schedule time.

Three delegate timer sites fixed:
- agent-runner.ts:1229 (bracket-path delegate)
- agent-runner.ts:1382 (tool-path delegate)
- subagent-announce.ts:1480 (chain-hop)

WORK timers (agent-runner.ts:1249) intentionally keep strict equality (no tolerance): self-continuation should stop on any external input. DELEGATE timers honor generationGuardTolerance for noisy channels. Default tolerance: 0 (strict). Fleet operators set 300 via config.

Cancellation paths use invalidateContinuationGeneration(), which jumps past the tolerance window, ensuring all in-flight timers are voided.

Co-authored-by: Cael <cael@karmaterminal.org>
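
A hypothetical sketch of the moving parts described here, not the contents of continuation-generation.ts: an in-memory counter per session, an ordinary bump for external input, an invalidation that jumps by `tolerance + 1`, and a delegate timer that reads tolerance through a callback when it fires (standing in for `loadConfig()`). All function names are illustrative. The sketch assumes tolerance is not raised between an invalidation and the timer firing; if it were, a voided timer could fall back inside the window.

```typescript
// Hypothetical in-memory generation store keyed by session.
const sessionGenerations = new Map<string, number>();

function currentGeneration(sessionKey: string): number {
  return sessionGenerations.get(sessionKey) ?? 0;
}

/** External input (e.g. a new inbound message) advances the generation by one. */
function bumpGeneration(sessionKey: string): number {
  const next = currentGeneration(sessionKey) + 1;
  sessionGenerations.set(sessionKey, next);
  return next;
}

/** Cancellation jumps past the tolerance window so every in-flight timer is stale. */
function invalidateGeneration(sessionKey: string, tolerance: number): number {
  const next = currentGeneration(sessionKey) + tolerance + 1;
  sessionGenerations.set(sessionKey, next);
  return next;
}

function isDelegateCurrent(sessionKey: string, scheduledGen: number, tolerance: number): boolean {
  const drift = currentGeneration(sessionKey) - scheduledGen;
  return drift >= 0 && drift <= tolerance;
}

/**
 * Schedule a DELEGATE timer. Tolerance is read via readTolerance() when the
 * timer fires, not captured at schedule time, so hot-reloaded config applies.
 */
function scheduleDelegateTimer(
  sessionKey: string,
  delayMs: number,
  readTolerance: () => number,
  onFire: () => void,
): ReturnType<typeof setTimeout> {
  const scheduledGen = currentGeneration(sessionKey);
  return setTimeout(() => {
    if (isDelegateCurrent(sessionKey, scheduledGen, readTolerance())) onFire();
  }, delayMs);
}
```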

…extraction

…rity

New module provides resolveContinuationRuntimeConfig(), which reads live config via loadConfig() and normalizes all continuation defaults in one place. Eliminates scattered ?? defaults across agent-runner.ts and subagent-announce.ts.

Canonical defaults (single source of truth):
- defaultDelayMs: 15000, minDelayMs: 5000, maxDelayMs: 300000
- maxChainLength: 10, costCapTokens: 500000
- maxDelegatesPerTurn: 5, generationGuardTolerance: 0

Also exports a resolveMaxDelegatesPerTurn() convenience for the tool execution and agent-runner consumption paths. continuation-generation.ts now imports tolerance from this module instead of duplicating loadConfig() + validation logic.

Co-authored-by: Cael <cael@karmaterminal.org>
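
One way such a normalization authority could look, as a sketch: raw values are validated and fall back to the canonical defaults listed in this commit. The names (`resolveRuntimeConfig`, `RawContinuationConfig`), the non-negative-integer validation, and the clamping of `defaultDelayMs` into `[minDelayMs, maxDelayMs]` are assumptions, not the module's confirmed behavior.

```typescript
// Hypothetical raw shape; field names follow the commit, types are assumed.
interface RawContinuationConfig {
  defaultDelayMs?: number;
  minDelayMs?: number;
  maxDelayMs?: number;
  maxChainLength?: number;
  costCapTokens?: number;
  maxDelegatesPerTurn?: number;
  generationGuardTolerance?: number;
}

type ContinuationRuntimeConfig = Required<RawContinuationConfig>;

// Canonical defaults as listed in the commit message.
const CONTINUATION_DEFAULTS: ContinuationRuntimeConfig = {
  defaultDelayMs: 15_000,
  minDelayMs: 5_000,
  maxDelayMs: 300_000,
  maxChainLength: 10,
  costCapTokens: 500_000,
  maxDelegatesPerTurn: 5,
  generationGuardTolerance: 0,
};

// Assumption: invalid values (negative, fractional, missing) fall back to the default.
function pickNonNegativeInt(value: number | undefined, fallback: number): number {
  return value !== undefined && Number.isInteger(value) && value >= 0 ? value : fallback;
}

function resolveRuntimeConfig(raw: RawContinuationConfig | undefined): ContinuationRuntimeConfig {
  const r: RawContinuationConfig = raw ?? {};
  const d = CONTINUATION_DEFAULTS;
  const minDelayMs = pickNonNegativeInt(r.minDelayMs, d.minDelayMs);
  // Assumption: never let the ceiling drop below the floor.
  const maxDelayMs = Math.max(minDelayMs, pickNonNegativeInt(r.maxDelayMs, d.maxDelayMs));
  const requestedDelay = pickNonNegativeInt(r.defaultDelayMs, d.defaultDelayMs);
  return {
    defaultDelayMs: Math.min(maxDelayMs, Math.max(minDelayMs, requestedDelay)),
    minDelayMs,
    maxDelayMs,
    maxChainLength: pickNonNegativeInt(r.maxChainLength, d.maxChainLength),
    costCapTokens: pickNonNegativeInt(r.costCapTokens, d.costCapTokens),
    maxDelegatesPerTurn: pickNonNegativeInt(r.maxDelegatesPerTurn, d.maxDelegatesPerTurn),
    generationGuardTolerance: pickNonNegativeInt(r.generationGuardTolerance, d.generationGuardTolerance),
  };
}
```

Every consumer calling one resolver with freshly loaded config is what removes the scattered `??` fallbacks.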

…lization authority

- 6 tests for maxChainLength enforcement via task prefix [continuation:chain-hop:N]
- Boundary enforcement (hop 9 → 10 >= 10 → block)
- Cost cap blocking when chain length is within bounds
- Custom maxChainLength from config
- First bracket-started hop (no prefix → hop 0)
- FINDINGS.md: P0 status corrected (committed at 5118e7a, not 'locally applied')
- FINDINGS.md: P1/P2 status updated (prince branches in progress)
- FINDINGS.md: WORK timer tolerance asymmetry documented
- FINDINGS.md: path-dependency gap (bracket-started chains) documented (Codex 5.4 finding #1)

Test: OPENCLAW_TEST_FAST=1 npx vitest run src/agents/subagent-announce.chain-guard.test.ts
Result: 6/6 pass
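
The boundary rule exercised by these tests can be expressed as a small guard. A hedged sketch: the prefix format comes from the commit text, while the function names, the ordering of the two checks, and the `>=` comparison for the cost cap are assumptions.

```typescript
// Hypothetical guard; prefix format from the commit, the rest assumed.
const CHAIN_HOP_PREFIX = /^\[continuation:chain-hop:(\d+)\]/;

/** Bracket-started chains carry no prefix, so they count as hop 0. */
function parseChainHop(task: string): number {
  const match = CHAIN_HOP_PREFIX.exec(task);
  return match ? Number(match[1]) : 0;
}

interface ChainGuardInput {
  task: string;
  maxChainLength: number;
  tokensSpent: number;
  costCapTokens: number;
}

type ChainGuardResult =
  | { allowed: true; nextHop: number }
  | { allowed: false; reason: "max-chain-length" | "cost-cap" };

function checkChainGuard(input: ChainGuardInput): ChainGuardResult {
  const nextHop = parseChainHop(input.task) + 1;
  // hop 9 -> next 10; 10 >= maxChainLength (10) -> block
  if (nextHop >= input.maxChainLength) return { allowed: false, reason: "max-chain-length" };
  // Cost cap is only consulted once the chain length is within bounds.
  if (input.tokensSpent >= input.costCapTokens) return { allowed: false, reason: "cost-cap" };
  return { allowed: true, nextHop };
}
```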

…DINGS.md update

…n time

Workstream D secondary layer: the agent-runner consumption path trims queued delegates exceeding the live resolveMaxDelegatesPerTurn() value before dispatch. Rejected delegates are logged and surfaced as system events for agent visibility. Uses continuation-runtime.ts as the single normalization authority.

Signed-off-by: Silas 🌫️
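
A sketch of consumption-time trimming, assuming the queue is an ordered array and that the earliest-queued delegates are the ones kept. `trimQueuedDelegates`, `dispatchTurnDelegates`, and the `emitSystemEvent` callback are hypothetical stand-ins for the agent-runner path, and the log and event text is illustrative rather than the runtime's actual wording.

```typescript
interface QueuedDelegate {
  task: string;
}

interface TrimResult {
  max: number;
  dispatched: QueuedDelegate[];
  rejected: QueuedDelegate[];
}

/** Keep the first `max` queued delegates; everything after is rejected. */
function trimQueuedDelegates(
  queue: readonly QueuedDelegate[],
  resolveMax: () => number, // read live at consumption time, not at queue time
): TrimResult {
  const max = Math.max(0, Math.floor(resolveMax()));
  return { max, dispatched: queue.slice(0, max), rejected: queue.slice(max) };
}

function dispatchTurnDelegates(
  queue: readonly QueuedDelegate[],
  resolveMax: () => number,
  emitSystemEvent: (text: string) => void,
): QueuedDelegate[] {
  const { max, dispatched, rejected } = trimQueuedDelegates(queue, resolveMax);
  for (const delegate of rejected) {
    // Log for operators, and tell the agent the task was dropped.
    console.warn(`delegate rejected (maxDelegatesPerTurn=${max}): ${delegate.task}`);
    emitSystemEvent(`Delegate not dispatched: limit of ${max} per turn reached (task: ${delegate.task})`);
  }
  return dispatched;
}
```

Reading the limit at dispatch rather than when delegates are queued is what lets a narrowed limit take effect for a turn already in progress.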

…at consumption time

Bracket-origin delegate spawns now emit a [continuation:chain-hop:N] prefix matching the tool-origin path. This ensures the announce-side guard at subagent-announce.ts:1346 can enforce maxChainLength identically for bracket-origin and tool-origin chains.

Before: [continuation] Delegated task (turn N/M): ...
After: [continuation:chain-hop:N] Delegated task (turn N/M): ...

Addresses WORKORDER6-codex54.md Workstream B and Codex 5.4 finding #1.

Test: failing test written first (confirmed the [continuation] prefix lacked chain-hop:N), then fix applied. 41/41 agent-runner tests pass.
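
For illustration, a formatter that produces the "After" shape and a parser that recovers the hop from it. Both names are hypothetical. The sketch keeps the hop number separate from the turn counter, because the commit text does not say whether they always match.

```typescript
// Hypothetical helpers; the prefix shape follows the "After" example.
const CHAIN_HOP_PATTERN = /^\[continuation:chain-hop:(\d+)\] /;

/** Build bracket-origin task text with the same prefix the tool-origin path uses. */
function formatDelegatedTask(hop: number, turn: number, maxTurns: number, task: string): string {
  return `[continuation:chain-hop:${hop}] Delegated task (turn ${turn}/${maxTurns}): ${task}`;
}

/** Recover the hop so the announce-side guard sees both origins identically. */
function readHop(taskText: string): number | undefined {
  const match = CHAIN_HOP_PATTERN.exec(taskText);
  return match ? Number(match[1]) : undefined;
}
```

The old `[continuation]` prefix yields `undefined` here, which is exactly the gap the commit closes.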

… metadata (Workstream B)

… + prince runbooks)

Merges validated canary build b07e7e4 into feature/context-pressure-squashed. Swim 7 evidence: 10/10 pass, 2 deferred, 0 regressions.

Conflict resolution: take Codex round 2 code for all 5 source conflicts.
- continuation-runtime.ts: 3-tier clamping + optional cfg param
- subagent-announce.ts: unified tolerance for chain-hop timers
- agent-runner.ts: inlined generation tracking + textless-turn fix
- continue-delegate-tool.ts: resolveMaxDelegatesPerTurn canonical resolver
- FINDINGS.md: deleted (fork-only artifact)

Known: 10 test TS errors (mock signatures, Zod schema gaps); fix in squash phase.

Swim 7: 10 pass, 2 deferred across tolerance hot-reload, width widen/narrow, chain boundary, textless-turn delegate, silent return trust boundary, and blind enrichment accuracy. Operator Profiles: shipped defaults (conservative) vs fleet multi-agent profile. Sensor fan-out pattern documented. All config values confirmed hot-reloadable via Swim 7 live tests. Cumulative canary validation: 23 pass, 3 deferred, 0 fail across Swim 5-7 on three successive builds.

- Appendix: Integration Test Evidence section with Swim 7 scorecard
- Evidence locations table (silas-likes-to-watch PR #27, gateway logs, canary tag)
- Methodology note: 4-agent test topology, blind enrichment findings
- Key finding: silent enrichment indistinguishable from training knowledge
- Scrub verified: zero prince names/hostnames/IPs in main RFC body
- Changelog section (marked for removal) contains internal refs as expected

- Gateway log excerpts for each Swim 7 test (tolerance, width, chain, enrichment)
- Each excerpt shows the guard/lifecycle event firing with timestamps
- Matches evidence in silas-likes-to-watch PR #27 raw logs

…e appendix)

Silas: Swim 7 canary results, Operator Configuration Profiles, fan-out pattern docs
Ronan: Integration Test Evidence appendix with log excerpts and evidence locations

- agent-runner.misc.runreplyagent.test.ts: add missing continuation config fields (costCapTokens, maxDelegatesPerTurn, generationGuardTolerance) to mock type
- subagent-announce.continuation.test.ts: fix importOriginal call (0 args, not 1), type mock functions with optional key param, widen return type for resolveRequesterForChildSessionMock
- subagent-announce.format.e2e.test.ts: add continuationTrigger to call params type

- Add Operator Observability section with log prefix reference table
- Log demotion strategy: timer set/check → debug, fire/cancel → info
- Hot-reload observability: config changes emit structured log lines
- Add generation guard tolerance + width control lifecycle traces
- All content scrubbed per upstream RFC guidelines (no names/hosts/IPs)
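
A sketch of the demotion rule as a lookup table. The event names, the `[continuation]` line prefix, and the `key=value` field format are hypothetical placeholders, not the RFC's reference table; treating hot-reload lines as `info` is likewise an assumption.

```typescript
type LogLevel = "debug" | "info";

// Hypothetical event names for illustration.
type ContinuationLogEvent =
  | "timer.set"
  | "timer.check"
  | "timer.fire"
  | "timer.cancel"
  | "config.reload";

// Timer set/check are high-volume and demoted to debug; fire/cancel stay at info.
const EVENT_LEVEL: Record<ContinuationLogEvent, LogLevel> = {
  "timer.set": "debug",
  "timer.check": "debug",
  "timer.fire": "info",
  "timer.cancel": "info",
  "config.reload": "info",
};

interface StructuredLogLine {
  level: LogLevel;
  line: string;
}

/** Render one structured line: prefix, event name, then key=value fields in insertion order. */
function formatContinuationLog(
  event: ContinuationLogEvent,
  fields: Record<string, string | number>,
): StructuredLogLine {
  const kv = Object.entries(fields)
    .map(([key, value]) => `${key}=${value}`)
    .join(" ");
  const line = kv ? `[continuation] ${event} ${kv}` : `[continuation] ${event}`;
  return { level: EVENT_LEVEL[event], line };
}
```

Keeping the level in a typed table means adding an event forces an explicit level choice at compile time.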

…nto feature/context-pressure-squashed

* feat: add QQ Bot channel extension
* fix(qqbot): add setupWizard to runtime plugin for onboard re-entry
* fix: fix review
* fix: fix review
* chore: sync lockfile and config-docs baseline for qqbot extension
* refactor: remove image-hosting server code
* fix
* docs: add QQ Bot plugin documentation and fix link paths
* refactor: remove credential backup functionality and update setup logic
  - Deleted the credential backup module to streamline the codebase.
  - Updated the setup surface to handle client secrets more robustly, allowing for configured secret inputs.
  - Simplified slash commands by removing unused hot upgrade compatibility checks and related functions.
  - Adjusted types to use SecretInput for client secrets in QQBot configuration.
  - Modified bundled plugin metadata to allow additional properties in the config schema.
* feat: add local media path resolution; fix QQBot media path handling
* feat: add local media path resolution; fix QQBot media path handling
* feat: remove qqbot-media and qqbot-remind skills, add tests for config and setup
  - Deleted the qqbot-media and qqbot-remind skills documentation files.
  - Added unit tests for qqbot configuration and setup processes, ensuring proper handling of SecretRef-backed credentials and account configurations.
  - Implemented tests for local media path remapping, verifying correct resolution of media file paths.
  - Removed obsolete channel and remind tools, streamlining the codebase.
* feat: update QQBot config schema; add audio format and account definitions
* feat: add QQBot channel management and scheduled reminder skills; update media path resolution
* fix
* feat: add /bot-upgrade command for viewing QQBot plugin upgrade instructions
* feat: update reminder and qq channel skills
* feat: update the delivery target address format for the remind tool
* feat: Refactor QQBot payload handling and improve code documentation
  - Simplified and clarified the structure of payload interfaces for Cron reminders and media messages.
  - Enhanced the parsing function to provide clearer error messages and improved validation.
  - Updated platform utility functions for better cross-platform compatibility and clearer documentation.
  - Improved text parsing utilities for better readability and consistency in emoji representation.
  - Optimized upload cache management with clearer comments and reduced redundancy.
  - Integrated QQBot plugin into the bundled channel plugins and updated metadata for installation.
* OK apps/macos/Sources/OpenClaw/HostEnvSecurityPolicy.generated.swift
  > openclaw@2026.3.26 check:bundled-channel-config-metadata /Users/yuehuali/code/PR/openclaw
  > node --import tsx scripts/generate-bundled-channel-config-metadata.ts --check
  [bundled-channel-config-metadata] stale generated output at src/config/bundled-channel-config-metadata.generated.ts
  ELIFECYCLE Command failed with exit code 1.
  ELIFECYCLE Command failed with exit code 1.
* feat: add QQBot channel config and related account settings
* fix(qqbot): resolve 14 high-priority bugs from PR openclaw#52986 review
  DM routing (7 fixes):
  - #1: DM slash-command replies use sendDmMessage(guildId) instead of sendC2CMessage(senderId)
  - #2: DM qualifiedTarget uses qqbot:dm:${guildId} instead of qqbot:c2c:${senderId}
  - #3: sendTextChunks adds DM branch
  - #4: sendMarkdownReply adds DM branch for text and Base64 images
  - #5: parseAndSendMediaTags maps DM to targetType:dm + guildId
  - #6: sendTextToTarget DM branch uses sendDmMessage; MessageTarget adds guildId field
  - #7: handleImage/Audio/Video/FilePayload add DM branches
  Other high-priority fixes:
  - #8: Fix sendC2CVoiceMessage/sendGroupVoiceMessage parameter misalignment
  - #9: broadcastMessage uses groupOpenid instead of member_openid for group users
  - #10: Unify KnownUser storage: proactive.ts delegates to known-users.ts
  - #11: Remove invalid recordKnownUser calls for guild/DM users
  - #12: sendGroupMessage uses sendAndNotify to trigger onMessageSent hook
  - #13: sendPhoto returns an error field when the channel is unsupported
  - #14: sendTextAfterMedia adds channel and dm branches
  Type fixes:
  - DeliverEventContext adds guildId field
  - MediaTargetContext.targetType adds dm variant
  - sendPlainTextReply imgMediaTarget adds DM branch
* fix(qqbot): resolve 2 blockers + 7 medium-priority bugs from PR openclaw#52986 review
  Blocker-1: Remove unused dmPolicy config knob
  - dmPolicy was declared in schema/types/plugin.json but never consumed at runtime
  - Removed from config-schema.ts, types.ts, and openclaw.plugin.json
  - allowFrom remains active (already wired into framework command-auth)
  Blocker-2: Gate sensitive slash commands with allowFrom authorization
  - SlashCommand interface adds requireAuth?: boolean
  - SlashCommandContext adds commandAuthorized: boolean
  - /bot-logs set to requireAuth: true (reads local log files)
  - matchSlashCommand rejects unauthorized senders for requireAuth commands
  - trySlashCommandOrEnqueue computes commandAuthorized from allowFrom config
  Medium-priority fixes:
  - #15: Strip non-HTTP/non-local markdown image tags to prevent path leakage
  - #16: applyQQBotAccountConfig clears clientSecret when setting clientSecretFile and vice versa
  - #17: getAdminMarkerFile sanitizes accountId to prevent path traversal
  - #18: URGENT_COMMANDS uses exact match instead of startsWith prefix match
  - #19: isCronExpression validates that each token starts with a cron-valid character
  - #20: --token format validation rejects malformed input without a colon separator
  - #21: resolveDefaultQQBotAccountId checks the QQBOT_APP_ID environment variable
* test(qqbot): add focused tests for slash command authorization path
  - Unauthorized sender rejected for /bot-logs (requireAuth: true)
  - Authorized sender allowed for /bot-logs
  - Non-requireAuth commands (/bot-ping, /bot-help, /bot-version) work for all senders
  - Unknown slash commands return null (passthrough)
  - Non-slash messages return null
  - Usage query (/bot-logs ?) also gated by auth check
* fix(qqbot): align global TTS fallback with framework config resolution
  - Extract isGlobalTTSAvailable to utils/audio-convert.ts, mirroring core resolveTtsConfig logic: check auto !== 'off', fall back to legacy enabled boolean, default to off when neither is set.
  - Add pre-check in reply-dispatcher before calling globalTextToSpeech to avoid unnecessary TTS calls and noisy error logs when TTS is not configured.
  - Remove inline as any casts; use OpenClawConfig type throughout.
  - Refactor handleAudioPayload into flat early-return structure with unified send path (plugin TTS → global fallback → send).
* fix(qqbot): break ESM circular dependency causing multi-account startup crash
  The bundled gateway chunk had a circular static import on the channel chunk (gateway -> outbound-deliver -> channel, while channel dynamically imports gateway). When two accounts start concurrently via Promise.all, the first dynamic import triggers module graph evaluation; the circular reference causes api exports (including runDiagnostics) to resolve as undefined before the module finishes evaluating.
  Fix: extract chunkText and TEXT_CHUNK_LIMIT from channel.ts into a new text-utils.ts leaf module. outbound-deliver.ts now imports from text-utils.ts, breaking the cycle. channel.ts re-exports for backward compatibility.
* fix(qqbot): serialize gateway module import to prevent multi-account startup race
  When multiple accounts start concurrently via Promise.all, each calls await import('./gateway.js') independently. Due to ESM circular dependencies in the bundled output, the first import can resolve transitive exports as undefined before module evaluation completes.
  Fix: cache the dynamic import promise in a module-level variable so all concurrent startAccount calls share the same import, ensuring the gateway module is fully evaluated before any account uses it.
* refactor(qqbot): remove startup greeting logic
  Remove getStartupGreetingPlan and related startup greeting delivery:
  - Delete startup-greeting.ts (greeting plan, marker persistence)
  - Delete admin-resolver.ts (admin resolution, greeting dispatch)
  - Remove startup greeting calls from gateway READY/RESUMED handlers
  - Remove isFirstReadyGlobal flag and adminCtx
* fix(qqbot): skip octal escape decoding for Windows local paths
  Windows paths like C:\Users\1\file.txt contain backslash-digit sequences that were incorrectly matched as octal escape sequences and decoded, corrupting the file path. Detect Windows local paths (drive letter or UNC prefix) and skip the octal decoding step for them.
* fix bot issue
* feat: support TTS auto on/off and clean up clientSecretFile in config
* docs: add design notes on QQBot configuration and message handling
* rebase
* fix(qqbot): align slash-command auth with shared command-auth model
  Route requireAuth:true slash commands (e.g. /bot-logs) through the framework's api.registerCommand() so resolveCommandAuthorization() applies commands.allowFrom.qqbot precedence and qqbot: prefix normalization before any handler runs.
  - slash-commands.ts: registerCommand() now auto-routes by requireAuth into two maps (commands / frameworkCommands); getFrameworkCommands() exports the auth-required set for framework registration; bot-help lists both maps
  - index.ts: registerFull() iterates getFrameworkCommands() and calls api.registerCommand() for each; handler derives msgType from ctx.from, sends file attachments via sendDocument, supports multi-account via ctx.accountId
  - gateway.ts (inbound): replace raw allowFrom string comparison with qqbotPlugin.config.formatAllowFrom() to strip qqbot: prefix and uppercase before matching event.senderId
  - gateway.ts (pre-dispatch): remove stale auth computation; commandAuthorized is true (requireAuth:true commands never reach matchSlashCommand)
  - command-auth.test.ts: add regression tests for qqbot: prefix normalization in the inbound commandAuthorized computation
  - slash-commands.test.ts: update /bot-logs tests to expect null (command routed to framework, not in local registry)
* rebase and resolve conflict
* fix(qqbot): preserve mixed env setup credentials

---------

Co-authored-by: yuehuali <yuehuali@tencent.com>
Co-authored-by: walli <walli@tencent.com>
Co-authored-by: WideLee <limkuan24@gmail.com>
Co-authored-by: Frank Yang <frank.ekn@gmail.com>
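
The gateway import-serialization fix described in this squash follows a common pattern: cache the in-flight promise so concurrent callers share one load. A generic sketch with a hypothetical `memoizeAsync` helper; unlike a bare cached variable, it clears the cache on failure so a later call can retry, which is a design choice this sketch adds and the commit does not state. In the plugin it would wrap something like `() => import("./gateway.js")`.

```typescript
/**
 * Share one in-flight async load across concurrent callers.
 * The first call starts load(); later calls reuse the same promise.
 */
function memoizeAsync<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load().catch((err: unknown) => {
        pending = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return pending;
  };
}
```

Because every `startAccount` call awaits the same promise, none of them can observe a partially evaluated module graph from a second, racing import.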

elliott-dandelion-cult · 2026-04-19T01:14:20Z

Closing — observability work superseded by #172 (sentinel fix) + #173 (noop breadcrumbs + diagnostic cleanup). 🌻

cael-dandelion-cult and others added 25 commits March 6, 2026 15:55

docs: add CODEWALK.md — line-level analysis for P0-P3 fix domains

d7556ff

docs(RFC): document WORK-vs-DELEGATE tolerance asymmetry + P2 hot-reload

294fdce

fix(P2): reconcile maxDelegatesPerTurn default — comment said 10, run…

9092131

…time used 5

Merge ronan/p1p2-clean: P1 tolerance closure fix — generation module …

cc2135f

…extraction

Merge silas/p1p2-fixes: P2 maxDelegatesPerTurn hot-reload + RFC docs

20929ea

fix: tsc type annotations in P2 hot-reload test

89d52c3

docs: add WORKORDER6-codex54.md from figs's Codex session

d3e93d5

Merge ronan/p1p2-clean: continuation-runtime.ts — single config norma…

30d4d00

…lization authority

Merge elliott/p1p2-walkthrough: announce-side chain guard tests + FIN…

82e203a

…DINGS.md update

Merge silas/p1p2-fixes: secondary enforcement of maxDelegatesPerTurn …

0aa2c94

…at consumption time

Merge elliott/p1p2-walkthrough: canonicalize bracket-origin chain-hop…

1598c7c

… metadata (Workstream B)

docs(RFC): add key evidence lines to integration test appendix

00ed132

Merge RFC polish: Silas (Swim 7 + Operator Profiles) + Ronan (evidenc…

3ab9f54

…e appendix)

Merge remote-tracking branch 'origin/elliott/rfc-polish-monitoring' i…

2f16272

…nto feature/context-pressure-squashed

elliott-dandelion-cult mentioned this pull request Apr 2, 2026

🧹 #27 Audit: 3 open PRs + 2 bugs in openclaw fork need triage #60 (Open)

elliott-dandelion-cult closed this Apr 19, 2026

elliott-dandelion-cult deleted the elliott/rfc-polish-monitoring branch April 23, 2026 19:14

cael-dandelion-cult mentioned this pull request May 1, 2026

gateway: assistant messages with trailing thinking block latch auth-profile cooldown via Anthropic 400 (P1, all 4 princes) #501 (Open)

This was referenced May 1, 2026:

meta-CI: long-lived integration branches (canonical2) get zero push-CI; collisions ship invisibly green-by-default #503 (Open)

hooks/session-memory writes timestamped variant files bypassing pre-compaction canonical-only restriction #504 (Open)

karmafeast wants to merge 25 commits into flesh-beast-figs/for_thornfield_consider20260306 from elliott/rfc-polish-monitoring

3 participants
