feat: merge in recent non-megabranch changes #110

Merged
dgarson merged 30 commits into dgarson/fork from merge-recent-changes on Feb 23, 2026

Conversation


@dgarson dgarson commented Feb 23, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Security Impact (required)

  • New permissions/capabilities? (Yes/No)
  • Secrets/tokens handling changed? (Yes/No)
  • New/changed network calls? (Yes/No)
  • Command/tool execution surface changed? (Yes/No)
  • Data access scope changed? (Yes/No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS:
  • Runtime/container:
  • Model/provider:
  • Integration/channel (if any):
  • Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
  • Edge cases checked:
  • What you did not verify:

Compatibility / Migration

  • Backward compatible? (Yes/No)
  • Config/env changes? (Yes/No)
  • Migration needed? (Yes/No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
  • Files/config to restore:
  • Known bad symptoms reviewers should watch for:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk:
    • Mitigation:

dgarson and others added 30 commits February 21, 2026 12:43
…malization, prompts

- Fix maxPerCall config bug: broker was reading maxConcurrency for both
  limits; add dedicated maxPerCall field to VoiceCallSubagentConfigSchema
- Replace O(n) pruning on every state transition with lazy threshold-based
  pruning (amortized via PRUNE_THRESHOLD=16)
- Add BrokerMetrics type and metrics getter for observability: enqueued,
  completed, failed, expired, canceled, fallbacksSpoken, repairAttempts,
  repairSuccesses, totalExecutionMs
- Improve JSON normalization pipeline: strip markdown fences, trailing
  commas, single-line comments; case-insensitive key matching; expanded
  alias sets; export extractFirstJsonObject, parseBoolean, parseNumber
- Lighten repairPayload: thinkLevel=off, 4s timeout, truncate input to
  1000 chars, pass context through to avoid redundant loadCoreAgentDeps
- Clean up orphaned session store entries and temp files after job
  completion (finally block with fs.unlinkSync)
- Move foreground envelope prompt to end of system prompt with structured
  RESPONSE FORMAT heading; wrap conversation history in XML tags
- Add specialist-specific prompts (SPECIALIST_PROMPTS map) for research,
  scheduler, and policy roles
- Expand test coverage: 55 tests across broker and normalization suites
  covering pruning, metrics, deadline clamping, JSON extraction edge
  cases, alias resolution, type coercion, and specialist prompts

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
…onflict

- Wrap speakFallbackIfActive in try/catch inside the runJob catch block
  to prevent unhandled promise rejections when onSummaryReady is broken
- Check remaining time before applying Math.max(1000ms) clamp so jobs
  that have already expired hit the expiry branch instead of launching
  a 1s LLM call past their deadline
- Replace "use tools when helpful" with delegation guidance in the
  foreground voice prompt to avoid contradicting the JSON-only envelope
  instruction — tools belong in the specialist background lane

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
Replace naive regex-based stripTrailingCommas and stripJsonComments with
character-walking implementations that track whether we're inside a JSON
string literal, so commas and // sequences inside string values are never
corrupted.
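The character-walking approach can be sketched like this for the trailing-comma case (a minimal illustration, assuming the function shape; the real implementation also handles // comments with the same string-tracking state):

```typescript
// Walk the input character by character, tracking whether we are inside a
// JSON string literal, so commas inside string values are never touched.
function stripTrailingCommas(input: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (inString) {
      out += ch;
      if (ch === "\\") {
        // Copy the escaped character verbatim so \" does not end the string.
        i++;
        if (i < input.length) out += input[i];
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
      continue;
    }
    if (ch === ",") {
      // Look ahead past whitespace: a comma directly before } or ] is trailing.
      let j = i + 1;
      while (j < input.length && /\s/.test(input[j])) j++;
      if (input[j] === "}" || input[j] === "]") continue; // drop the comma
    }
    out += ch;
  }
  return out;
}
```

A regex cannot make the in-string distinction reliably, which is exactly the corruption the commit fixes.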

Move session store cleanup into the finally block of runJob so entries are
removed on both success and failure paths.  Add reapOrphanedSessions()
method that sweeps stale voice-subagent:* keys older than a configurable
threshold, called at webhook server startup to catch entries orphaned by
crashes.

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
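The startup sweep can be sketched as follows; the store shape and entry fields here are assumptions for illustration, with only the `voice-subagent:` key prefix taken from the commit text:

```typescript
// Hypothetical sketch of reapOrphanedSessions: delete prefixed entries older
// than a configurable threshold, returning how many were reaped.
interface StoreEntry {
  updatedAtMs: number;
}

function reapOrphanedSessions(
  store: Map<string, StoreEntry>,
  maxAgeMs: number,
  now: number = Date.now(),
): number {
  let reaped = 0;
  for (const [key, entry] of store) {
    if (!key.startsWith("voice-subagent:")) continue; // only sweep our keys
    if (now - entry.updatedAtMs > maxAgeMs) {
      store.delete(key);
      reaped++;
    }
  }
  return reaped;
}
```

Running this once at webhook server startup catches entries that the per-job finally block could not clean up because the process crashed mid-job.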
fix: async agent delegation — config bug, pruning, observability, nor…
Detailed review of codex/implement-async-sub-agent-communication-for-tts.
Covers two must-fix bugs (canceled metric inaccuracy, unconditional envelope
prompt injection), correctness concerns (race conditions, dead follow-up code),
and voice-call pattern improvements (backpressure, deduplication, fallback
double-speak).

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Bugs fixed:
- cancelByCall now returns the actual count of jobs canceled (was void);
  cancelCallJobs and runJob accumulate the real count instead of always
  adding 1, preventing metric inflation and double-counting
- envelopePrompt JSON format instruction is now gated on
  subagents?.enabled; disabled deployments no longer receive a changed
  system prompt, and rawText is returned directly without envelope
  parsing when subagents are off
- summaryDelivered flag prevents catch block from delivering fallback
  speech when onSummaryReady already succeeded, eliminating double-speak
- FALLBACK_SPOKEN_SUMMARY result now sets needs_followup: false (was
  true), which would have triggered spurious follow-up logic once wired
- callForEvent guard: cancelCallJobs is now skipped when callForEvent
  is null (call was never known to the manager) rather than falling back
  to event.callId, preventing phantom cancels

Pattern improvements:
- enqueue() enforces per-call queue depth before admitting a new job,
  providing backpressure when a call already has maxPerCall active jobs
- repairPayload is skipped for short/bare outputs (< 20 chars or no
  JSON-like characters), avoiding a wasted LLM call on "done" / "ok"
- onSummaryReady speaks the follow-up question when the specialist
  signals needs_followup; speak() errors are swallowed inside the
  callback so they do not propagate into the broker's catch/fallback path
- Delegations from the foreground agent are deduplicated by
  specialist+goal before enqueuing to prevent redundant subagent jobs
- Add JSDoc comment documenting the concurrent saveSessionStore race
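The per-call backpressure check above can be sketched as a small depth counter (names and structure are illustrative, not the broker's actual internals):

```typescript
// Hypothetical sketch: track active job depth per call and refuse new work
// once a call already has maxPerCall jobs in flight.
class CallQueue {
  private depth = new Map<string, number>();

  constructor(private maxPerCall: number) {}

  enqueue(callId: string): boolean {
    const current = this.depth.get(callId) ?? 0;
    if (current >= this.maxPerCall) return false; // drop: call is saturated
    this.depth.set(callId, current + 1);
    return true;
  }

  complete(callId: string): void {
    const current = this.depth.get(callId) ?? 0;
    if (current > 0) this.depth.set(callId, current - 1);
  }
}
```

Refusing at admission time keeps a chatty foreground agent from piling unbounded specialist jobs onto a single call.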

Tests:
- cancelByCall mock signatures updated to return number
- "cancelCallJobs increments canceled metric" test replaced with
  accurate count-based assertions and a zero-count case
- New backpressure test: enqueue drops when per-call depth >= maxPerCall

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The `execute` callback in the `registerTool` call had two `noImplicitAny`
errors (toolCallId and params) because the function literal's parameter
types weren't being inferred from context.

Fix: annotate `_toolCallId: string` and `params: Record<string, unknown>`.
Using `Record<string, unknown>` rather than `Static<typeof VoiceCallToolSchema>`
is correct here because all property accesses in the execute body are already
guarded by `typeof` runtime checks, and the discriminated union's last member
lacks `action` which would cause cascade type errors with the narrowed type.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Lock file was out of sync with apps/web-next/package.json after new
frontend dependencies were added (react, vite, tailwind, etc.).

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
… Slack ID normalization

Reverts regressions introduced in commit e95e175 ("Luis/UI redesign feb 21"):

1. workspace.ts: Remove `|| !file.missing` from filterBootstrapFilesForSession —
   this clause defeated the MINIMAL_BOOTSTRAP_ALLOWLIST by allowing any non-missing
   extra bootstrap file to pass through for subagent/cron sessions. Only AGENTS.md
   and TOOLS.md should be included, matching the original strict allowlist filter.

2. targets.ts: Fix normalizeSlackId regex — change `{8,}` minimum-length guard to
   `[A-Z0-9]*[0-9][A-Z0-9]*` so short Slack IDs like C999, C1, C1A2B3 are
   uppercased correctly (Slack IDs need at least one digit, not 8+ chars).

3. targets.ts: Fix # handler pattern — change `/^[A-Z0-9]+$/i` to `/^[A-Z0-9_-]+$/i`
   so #channel-names with hyphens (e.g. #eng-frontend, #does-not-exist) are parsed
   as name lookups rather than rejected with an ID-requirement error.

Fixes 4 test failures in src/slack/targets.test.ts and src/slack/send.blocks.test.ts,
and 1 failure in src/hooks/bundled/bootstrap-extra-files/handler.test.ts.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
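The normalizeSlackId fix in item 2 can be sketched as follows; only the regex is taken from the commit text, while the surrounding function shape is an assumption:

```typescript
// A Slack-style ID is alphanumeric with at least one digit (e.g. C999, C1A2B3).
// Anything without a digit (e.g. "general") is left untouched rather than
// being wrongly uppercased.
const SLACK_ID_RE = /^[A-Z0-9]*[0-9][A-Z0-9]*$/i;

function normalizeSlackId(raw: string): string {
  return SLACK_ID_RE.test(raw) ? raw.toUpperCase() : raw;
}
```

The `[A-Z0-9]*[0-9][A-Z0-9]*` shape encodes "at least one digit" without imposing the old 8-character minimum that rejected short IDs like C1.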
The resolveSlackMedia describe block's beforeEach mocked
ssrf.resolvePinnedHostname, but fetch-guard.ts calls
resolvePinnedHostnameWithPolicy directly (not via resolvePinnedHostname).
This left the SSRF guard unmocked, causing real DNS resolution to fail
in the test environment and all download-dependent tests to silently
return null.

Fix: spy on resolvePinnedHostnameWithPolicy, matching the pattern already
used correctly in the resolveSlackAttachmentContent describe block added
in the same commit (54eaca8).

Fixes 6 test failures in src/slack/monitor/media.test.ts.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The harness mocked ssrf.resolvePinnedHostname, but fetch-guard.ts calls
resolvePinnedHostnameWithPolicy directly. This left the SSRF guard unmocked,
causing real DNS lookups for example.com which fail with EAI_AGAIN in the
test environment — all 7 tests in cf-markdown and response-limit test files
silently errored out.

Fix: spy on resolvePinnedHostnameWithPolicy and inject the lookupMock while
preserving the rest of the params (policy, etc.) so the existing SSRF logic
still runs cleanly with a controlled DNS resolver.

Same root cause as the resolveSlackMedia SSRF mock bug fixed in the previous
commit: resolvePinnedHostname delegates to resolvePinnedHostnameWithPolicy,
so mocking only the wrapper never intercepts the real call site.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
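The root cause generalizes beyond these tests: a wrapper that delegates to an inner function cannot be mocked to intercept callers of the inner function. A simplified self-contained illustration (module wiring is collapsed into one object; the real code uses vitest spies):

```typescript
// The wrapper delegates to the inner function; fetch-guard calls the inner
// function directly, so replacing only the wrapper changes nothing for it.
const ssrf = {
  resolvePinnedHostnameWithPolicy(host: string): string {
    return `dns:${host}`; // stands in for a real DNS lookup
  },
  resolvePinnedHostname(host: string): string {
    return this.resolvePinnedHostnameWithPolicy(host); // wrapper delegates
  },
};

function fetchGuard(host: string): string {
  // Mirrors fetch-guard.ts: calls the inner function, not the wrapper.
  return ssrf.resolvePinnedHostnameWithPolicy(host);
}

// Mocking only the wrapper leaves fetchGuard hitting "real DNS":
ssrf.resolvePinnedHostname = () => "mocked";
const stillReal = fetchGuard("example.com"); // "dns:example.com"

// Replacing the inner function is what actually intercepts the call site:
ssrf.resolvePinnedHostnameWithPolicy = () => "mocked";
const nowMocked = fetchGuard("example.com"); // "mocked"
```

This is why all four commits converge on spying on resolvePinnedHostnameWithPolicy rather than the wrapper.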
All tool execution failure log lines now include [sessionKey] prefix
for easier per-session filtering in logs.

- pi-tool-definition-adapter: thread sessionKey through splitSdkTools
  -> toToolDefinitions; prefix [tools] <name> failed, stack trace,
  and after_tool_call hook failure logs
- tool-split: splitSdkTools now accepts and forwards sessionKey
- attempt.ts / compact.ts: pass params.sessionKey to splitSdkTools
- pi-embedded-subscribe.handlers: prefix tool_execution_start/end
  handler failed logs with ctx.params.sessionKey
- pi-embedded-subscribe.handlers.types: add sessionKey to
  ToolHandlerParams pick so it is available in ToolHandlerContext
- pi-embedded-subscribe.handlers.tools: prefix after_tool_call hook
  failed with ctx.params.sessionKey
- pi-tools.before-tool-call: prefix tool loop outcome tracking and
  before_tool_call hook failed with args.ctx?.sessionKey
- plugins/tools: prefix plugin tool failed with
  params.context.sessionKey
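The prefix convention threaded through all of these call sites amounts to something like the following (helper name and fallback behavior are illustrative; the commit describes inline prefixing rather than a shared helper):

```typescript
// Prepend [sessionKey] to a log line when a session key is available, so
// per-session filtering (e.g. grep "\[sess-abc\]") works across subsystems.
function withSessionKey(sessionKey: string | undefined, message: string): string {
  return sessionKey ? `[${sessionKey}] ${message}` : message;
}
```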
…st tests

Both tests mocked ssrf.resolvePinnedHostname (the wrapper) instead of
ssrf.resolvePinnedHostnameWithPolicy (the function fetch-guard.ts calls
directly), leaving real DNS resolution in place and causing EAI_AGAIN
failures for example.com and api.telegram.org in the test environment.

src/web/media.test.ts: Use the delegate-to-real pattern so the SSRF policy
checks (private-network blocking, cloud-metadata blocking) still execute
correctly, with only DNS resolution mocked.

src/telegram/bot.create-telegram-bot.test.ts: Add resolvePinnedHostnameWithPolicy
spy scoped to the "buffers channel_post media groups" test, which mocks
globalThis.fetch for photo downloads but was missing the DNS mock needed
for the SSRF guard pre-flight.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
…sync-X3Ox7

- discord test: prefer Parameters<typeof actual.loadConfig> over unknown[] for type safety
- slack targets: add normalizeSlackId helper and apply to all ID parse paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…h dashes

Slack channel IDs are purely alphanumeric (e.g. C01234ABCDE) — change the #
prefix branch pattern from /^[A-Z0-9_-]+$/i to /^[A-Z0-9]+$/i so that
human-readable names like 'general-1' are correctly rejected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add maxApprovalChainDepth for approval-boundary enforcement
- Add no-self-approval enforcement via requireDifferentActor
- Add escalation config (onDeny, onTimeout, maxEscalations)
- Add shouldEscalate() to policy engine for deny/timeout scenarios
- Add comprehensive integration tests for approve/deny/timeout/escalation paths

This extends the Phase 1 HITL infrastructure with strict policy enforcement
and escalation behavior for denied or timed-out approvals.
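The escalation gating described above can be sketched as follows. The field names (onDeny, onTimeout, maxEscalations) come from the commit text; the function signature and counter semantics are assumptions:

```typescript
// Hypothetical sketch of shouldEscalate: escalate on deny/timeout per config,
// but never past the configured escalation cap.
interface EscalationConfig {
  onDeny: boolean;
  onTimeout: boolean;
  maxEscalations: number;
}

function shouldEscalate(
  outcome: "deny" | "timeout",
  escalationsSoFar: number,
  config: EscalationConfig,
): boolean {
  if (escalationsSoFar >= config.maxEscalations) return false;
  return outcome === "deny" ? config.onDeny : config.onTimeout;
}
```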
Resolves five issues identified in code review of feat/hitl-gateway:

1. Wire approvalManager into HitlGateway lifecycle
   checkAndGate now calls approvalManager.create()+register() so callers
   can block on approvalManager.awaitDecision(requestId). recordDecision
   now calls approvalManager.resolve() to unblock them. Previously the
   in-memory promise lifecycle was completely disconnected from the SQLite
   store, meaning any caller awaiting a decision would hang forever.

2. Require policy to be present for authorization in recordDecision
   Previously, if request.policyId was not in policiesById, the entire
   authorization block was silently skipped, allowing any actor to
   approve/deny with no role or actor checks. Now returns an explicit
   error when the policy is not found.

3. Fix expireRequest escalation threshold to measure from expiresAtMs
   afterTimeoutMs is documented as 'after the timeout' but was measured
   from createdAtMs, causing escalation to fire well before the request
   even expired. Now measures from expiresAtMs so afterTimeoutMs=30_000
   means 'escalate 30s after the approval window closes'.

4. Reject decisions on requests past their expiresAtMs deadline
   recordDecision only checked request.status, not the clock. Without a
   running timeout sweep (Phase 5), status stays 'pending' past expiry,
   so a late approver could bypass the timeout policy. Now returns an
   explicit 'expired' error when the deadline has passed.

5. Fix listPendingRequests to query by status instead of post-filter
   The prior implementation fetched the latest 100 records and filtered
   in memory, silently dropping pending requests older than #100. Now
   queries each status (pending, escalated) explicitly with limit=500.
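Fixes 3 and 4 both reduce to measuring against expiresAtMs instead of createdAtMs or status alone. A minimal sketch of the corrected checks (field names from the commit; function shapes assumed):

```typescript
interface ApprovalRequest {
  createdAtMs: number;
  expiresAtMs: number;
}

// Fix 3: escalate only once afterTimeoutMs has elapsed past the expiry
// deadline, not past creation time.
function escalationDue(req: ApprovalRequest, afterTimeoutMs: number, now: number): boolean {
  return now >= req.expiresAtMs + afterTimeoutMs;
}

// Fix 4: reject late decisions against the clock, even though status may
// still read 'pending' because no timeout sweep has run yet.
function decisionAllowed(req: ApprovalRequest, now: number): boolean {
  return now < req.expiresAtMs;
}
```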

Additional fixes:
- Fix requireDifferentActor type in HitlPolicyDefinition: string|boolean
  -> boolean, consistent with types.approvals.ts and the Zod schema.
- Fix extractShellCommandFromArgv to join all args after cmd.exe /c
  instead of taking only argv[idx+1], so multi-token cmd.exe commands
  are validated and matched correctly against rawCommand.
- Add HitlEscalationSchema + maxApprovalChainDepth to zod-schema.approvals
  to resolve schema/type drift between runtime types and Zod validation.
- Add test coverage for all fixed behaviors (31 gateway tests total).
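The extractShellCommandFromArgv fix can be sketched like this (the function name is from the commit; argv-matching details are assumptions):

```typescript
// Join everything after the `/c` flag rather than taking only the next token,
// so multi-token commands like `cmd.exe /c echo hello` are reconstructed
// whole for validation against rawCommand.
function extractShellCommandFromArgv(argv: string[]): string | null {
  const cmd = argv[0]?.toLowerCase();
  if (cmd !== "cmd.exe" && cmd !== "cmd") return null;
  const idx = argv.findIndex((a) => a.toLowerCase() === "/c");
  if (idx === -1 || idx + 1 >= argv.length) return null;
  return argv.slice(idx + 1).join(" "); // previously: argv[idx + 1] only
}
```

Taking only `argv[idx + 1]` would have validated `echo` while the process actually ran `echo hello`, a classic partial-match bypass.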

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dgarson dgarson merged commit f512842 into dgarson/fork Feb 23, 2026
2 of 9 checks passed
@dgarson dgarson deleted the merge-recent-changes branch February 23, 2026 19:59
