feat: merge in recent non-megabranch changes#110
Merged
dgarson merged 30 commits intodgarson/forkfrom Feb 23, 2026
Merged
Conversation
…malization, prompts - Fix maxPerCall config bug: broker was reading maxConcurrency for both limits; add dedicated maxPerCall field to VoiceCallSubagentConfigSchema - Replace O(n) pruning on every state transition with lazy threshold-based pruning (amortized via PRUNE_THRESHOLD=16) - Add BrokerMetrics type and metrics getter for observability: enqueued, completed, failed, expired, canceled, fallbacksSpoken, repairAttempts, repairSuccesses, totalExecutionMs - Improve JSON normalization pipeline: strip markdown fences, trailing commas, single-line comments; case-insensitive key matching; expanded alias sets; export extractFirstJsonObject, parseBoolean, parseNumber - Lighten repairPayload: thinkLevel=off, 4s timeout, truncate input to 1000 chars, pass context through to avoid redundant loadCoreAgentDeps - Clean up orphaned session store entries and temp files after job completion (finally block with fs.unlinkSync) - Move foreground envelope prompt to end of system prompt with structured RESPONSE FORMAT heading; wrap conversation history in XML tags - Add specialist-specific prompts (SPECIALIST_PROMPTS map) for research, scheduler, and policy roles - Expand test coverage: 55 tests across broker and normalization suites covering pruning, metrics, deadline clamping, JSON extraction edge cases, alias resolution, type coercion, and specialist prompts https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
…onflict - Wrap speakFallbackIfActive in try/catch inside the runJob catch block to prevent unhandled promise rejections when onSummaryReady is broken - Check remaining time before applying Math.max(1000ms) clamp so jobs that have already expired hit the expiry branch instead of launching a 1s LLM call past their deadline - Replace "use tools when helpful" with delegation guidance in the foreground voice prompt to avoid contradicting the JSON-only envelope instruction — tools belong in the specialist background lane https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
Replace naive regex-based stripTrailingCommas and stripJsonComments with character-walking implementations that track whether we're inside a JSON string literal, so commas and // sequences inside string values are never corrupted. Move session store cleanup into the finally block of runJob so entries are removed on both success and failure paths. Add reapOrphanedSessions() method that sweeps stale voice-subagent:* keys older than a configurable threshold, called at webhook server startup to catch entries orphaned by crashes. https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
fix: async agent delegation — config bug, pruning, observability, nor…
…unication-for-tts
Detailed review of codex/implement-async-sub-agent-communication-for-tts. Covers two must-fix bugs (canceled metric inaccuracy, unconditional envelope prompt injection), correctness concerns (race conditions, dead follow-up code), and voice-call pattern improvements (backpressure, deduplication, fallback double-speak). https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Bugs fixed: - cancelByCall now returns the actual count of jobs canceled (was void); cancelCallJobs and runJob accumulate the real count instead of always adding 1, preventing metric inflation and double-counting - envelopePrompt JSON format instruction is now gated on subagents?.enabled; disabled deployments no longer receive a changed system prompt, and rawText is returned directly without envelope parsing when subagents are off - summaryDelivered flag prevents catch block from delivering fallback speech when onSummaryReady already succeeded, eliminating double-speak - FALLBACK_SPOKEN_SUMMARY result now sets needs_followup: false (was true), which would have triggered spurious follow-up logic once wired - callForEvent guard: cancelCallJobs is now skipped when callForEvent is null (call was never known to the manager) rather than falling back to event.callId, preventing phantom cancels Pattern improvements: - enqueue() enforces per-call queue depth before admitting a new job, providing backpressure when a call already has maxPerCall active jobs - repairPayload is skipped for short/bare outputs (< 20 chars or no JSON-like characters), avoiding a wasted LLM call on "done" / "ok" - onSummaryReady speaks the follow-up question when the specialist signals needs_followup; speak() errors are swallowed inside the callback so they do not propagate into the broker's catch/fallback path - Delegations from the foreground agent are deduplicated by specialist+goal before enqueuing to prevent redundant subagent jobs - Add JSDoc comment documenting the concurrent saveSessionStore race Tests: - cancelByCall mock signatures updated to return number - "cancelCallJobs increments canceled metric" test replaced with accurate count-based assertions and a zero-count case - New backpressure test: enqueue drops when per-call depth >= maxPerCall https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The `execute` callback in the `registerTool` call had two `noImplicitAny` errors (toolCallId and params) because the function literal's parameter types weren't being inferred from context. Fix: annotate `_toolCallId: string` and `params: Record<string, unknown>`. Using `Record<string, unknown>` rather than `Static<typeof VoiceCallToolSchema>` is correct here because all property accesses in the execute body are already guarded by `typeof` runtime checks, and the discriminated union's last member lacks `action` which would cause cascade type errors with the narrowed type. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Lock file was out of sync with apps/web-next/package.json after new frontend dependencies were added (react, vite, tailwind, etc.). https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
… Slack ID normalization Reverts two regressions introduced in commit e95e175 ("Luis/UI redesign feb 21"): 1. workspace.ts: Remove `|| !file.missing` from filterBootstrapFilesForSession — this clause defeated the MINIMAL_BOOTSTRAP_ALLOWLIST by allowing any non-missing extra bootstrap file to pass through for subagent/cron sessions. Only AGENTS.md and TOOLS.md should be included, matching the original strict allowlist filter. 2. targets.ts: Fix normalizeSlackId regex — change `{8,}` minimum-length guard to `[A-Z0-9]*[0-9][A-Z0-9]*` so short Slack IDs like C999, C1, C1A2B3 are uppercased correctly (Slack IDs need at least one digit, not 8+ chars). 3. targets.ts: Fix # handler pattern — change `/^[A-Z0-9]+$/i` to `/^[A-Z0-9_-]+$/i` so #channel-names with hyphens (e.g. #eng-frontend, #does-not-exist) are parsed as name lookups rather than rejected with an ID-requirement error. Fixes 4 test failures in src/slack/targets.test.ts and src/slack/send.blocks.test.ts, and 1 failure in src/hooks/bundled/bootstrap-extra-files/handler.test.ts. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The resolveSlackMedia describe block's beforeEach mocked ssrf.resolvePinnedHostname, but fetch-guard.ts calls resolvePinnedHostnameWithPolicy directly (not via resolvePinnedHostname). This left the SSRF guard unmocked, causing real DNS resolution to fail in the test environment and all download-dependent tests to silently return null. Fix: spy on resolvePinnedHostnameWithPolicy, matching the pattern already used correctly in the resolveSlackAttachmentContent describe block added in the same commit (54eaca8). Fixes 6 test failures in src/slack/monitor/media.test.ts. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The harness mocked ssrf.resolvePinnedHostname, but fetch-guard.ts calls resolvePinnedHostnameWithPolicy directly. This left the SSRF guard unmocked, causing real DNS lookups for example.com which fail with EAI_AGAIN in the test environment — all 7 tests in cf-markdown and response-limit test files silently errored out. Fix: spy on resolvePinnedHostnameWithPolicy and inject the lookupMock while preserving the rest of the params (policy, etc.) so the existing SSRF logic still runs cleanly with a controlled DNS resolver. Same root cause as the resolveSlackMedia SSRF mock bug fixed in the previous commit: resolvePinnedHostname delegates to resolvePinnedHostnameWithPolicy, so mocking only the wrapper never intercepts the real call site. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
All tool execution failure log lines now include [sessionKey] prefix for easier per-session filtering in logs. - pi-tool-definition-adapter: thread sessionKey through splitSdkTools -> toToolDefinitions; prefix [tools] <name> failed, stack trace, and after_tool_call hook failure logs - tool-split: splitSdkTools now accepts and forwards sessionKey - attempt.ts / compact.ts: pass params.sessionKey to splitSdkTools - pi-embedded-subscribe.handlers: prefix tool_execution_start/end handler failed logs with ctx.params.sessionKey - pi-embedded-subscribe.handlers.types: add sessionKey to ToolHandlerParams pick so it is available in ToolHandlerContext - pi-embedded-subscribe.handlers.tools: prefix after_tool_call hook failed with ctx.params.sessionKey - pi-tools.before-tool-call: prefix tool loop outcome tracking and before_tool_call hook failed with args.ctx?.sessionKey - plugins/tools: prefix plugin tool failed with params.context.sessionKey
…st tests Both tests mocked ssrf.resolvePinnedHostname (the wrapper) instead of ssrf.resolvePinnedHostnameWithPolicy (the function fetch-guard.ts calls directly), leaving real DNS resolution in place and causing EAI_AGAIN failures for example.com and api.telegram.org in the test environment. src/web/media.test.ts: Use the delegate-to-real pattern so the SSRF policy checks (private-network blocking, cloud-metadata blocking) still execute correctly, with only DNS resolution mocked. src/telegram/bot.create-telegram-bot.test.ts: Add resolvePinnedHostnameWithPolicy spy scoped to the "buffers channel_post media groups" test, which mocks globalThis.fetch for photo downloads but was missing the DNS mock needed for the SSRF guard pre-flight. https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
…sync-X3Ox7 - discord test: prefer Parameters<typeof actual.loadConfig> over unknown[] for type safety - slack targets: add normalizeSlackId helper and apply to all ID parse paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-in-all-phases' into merge-recent-changes
…h dashes Slack channel IDs are purely alphanumeric (e.g. C01234ABCDE) — change the # prefix branch pattern from /^[A-Z0-9_-]+$/i to /^[A-Z0-9]+$/i so that human-readable names like 'general-1' are correctly rejected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add maxApprovalChainDepth for approval-boundary enforcement - Add no-self-approval enforcement via requireDifferentActor - Add escalation config (onDeny, onTimeout, maxEscalations) - Add shouldEscalate() to policy engine for deny/timeout scenarios - Add comprehensive integration tests for approve/deny/timeout/escalation paths This extends the Phase 1 HITL infrastructure with strict policy enforcement and escalation behavior for denied or timed-out approvals.
Resolves five issues identified in code review of feat/hitl-gateway: 1. Wire approvalManager into HitlGateway lifecycle checkAndGate now calls approvalManager.create()+register() so callers can block on approvalManager.awaitDecision(requestId). recordDecision now calls approvalManager.resolve() to unblock them. Previously the in-memory promise lifecycle was completely disconnected from the SQLite store, meaning any caller awaiting a decision would hang forever. 2. Require policy to be present for authorization in recordDecision Previously, if request.policyId was not in policiesById, the entire authorization block was silently skipped, allowing any actor to approve/deny with no role or actor checks. Now returns an explicit error when the policy is not found. 3. Fix expireRequest escalation threshold to measure from expiresAtMs afterTimeoutMs is documented as 'after the timeout' but was measured from createdAtMs, causing escalation to fire well before the request even expired. Now measures from expiresAtMs so afterTimeoutMs=30_000 means 'escalate 30s after the approval window closes'. 4. Reject decisions on requests past their expiresAtMs deadline recordDecision only checked request.status, not the clock. Without a running timeout sweep (Phase 5), status stays 'pending' past expiry, so a late approver could bypass the timeout policy. Now returns an explicit 'expired' error when the deadline has passed. 5. Fix listPendingRequests to query by status instead of post-filter The prior implementation fetched the latest 100 records and filtered in memory, silently dropping pending requests older than #100. Now queries each status (pending, escalated) explicitly with limit=500. Additional fixes: - Fix requireDifferentActor type in HitlPolicyDefinition: string|boolean -> boolean, consistent with types.approvals.ts and the Zod schema. - Fix extractShellCommandFromArgv to join all args after cmd.exe /c instead of taking only argv[idx+1], so multi-token cmd.exe commands are validated and matched correctly against rawCommand. - Add HitlEscalationSchema + maxApprovalChainDepth to zod-schema.approvals to resolve schema/type drift between runtime types and Zod validation. - Add test coverage for all fixed behaviors (31 gateway tests total). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Describe the problem and fix in 2–5 bullets:
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
List user-visible changes (including defaults/config).
If none, write
None.Security Impact (required)
Yes/No)Yes/No)Yes/No)Yes/No)Yes/No)Yes, explain risk + mitigation:Repro + Verification
Environment
Steps
Expected
Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
Compatibility / Migration
Yes/No)Yes/No)Yes/No)Failure Recovery (if this breaks)
Risks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write
None.