feat: merge in recent non-megabranch changes #110

Merged
dgarson merged 30 commits into dgarson/fork from merge-recent-changes on Feb 23, 2026

Conversation


@dgarson dgarson commented Feb 23, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Security Impact (required)

  • New permissions/capabilities? (Yes/No)
  • Secrets/tokens handling changed? (Yes/No)
  • New/changed network calls? (Yes/No)
  • Command/tool execution surface changed? (Yes/No)
  • Data access scope changed? (Yes/No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS:
  • Runtime/container:
  • Model/provider:
  • Integration/channel (if any):
  • Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
  • Edge cases checked:
  • What you did not verify:

Compatibility / Migration

  • Backward compatible? (Yes/No)
  • Config/env changes? (Yes/No)
  • Migration needed? (Yes/No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
  • Files/config to restore:
  • Known bad symptoms reviewers should watch for:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk:
    • Mitigation:

dgarson and others added 30 commits February 21, 2026 12:43
…malization, prompts

- Fix maxPerCall config bug: broker was reading maxConcurrency for both
  limits; add dedicated maxPerCall field to VoiceCallSubagentConfigSchema
- Replace O(n) pruning on every state transition with lazy threshold-based
  pruning (amortized via PRUNE_THRESHOLD=16)
- Add BrokerMetrics type and metrics getter for observability: enqueued,
  completed, failed, expired, canceled, fallbacksSpoken, repairAttempts,
  repairSuccesses, totalExecutionMs
- Improve JSON normalization pipeline: strip markdown fences, trailing
  commas, single-line comments; case-insensitive key matching; expanded
  alias sets; export extractFirstJsonObject, parseBoolean, parseNumber
- Lighten repairPayload: thinkLevel=off, 4s timeout, truncate input to
  1000 chars, pass context through to avoid redundant loadCoreAgentDeps
- Clean up orphaned session store entries and temp files after job
  completion (finally block with fs.unlinkSync)
- Move foreground envelope prompt to end of system prompt with structured
  RESPONSE FORMAT heading; wrap conversation history in XML tags
- Add specialist-specific prompts (SPECIALIST_PROMPTS map) for research,
  scheduler, and policy roles
- Expand test coverage: 55 tests across broker and normalization suites
  covering pruning, metrics, deadline clamping, JSON extraction edge
  cases, alias resolution, type coercion, and specialist prompts

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
…onflict

- Wrap speakFallbackIfActive in try/catch inside the runJob catch block
  to prevent unhandled promise rejections when onSummaryReady is broken
- Check remaining time before applying Math.max(1000ms) clamp so jobs
  that have already expired hit the expiry branch instead of launching
  a 1s LLM call past their deadline
- Replace "use tools when helpful" with delegation guidance in the
  foreground voice prompt to avoid contradicting the JSON-only envelope
  instruction — tools belong in the specialist background lane

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
Replace naive regex-based stripTrailingCommas and stripJsonComments with
character-walking implementations that track whether we're inside a JSON
string literal, so commas and // sequences inside string values are never
corrupted.
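The character-walking approach can be sketched like this for the trailing-comma case (a minimal illustration, assuming the function shape; the real implementation also handles // comments with the same string-tracking state):

```typescript
// Walk the input character by character, tracking whether we are inside a
// JSON string literal, so commas inside string values are never touched.
function stripTrailingCommas(input: string): string {
  let out = "";
  let inString = false;
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (inString) {
      out += ch;
      if (ch === "\\") {
        // Copy the escaped character verbatim so \" does not end the string.
        i++;
        if (i < input.length) out += input[i];
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
      out += ch;
      continue;
    }
    if (ch === ",") {
      // Look ahead past whitespace: a comma directly before } or ] is trailing.
      let j = i + 1;
      while (j < input.length && /\s/.test(input[j])) j++;
      if (input[j] === "}" || input[j] === "]") continue; // drop the comma
    }
    out += ch;
  }
  return out;
}
```

A regex cannot make the in-string distinction reliably, which is exactly the corruption the commit fixes.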

Move session store cleanup into the finally block of runJob so entries are
removed on both success and failure paths.  Add reapOrphanedSessions()
method that sweeps stale voice-subagent:* keys older than a configurable
threshold, called at webhook server startup to catch entries orphaned by
crashes.

https://claude.ai/code/session_01SYcKJe4ySZthJSCkKxzbFZ
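The startup sweep can be sketched as follows; the store shape and entry fields here are assumptions for illustration, with only the `voice-subagent:` key prefix taken from the commit text:

```typescript
// Hypothetical sketch of reapOrphanedSessions: delete prefixed entries older
// than a configurable threshold, returning how many were reaped.
interface StoreEntry {
  updatedAtMs: number;
}

function reapOrphanedSessions(
  store: Map<string, StoreEntry>,
  maxAgeMs: number,
  now: number = Date.now(),
): number {
  let reaped = 0;
  for (const [key, entry] of store) {
    if (!key.startsWith("voice-subagent:")) continue; // only sweep our keys
    if (now - entry.updatedAtMs > maxAgeMs) {
      store.delete(key);
      reaped++;
    }
  }
  return reaped;
}
```

Running this once at webhook server startup catches entries that the per-job finally block could not clean up because the process crashed mid-job.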
fix: async agent delegation — config bug, pruning, observability, nor…
Detailed review of codex/implement-async-sub-agent-communication-for-tts.
Covers two must-fix bugs (canceled metric inaccuracy, unconditional envelope
prompt injection), correctness concerns (race conditions, dead follow-up code),
and voice-call pattern improvements (backpressure, deduplication, fallback
double-speak).

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Bugs fixed:
- cancelByCall now returns the actual count of jobs canceled (was void);
  cancelCallJobs and runJob accumulate the real count instead of always
  adding 1, preventing metric inflation and double-counting
- envelopePrompt JSON format instruction is now gated on
  subagents?.enabled; disabled deployments no longer receive a changed
  system prompt, and rawText is returned directly without envelope
  parsing when subagents are off
- summaryDelivered flag prevents catch block from delivering fallback
  speech when onSummaryReady already succeeded, eliminating double-speak
- FALLBACK_SPOKEN_SUMMARY result now sets needs_followup: false (was
  true), which would have triggered spurious follow-up logic once wired
- callForEvent guard: cancelCallJobs is now skipped when callForEvent
  is null (call was never known to the manager) rather than falling back
  to event.callId, preventing phantom cancels

Pattern improvements:
- enqueue() enforces per-call queue depth before admitting a new job,
  providing backpressure when a call already has maxPerCall active jobs
- repairPayload is skipped for short/bare outputs (< 20 chars or no
  JSON-like characters), avoiding a wasted LLM call on "done" / "ok"
- onSummaryReady speaks the follow-up question when the specialist
  signals needs_followup; speak() errors are swallowed inside the
  callback so they do not propagate into the broker's catch/fallback path
- Delegations from the foreground agent are deduplicated by
  specialist+goal before enqueuing to prevent redundant subagent jobs
- Add JSDoc comment documenting the concurrent saveSessionStore race
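The per-call backpressure check above can be sketched as a small depth counter (names and structure are illustrative, not the broker's actual internals):

```typescript
// Hypothetical sketch: track active job depth per call and refuse new work
// once a call already has maxPerCall jobs in flight.
class CallQueue {
  private depth = new Map<string, number>();

  constructor(private maxPerCall: number) {}

  enqueue(callId: string): boolean {
    const current = this.depth.get(callId) ?? 0;
    if (current >= this.maxPerCall) return false; // drop: call is saturated
    this.depth.set(callId, current + 1);
    return true;
  }

  complete(callId: string): void {
    const current = this.depth.get(callId) ?? 0;
    if (current > 0) this.depth.set(callId, current - 1);
  }
}
```

Refusing at admission time keeps a chatty foreground agent from piling unbounded specialist jobs onto a single call.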

Tests:
- cancelByCall mock signatures updated to return number
- "cancelCallJobs increments canceled metric" test replaced with
  accurate count-based assertions and a zero-count case
- New backpressure test: enqueue drops when per-call depth >= maxPerCall

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The `execute` callback in the `registerTool` call had two `noImplicitAny`
errors (toolCallId and params) because the function literal's parameter
types weren't being inferred from context.

Fix: annotate `_toolCallId: string` and `params: Record<string, unknown>`.
Using `Record<string, unknown>` rather than `Static<typeof VoiceCallToolSchema>`
is correct here because all property accesses in the execute body are already
guarded by `typeof` runtime checks, and the discriminated union's last member
lacks `action` which would cause cascade type errors with the narrowed type.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
Lock file was out of sync with apps/web-next/package.json after new
frontend dependencies were added (react, vite, tailwind, etc.).

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
… Slack ID normalization

Reverts regressions introduced in commit e95e175 ("Luis/UI redesign feb 21"):

1. workspace.ts: Remove `|| !file.missing` from filterBootstrapFilesForSession —
   this clause defeated the MINIMAL_BOOTSTRAP_ALLOWLIST by allowing any non-missing
   extra bootstrap file to pass through for subagent/cron sessions. Only AGENTS.md
   and TOOLS.md should be included, matching the original strict allowlist filter.

2. targets.ts: Fix normalizeSlackId regex — change `{8,}` minimum-length guard to
   `[A-Z0-9]*[0-9][A-Z0-9]*` so short Slack IDs like C999, C1, C1A2B3 are
   uppercased correctly (Slack IDs need at least one digit, not 8+ chars).

3. targets.ts: Fix # handler pattern — change `/^[A-Z0-9]+$/i` to `/^[A-Z0-9_-]+$/i`
   so #channel-names with hyphens (e.g. #eng-frontend, #does-not-exist) are parsed
   as name lookups rather than rejected with an ID-requirement error.

Fixes 4 test failures in src/slack/targets.test.ts and src/slack/send.blocks.test.ts,
and 1 failure in src/hooks/bundled/bootstrap-extra-files/handler.test.ts.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
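The normalizeSlackId fix in item 2 can be sketched as follows; only the regex is taken from the commit text, while the surrounding function shape is an assumption:

```typescript
// A Slack-style ID is alphanumeric with at least one digit (e.g. C999, C1A2B3).
// Anything without a digit (e.g. "general") is left untouched rather than
// being wrongly uppercased.
const SLACK_ID_RE = /^[A-Z0-9]*[0-9][A-Z0-9]*$/i;

function normalizeSlackId(raw: string): string {
  return SLACK_ID_RE.test(raw) ? raw.toUpperCase() : raw;
}
```

The `[A-Z0-9]*[0-9][A-Z0-9]*` shape encodes "at least one digit" without imposing the old 8-character minimum that rejected short IDs like C1.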
The resolveSlackMedia describe block's beforeEach mocked
ssrf.resolvePinnedHostname, but fetch-guard.ts calls
resolvePinnedHostnameWithPolicy directly (not via resolvePinnedHostname).
This left the SSRF guard unmocked, causing real DNS resolution to fail
in the test environment and all download-dependent tests to silently
return null.

Fix: spy on resolvePinnedHostnameWithPolicy, matching the pattern already
used correctly in the resolveSlackAttachmentContent describe block added
in the same commit (54eaca8).

Fixes 6 test failures in src/slack/monitor/media.test.ts.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
The harness mocked ssrf.resolvePinnedHostname, but fetch-guard.ts calls
resolvePinnedHostnameWithPolicy directly. This left the SSRF guard unmocked,
causing real DNS lookups for example.com which fail with EAI_AGAIN in the
test environment — all 7 tests in cf-markdown and response-limit test files
silently errored out.

Fix: spy on resolvePinnedHostnameWithPolicy and inject the lookupMock while
preserving the rest of the params (policy, etc.) so the existing SSRF logic
still runs cleanly with a controlled DNS resolver.

Same root cause as the resolveSlackMedia SSRF mock bug fixed in the previous
commit: resolvePinnedHostname delegates to resolvePinnedHostnameWithPolicy,
so mocking only the wrapper never intercepts the real call site.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
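The root cause generalizes beyond these tests: a wrapper that delegates to an inner function cannot be mocked to intercept callers of the inner function. A simplified self-contained illustration (module wiring is collapsed into one object; the real code uses vitest spies):

```typescript
// The wrapper delegates to the inner function; fetch-guard calls the inner
// function directly, so replacing only the wrapper changes nothing for it.
const ssrf = {
  resolvePinnedHostnameWithPolicy(host: string): string {
    return `dns:${host}`; // stands in for a real DNS lookup
  },
  resolvePinnedHostname(host: string): string {
    return this.resolvePinnedHostnameWithPolicy(host); // wrapper delegates
  },
};

function fetchGuard(host: string): string {
  // Mirrors fetch-guard.ts: calls the inner function, not the wrapper.
  return ssrf.resolvePinnedHostnameWithPolicy(host);
}

// Mocking only the wrapper leaves fetchGuard hitting "real DNS":
ssrf.resolvePinnedHostname = () => "mocked";
const stillReal = fetchGuard("example.com"); // "dns:example.com"

// Replacing the inner function is what actually intercepts the call site:
ssrf.resolvePinnedHostnameWithPolicy = () => "mocked";
const nowMocked = fetchGuard("example.com"); // "mocked"
```

This is why all four commits converge on spying on resolvePinnedHostnameWithPolicy rather than the wrapper.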
All tool execution failure log lines now include [sessionKey] prefix
for easier per-session filtering in logs.

- pi-tool-definition-adapter: thread sessionKey through splitSdkTools
  -> toToolDefinitions; prefix [tools] <name> failed, stack trace,
  and after_tool_call hook failure logs
- tool-split: splitSdkTools now accepts and forwards sessionKey
- attempt.ts / compact.ts: pass params.sessionKey to splitSdkTools
- pi-embedded-subscribe.handlers: prefix tool_execution_start/end
  handler failed logs with ctx.params.sessionKey
- pi-embedded-subscribe.handlers.types: add sessionKey to
  ToolHandlerParams pick so it is available in ToolHandlerContext
- pi-embedded-subscribe.handlers.tools: prefix after_tool_call hook
  failed with ctx.params.sessionKey
- pi-tools.before-tool-call: prefix tool loop outcome tracking and
  before_tool_call hook failed with args.ctx?.sessionKey
- plugins/tools: prefix plugin tool failed with
  params.context.sessionKey
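The prefix convention threaded through all of these call sites amounts to something like the following (helper name and fallback behavior are illustrative; the commit describes inline prefixing rather than a shared helper):

```typescript
// Prepend [sessionKey] to a log line when a session key is available, so
// per-session filtering (e.g. grep "\[sess-abc\]") works across subsystems.
function withSessionKey(sessionKey: string | undefined, message: string): string {
  return sessionKey ? `[${sessionKey}] ${message}` : message;
}
```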
…st tests

Both tests mocked ssrf.resolvePinnedHostname (the wrapper) instead of
ssrf.resolvePinnedHostnameWithPolicy (the function fetch-guard.ts calls
directly), leaving real DNS resolution in place and causing EAI_AGAIN
failures for example.com and api.telegram.org in the test environment.

src/web/media.test.ts: Use the delegate-to-real pattern so the SSRF policy
checks (private-network blocking, cloud-metadata blocking) still execute
correctly, with only DNS resolution mocked.

src/telegram/bot.create-telegram-bot.test.ts: Add resolvePinnedHostnameWithPolicy
spy scoped to the "buffers channel_post media groups" test, which mocks
globalThis.fetch for photo downloads but was missing the DNS mock needed
for the SSRF guard pre-flight.

https://claude.ai/code/session_01THZqxrjf9fbMGMDuTJX9Rg
…sync-X3Ox7

- discord test: prefer Parameters<typeof actual.loadConfig> over unknown[] for type safety
- slack targets: add normalizeSlackId helper and apply to all ID parse paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…h dashes

Slack channel IDs are purely alphanumeric (e.g. C01234ABCDE) — change the #
prefix branch pattern from /^[A-Z0-9_-]+$/i to /^[A-Z0-9]+$/i so that
human-readable names like 'general-1' are correctly rejected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add maxApprovalChainDepth for approval-boundary enforcement
- Add no-self-approval enforcement via requireDifferentActor
- Add escalation config (onDeny, onTimeout, maxEscalations)
- Add shouldEscalate() to policy engine for deny/timeout scenarios
- Add comprehensive integration tests for approve/deny/timeout/escalation paths

This extends the Phase 1 HITL infrastructure with strict policy enforcement
and escalation behavior for denied or timed-out approvals.
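The escalation gating described above can be sketched as follows. The field names (onDeny, onTimeout, maxEscalations) come from the commit text; the function signature and counter semantics are assumptions:

```typescript
// Hypothetical sketch of shouldEscalate: escalate on deny/timeout per config,
// but never past the configured escalation cap.
interface EscalationConfig {
  onDeny: boolean;
  onTimeout: boolean;
  maxEscalations: number;
}

function shouldEscalate(
  outcome: "deny" | "timeout",
  escalationsSoFar: number,
  config: EscalationConfig,
): boolean {
  if (escalationsSoFar >= config.maxEscalations) return false;
  return outcome === "deny" ? config.onDeny : config.onTimeout;
}
```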
Resolves five issues identified in code review of feat/hitl-gateway:

1. Wire approvalManager into HitlGateway lifecycle
   checkAndGate now calls approvalManager.create()+register() so callers
   can block on approvalManager.awaitDecision(requestId). recordDecision
   now calls approvalManager.resolve() to unblock them. Previously the
   in-memory promise lifecycle was completely disconnected from the SQLite
   store, meaning any caller awaiting a decision would hang forever.

2. Require policy to be present for authorization in recordDecision
   Previously, if request.policyId was not in policiesById, the entire
   authorization block was silently skipped, allowing any actor to
   approve/deny with no role or actor checks. Now returns an explicit
   error when the policy is not found.

3. Fix expireRequest escalation threshold to measure from expiresAtMs
   afterTimeoutMs is documented as 'after the timeout' but was measured
   from createdAtMs, causing escalation to fire well before the request
   even expired. Now measures from expiresAtMs so afterTimeoutMs=30_000
   means 'escalate 30s after the approval window closes'.

4. Reject decisions on requests past their expiresAtMs deadline
   recordDecision only checked request.status, not the clock. Without a
   running timeout sweep (Phase 5), status stays 'pending' past expiry,
   so a late approver could bypass the timeout policy. Now returns an
   explicit 'expired' error when the deadline has passed.

5. Fix listPendingRequests to query by status instead of post-filter
   The prior implementation fetched the latest 100 records and filtered
   in memory, silently dropping pending requests older than #100. Now
   queries each status (pending, escalated) explicitly with limit=500.
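Fixes 3 and 4 both reduce to measuring against expiresAtMs instead of createdAtMs or status alone. A minimal sketch of the corrected checks (field names from the commit; function shapes assumed):

```typescript
interface ApprovalRequest {
  createdAtMs: number;
  expiresAtMs: number;
}

// Fix 3: escalate only once afterTimeoutMs has elapsed past the expiry
// deadline, not past creation time.
function escalationDue(req: ApprovalRequest, afterTimeoutMs: number, now: number): boolean {
  return now >= req.expiresAtMs + afterTimeoutMs;
}

// Fix 4: reject late decisions against the clock, even though status may
// still read 'pending' because no timeout sweep has run yet.
function decisionAllowed(req: ApprovalRequest, now: number): boolean {
  return now < req.expiresAtMs;
}
```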

Additional fixes:
- Fix requireDifferentActor type in HitlPolicyDefinition: string|boolean
  -> boolean, consistent with types.approvals.ts and the Zod schema.
- Fix extractShellCommandFromArgv to join all args after cmd.exe /c
  instead of taking only argv[idx+1], so multi-token cmd.exe commands
  are validated and matched correctly against rawCommand.
- Add HitlEscalationSchema + maxApprovalChainDepth to zod-schema.approvals
  to resolve schema/type drift between runtime types and Zod validation.
- Add test coverage for all fixed behaviors (31 gateway tests total).
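The extractShellCommandFromArgv fix can be sketched like this (the function name is from the commit; argv-matching details are assumptions):

```typescript
// Join everything after the `/c` flag rather than taking only the next token,
// so multi-token commands like `cmd.exe /c echo hello` are reconstructed
// whole for validation against rawCommand.
function extractShellCommandFromArgv(argv: string[]): string | null {
  const cmd = argv[0]?.toLowerCase();
  if (cmd !== "cmd.exe" && cmd !== "cmd") return null;
  const idx = argv.findIndex((a) => a.toLowerCase() === "/c");
  if (idx === -1 || idx + 1 >= argv.length) return null;
  return argv.slice(idx + 1).join(" "); // previously: argv[idx + 1] only
}
```

Taking only `argv[idx + 1]` would have validated `echo` while the process actually ran `echo hello`, a classic partial-match bypass.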

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dgarson dgarson merged commit f512842 into dgarson/fork Feb 23, 2026
2 of 9 checks passed
@dgarson dgarson deleted the merge-recent-changes branch February 23, 2026 19:59
