🐛 fix(server): restore sub-agent forking in QStash step worker by arvinxx · Pull Request #15609 · lobehub/lobehub

arvinxx · 2026-06-09T15:58:18Z

💻 Change Type

🐛 fix
♻️ refactor

🔀 Description of Change

The bug. In QStash mode every agent step runs in a fresh HTTP request via the hono runStep handler, which built a bare AgentRuntimeService without the execSubAgent fork callback. The callback is an in-process closure owned by AiAgentService and never survives the queue boundary, so buildServerSubAgentRunner returned undefined → ctx.subAgent was undefined → lobe-agent.callSubAgent failed in cloud with:

SUB_AGENT_UNAVAILABLE — "Sub-agent execution is not available in this runtime."

The fix. Step through AiAgentService.executeStep instead of constructing a second bare runtime. AiAgentService already builds an internal AgentRuntimeService wired with the fork callback, so the step now runs on a runtime that carries execSubAgent. No duplicate runtime, no manual rebinding — this also respects the existing AiAgentService → AgentRuntimeService dependency direction (injecting the service the other way would be circular).
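
A minimal sketch of the bug and the fix may help here. It assumes heavily simplified shapes: only the names `AgentRuntimeService`, `AiAgentService`, `execSubAgent`, `executeStep` and the `SUB_AGENT_UNAVAILABLE` code come from this PR. The signatures, `forkSubAgent`, `runStepBare` and `runStepViaService` are hypothetical, and calls are kept synchronous for brevity.

```typescript
// Hypothetical, simplified stand-ins. Only the names AgentRuntimeService,
// AiAgentService, execSubAgent, executeStep and SUB_AGENT_UNAVAILABLE come
// from the PR; signatures are assumed and calls are synchronous for brevity.
type ExecSubAgent = (task: string) => string;

class AgentRuntimeService {
  private readonly execSubAgent?: ExecSubAgent;

  constructor(options: { execSubAgent?: ExecSubAgent } = {}) {
    this.execSubAgent = options.execSubAgent;
  }

  executeStep(task: string): string {
    // Without the fork callback there is nothing to delegate to.
    if (!this.execSubAgent) throw new Error('SUB_AGENT_UNAVAILABLE');
    return this.execSubAgent(task);
  }
}

class AiAgentService {
  private readonly runtime: AgentRuntimeService;

  constructor() {
    // The service owns the fork pipeline, so it wires the callback itself.
    this.runtime = new AgentRuntimeService({
      execSubAgent: (task) => this.forkSubAgent(task),
    });
  }

  // Stepping through the service reuses the already-wired runtime.
  executeStep(task: string): string {
    return this.runtime.executeStep(task);
  }

  // Hypothetical placeholder for the real fork pipeline.
  private forkSubAgent(task: string): string {
    return `forked: ${task}`;
  }
}

// Before the fix: the step worker built a bare runtime per request.
function runStepBare(task: string): string {
  return new AgentRuntimeService().executeStep(task);
}

// After the fix: the step worker goes through AiAgentService.
function runStepViaService(task: string): string {
  return new AiAgentService().executeStep(task);
}
```

In this sketch `runStepBare` throws because no callback was ever injected, while `runStepViaService` succeeds because the callback is created in the same process that executes the step.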

Refactor (folded in). To separate the "task" concept from "sub-agent":

Renamed the internal execSubAgentTask → execSubAgent (method, runtime/tool-execution context fields, options, private callback, and the ExecSubAgent{Params,Result} types).
Made the method an auto-bound arrow field so it no longer needs .bind(this) when passed as a callback.
The external lambda procedure name (execSubAgentTask) and the client service are intentionally left unchanged.
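
The auto-bound arrow field can be illustrated with a small standalone sketch. The class and function names below (`WithMethod`, `WithArrowField`, `run`) are hypothetical and not taken from the codebase; the sketch only shows the general language behaviour the refactor relies on.

```typescript
// Hypothetical classes illustrating why an arrow field needs no .bind(this).
class WithMethod {
  private readonly label = 'method';

  // Prototype method: `this` depends on how the function is called.
  exec(): string {
    return this.label;
  }
}

class WithArrowField {
  private readonly label = 'arrow';

  // Arrow field: `this` is captured once, when the instance is created.
  exec = (): string => this.label;
}

// Calls the callback detached from its owner, like passing it as an option.
function run(callback: () => string): string {
  return callback();
}

const method = new WithMethod();
const arrow = new WithArrowField();

// The prototype method must be bound explicitly before being handed over.
const viaBind = run(method.exec.bind(method));

// The arrow field can be passed as-is.
const viaArrow = run(arrow.exec);
```

Passing `method.exec` without `.bind(method)` would throw a `TypeError` here, because class bodies run in strict mode and the detached call sees `this` as `undefined`.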

🧪 How to Test

Added/updated tests

runStep.test.ts now asserts stepping goes through AiAgentService (which preserves the fork callback) and stays workspace-scoped. Verified:

bun run type-check — clean across the repo
Affected server suites — 156 passed (runStep, RuntimeExecutors, execGroupSubAgentTask, lambda aiAgent.execGroupSubAgentTask, task integration)
Client store suites exercising the kept execSubAgentTask client method — 33 passed

📝 Additional Information

No API/contract change: the tRPC procedure name and client-facing types are untouched, so this is server-internal only. No migration needed.

In QStash mode every agent step runs in a fresh HTTP request via the hono `runStep` handler, which built a bare AgentRuntimeService without the `execSubAgent` fork callback. As a result `lobe-agent.callSubAgent` failed with SUB_AGENT_UNAVAILABLE in cloud (the in-process callback never survives the queue boundary).

Step through AiAgentService.executeStep instead, reusing its internal runtime that is already wired with the fork callback: no second runtime, no manual rebinding.

Also rename the internal `execSubAgentTask` → `execSubAgent` (method, runtime/tool context fields, options, ExecSubAgent{Params,Result} types) to separate the "task" concept from "sub-agent", and make the method an auto-bound arrow field so it no longer needs `.bind(this)`. The external lambda procedure name (`execSubAgentTask`) and the client service are left unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

vercel · 2026-06-09T15:58:35Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lobehub	Ready	Preview, Comment	Jun 9, 2026 5:02pm

sourcery-ai

Sorry @arvinxx, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98cb1346cb


codecov · 2026-06-09T16:03:47Z

Codecov Report

❌ Patch coverage is 68.57143% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.14%. Comparing base (af3f0ea) to head (2dce166).
⚠️ Report is 3 commits behind head on canary.

Additional details and impacted files

@@            Coverage Diff            @@
##           canary   #15609     +/-   ##
=========================================
  Coverage   67.14%   67.14%             
=========================================
  Files        3353     3353             
  Lines      338505   338506      +1     
  Branches    35060    30383   -4677     
=========================================
+ Hits       227278   227281      +3     
+ Misses     111036   111034      -2     
  Partials      191      191

Flag	Coverage Δ
app	`60.14% <68.57%> (+<0.01%)`	⬆️
database	`89.90% <ø> (ø)`
packages/agent-manager-runtime	`49.69% <ø> (ø)`
packages/agent-runtime	`81.06% <ø> (ø)`
packages/app-config	`44.58% <ø> (ø)`
packages/builtin-tool-lobe-agent	`18.52% <ø> (ø)`
packages/context-engine	`84.12% <ø> (ø)`
packages/conversation-flow	`91.29% <ø> (ø)`
packages/device-gateway-client	`90.18% <ø> (ø)`
packages/env	`11.42% <ø> (ø)`
packages/eval-dataset-parser	`95.15% <ø> (ø)`
packages/eval-rubric	`76.11% <ø> (ø)`
packages/fetch-sse	`87.28% <ø> (ø)`
packages/file-loaders	`87.89% <ø> (ø)`
packages/locales	`0.87% <ø> (ø)`
packages/memory-user-memory	`74.99% <ø> (ø)`
packages/model-bank	`99.99% <ø> (ø)`
packages/model-runtime	`84.23% <ø> (ø)`
packages/prompts	`72.51% <ø> (ø)`
packages/python-interpreter	`92.90% <ø> (ø)`
packages/ssrf-safe-fetch	`0.00% <ø> (ø)`
packages/trpc	`40.43% <ø> (ø)`
packages/types	`35.15% <ø> (ø)`
packages/utils	`85.03% <ø> (ø)`
packages/web-crawler	`88.08% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Store	`68.24% <ø> (ø)`
Services	`54.25% <ø> (ø)`
Server	`97.03% <100.00%> (ø)`
Libs	`54.19% <ø> (ø)`
Utils	`82.08% <ø> (ø)`


…elegate

`execSubAgent` was a loose top-level option on AgentRuntimeService, which hid that it is not ordinary config but an upward call: the low-level runtime, mid-step, triggering a high-level pipeline that lives in AiAgentService (the layer above it).

Introduce `AgentRuntimeDelegate` as the single named home for these upward-call capabilities, and inject it as `delegate: { execSubAgent }`. The interface doc states the convention so future "runtime must trigger a higher-layer pipeline" capabilities land in the same place instead of sprawling as ad-hoc options.

Scope is deliberately the injection surface (options + service field + AiAgentService wiring). The downstream executor/tool context keeps its flat `execSubAgent` field, because the tool runner wants the unpacked capability, not the whole delegate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
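
The delegate injection described in this commit could look roughly like the sketch below. Only `AgentRuntimeDelegate`, `AgentRuntimeService`, `execSubAgent` and the `delegate: { execSubAgent }` option shape come from the commit message. The `ExecSubAgent` signature, `AgentRuntimeServiceOptions`, `ToolExecutionContext` and `buildToolContext` are assumptions for illustration.

```typescript
// Assumed, simplified callback signature (synchronous for brevity).
type ExecSubAgent = (task: string) => string;

// Named home for "upward calls": capabilities the low-level runtime needs
// from the layer above it. Only execSubAgent is mentioned in the commit.
interface AgentRuntimeDelegate {
  execSubAgent?: ExecSubAgent;
}

// Hypothetical options shape: the delegate is one named field instead of
// loose top-level options.
interface AgentRuntimeServiceOptions {
  delegate?: AgentRuntimeDelegate;
}

// Hypothetical tool context: it keeps the flat, unpacked capability.
interface ToolExecutionContext {
  execSubAgent?: ExecSubAgent;
}

class AgentRuntimeService {
  private readonly delegate: AgentRuntimeDelegate;

  constructor(options: AgentRuntimeServiceOptions = {}) {
    this.delegate = options.delegate ?? {};
  }

  // Unpack the delegate at the boundary so tools never see the whole object.
  buildToolContext(): ToolExecutionContext {
    return { execSubAgent: this.delegate.execSubAgent };
  }
}

// Wiring as a higher layer might do it.
const execSubAgent: ExecSubAgent = (task) => `forked: ${task}`;
const wiredRuntime = new AgentRuntimeService({ delegate: { execSubAgent } });
const toolContext = wiredRuntime.buildToolContext();
```

Keeping the tool context flat means tool code does not change if further capabilities are later added to the delegate.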

…p worker

Post-rebase adaptation to canary's runtime restructure (#15609):

- Route the webhook bridge through AiAgentService (like the /run step worker) so the runtime's models stay workspace-scoped; a bare AgentRuntimeService would be personal-scoped and the tool-message backfill / resume barrier could miss workspace-scoped rows.
- Extract SubAgentBridgeParams into agentRuntime/types and add the completeSubAgentBridge passthrough next to executeStep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…ueue mode (#15620)

* 🐛 fix(agent): deliver sub-agent resume bridge via QStash webhook in queue mode

The callSubAgent completion bridge was a handler-only hook, which lives in process memory: in queue mode (AGENT_RUNTIME_MODE=queue) HookDispatcher only delivers webhook-configured hooks, so the bridge never fired and the parent op stayed parked in waiting_for_async_tool forever after all sub-agents finished.

- Give the bridge hook a webhook config (delivery: qstash) targeting the new /api/agent/webhooks/subagent-callback endpoint; local mode keeps the in-process handler. Both paths converge on AgentRuntimeService.completeSubAgentBridge (backfill + barrier/CAS resume).
- Park-time self-check: after the parked state and operation row are persisted, re-run the resume barrier once to recover children that completed before the parent finished parking.
- One-shot verify watchdog: when a completion finds the parent not yet resumable, schedule a delayed verifyAsyncToolBarrier re-check (no step lock, CAS-idempotent, never re-arms).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(agent): correct verify-watchdog rationale comment

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(agent): clarify eventFields trimming rationale

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ♻️ refactor(agent): align subagent-callback with workspace-scoped step worker

Post-rebase adaptation to canary's runtime restructure (#15609):

- Route the webhook bridge through AiAgentService (like the /run step worker) so the runtime's models stay workspace-scoped; a bare AgentRuntimeService would be personal-scoped and the tool-message backfill / resume barrier could miss workspace-scoped rows.
- Extract SubAgentBridgeParams into agentRuntime/types and add the completeSubAgentBridge passthrough next to executeStep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): fail sub-agent callback loudly on backfill or delivery failure

Address two review findings on the resume bridge:

- completeSubAgentBridge now checks updateToolMessage's { success } result (it swallows transaction errors instead of throwing) and propagates all infrastructure failures. The webhook endpoint then returns non-2xx so QStash redelivers the whole bridge. Previously a failed backfill was acked with 200 and the parent stayed parked forever, since the verify recheck only re-reads the barrier and cannot retry the backfill.
- New AgentHookWebhook.fallback: 'none' opts a qstash-delivered hook out of the unsigned plain-fetch fallback, which can never authenticate against a QStash-signed endpoint and only masked publish failures as silently dropped 401s. The bridge hook uses it; dispatch escalates such delivery failures to console.error instead of the debug namespace.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
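
The "fail loudly" behaviour and the idempotent resume described above can be sketched with an in-memory model. This is an illustration under stated assumptions, not the actual implementation: only `completeSubAgentBridge`, `updateToolMessage`, the `{ success }` result shape and the `waiting_for_async_tool` status come from the commit message. The `ParentOp` shape, the `running` status, the `dbUp` flag, `handleSubAgentCallback` and the specific 500 status are hypothetical, and the status check merely stands in for a real atomic compare-and-set.

```typescript
// Hypothetical in-memory model of the resume bridge.
type OpStatus = 'waiting_for_async_tool' | 'running';

interface ParentOp {
  status: OpStatus;
  pendingToolCalls: Set<string>;
  toolMessages: Map<string, string>;
}

type BridgeOutcome = 'resumed' | 'still_waiting' | 'already_resumed';

// Like the updateToolMessage described above, this reports failure through
// a flag instead of throwing. `dbUp` simulates infrastructure health.
function updateToolMessage(
  op: ParentOp,
  toolCallId: string,
  content: string,
  dbUp: boolean,
): { success: boolean } {
  if (!dbUp) return { success: false };
  op.toolMessages.set(toolCallId, content);
  return { success: true };
}

function completeSubAgentBridge(
  op: ParentOp,
  toolCallId: string,
  content: string,
  dbUp: boolean,
): BridgeOutcome {
  const { success } = updateToolMessage(op, toolCallId, content, dbUp);
  // Propagate the failure so the caller can ask for redelivery.
  if (!success) throw new Error(`backfill failed for ${toolCallId}`);

  // Barrier: the parent resumes only once every child has reported.
  op.pendingToolCalls.delete(toolCallId);
  if (op.pendingToolCalls.size > 0) return 'still_waiting';

  // Compare-and-set style guard: only a parked parent is resumed, so a
  // redelivered callback becomes a no-op.
  if (op.status !== 'waiting_for_async_tool') return 'already_resumed';
  op.status = 'running';
  return 'resumed';
}

// Webhook wrapper: a non-2xx status asks the queue to redeliver.
function handleSubAgentCallback(
  op: ParentOp,
  toolCallId: string,
  content: string,
  dbUp: boolean,
): number {
  try {
    completeSubAgentBridge(op, toolCallId, content, dbUp);
    return 200;
  } catch {
    return 500;
  }
}
```

In this model a failed backfill leaves the parent parked and returns 500, so a later redelivery can complete the backfill and resume the parent; a duplicate delivery after that is acknowledged without changing state.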

sourcery-ai Bot reviewed Jun 9, 2026


dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. feature:agent Assistant/Agent configuration and behavior labels Jun 9, 2026

chatgpt-codex-connector Bot reviewed Jun 9, 2026


Comment thread packages/types/src/agentExecution/index.ts

vercel Bot deployed to Preview June 9, 2026 16:20 View deployment

arvinxx merged commit 4b5e001 into canary Jun 9, 2026
34 of 35 checks passed

arvinxx deleted the fix/sub-agent-forking-in-step-worker branch June 9, 2026 16:41

vercel Bot deployed to Preview June 9, 2026 17:02 View deployment

arvinxx mentioned this pull request Jun 9, 2026

🚀 release: 20260610 #15619

Closed

arvinxx mentioned this pull request Jun 10, 2026

🚀 release: 20260610 #15641

Closed

arvinxx merged 2 commits into canary from fix/sub-agent-forking-in-step-worker
