
✨ feat(agent): inactivity watchdog finalize endpoint + agent-hono migration #14476

Merged
arvinxx merged 4 commits into canary from arvinxx/feat/finalize-abandoned-op on May 7, 2026


@arvinxx

@arvinxx arvinxx commented May 6, 2026

Member

💻 Change Type

  • ✨ feat
  • ♻️ refactor

🔗 Related Issue

Pairs with the agent-gateway DO inactivity watchdog PR (lobehub-biz/agent-gateway). Continues the LOBE-8533 thread: when the agent-runtime Vercel function is killed mid-flight, no error reaches the gateway dashboard, the _partial/ snapshot is orphaned in S3, and the assistant message is left dangling in the DB.

🔀 Description of Change

Two threads in this branch.

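Thread 1 — finalize-abandoned endpoint (the load-bearing change):

  • OperationTraceRecorder is factored out of AgentRuntimeService.executeStep (commit 1 of #14441, cherry-picked) and owns appendStep() and finalize(), including the failedStep synthesis branch.
  • New AbandonOperationService.finalizeAbandoned(operationId, reason) loads agent state from the Redis coordinator, marks it errored, finalizes the trace with a synthetic failedStep record, updates the dangling assistant message, and cleans up Redis state. It is idempotent.
  • Exposed as POST /api/agent/finalize-abandoned so the agent-gateway DO watchdog can trigger finalization in a fresh function invocation.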

Thread 2 — agent-hono framework + first migrations:

  • New src/server/agent-hono/ mirrors the existing src/server/workflows-hono/ pattern.
  • Mounted via the Next.js optional catch-all src/app/(backend)/api/agent/[[...route]]/route.ts. Existing static route.ts files (run / stream / gateway / webhooks) still take precedence because Next.js matches static segments before dynamic ones; they can migrate one at a time when convenient.
  • This PR migrates three endpoints to Hono; URLs unchanged:
    • POST /api/agent (execAgent)
    • POST /api/agent/tool-result
    • POST /api/agent/finalize-abandoned (new)
  • Auth factored into reusable middlewares: serviceTokenAuth (Bearer SERVICE_TOKEN) and qstashOrApiKeyAuth (QStash signature OR AGENT_EXEC_API_KEY Bearer token).
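The two auth middlewares can be sketched as plain functions over the web-standard `Request` and `Response` types. This is an illustration, not the PR's code: the real versions are Hono middlewares, `SignatureVerifier` is a hypothetical stand-in for whatever QStash verification the repo uses, and the `Upstash-Signature` header name is an assumption. Both checks fail closed with a 401.

```typescript
// Sketch only: plain-function versions of the two auth checks, written against
// web-standard Request/Response instead of Hono's middleware API.

// Either let the request through, or short-circuit with a ready-made 401.
type AuthResult = { ok: true } | { ok: false; response: Response };

// Hypothetical verifier injected by the caller (e.g. a wrapper around a QStash receiver).
type SignatureVerifier = (signature: string, rawBody: string) => Promise<boolean>;

const unauthorized = (): AuthResult => ({
  ok: false,
  response: new Response(JSON.stringify({ error: 'Unauthorized' }), {
    headers: { 'content-type': 'application/json' },
    status: 401,
  }),
});

// Returns the token from "Authorization: Bearer <token>", if present.
const bearerToken = (req: Request): string | undefined => {
  const header = req.headers.get('authorization') ?? '';
  return header.startsWith('Bearer ') ? header.slice('Bearer '.length) : undefined;
};

// serviceTokenAuth-style: the Bearer token must equal the configured service token.
export const checkServiceToken = (req: Request, serviceToken: string | undefined): AuthResult => {
  // Fail closed if no token is configured on the server.
  if (!serviceToken) return unauthorized();
  return bearerToken(req) === serviceToken ? { ok: true } : unauthorized();
};

// qstashOrApiKeyAuth-style: a valid QStash signature OR a matching API key Bearer token.
export const checkQstashOrApiKey = async (
  req: Request,
  apiKey: string | undefined,
  verifySignature: SignatureVerifier,
): Promise<AuthResult> => {
  const signature = req.headers.get('upstash-signature'); // header name assumed
  if (signature) {
    // Read a clone so the downstream handler can still consume the body.
    const rawBody = await req.clone().text();
    if (await verifySignature(signature, rawBody)) return { ok: true };
  }
  if (apiKey && bearerToken(req) === apiKey) return { ok: true };
  return unauthorized();
};
```

A production version would likely compare tokens in constant time, for example with Node's `crypto.timingSafeEqual`.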

🧪 How to Test

  • Tested locally
  • Added/updated tests
  • No tests needed

24 tests pass (3 files):

```bash
bunx vitest run src/server/agent-hono src/server/services/agentRuntime/__tests__/AbandonOperationService.test.ts src/server/services/agentRuntime/__tests__/OperationTraceRecorder.test.ts
```

📝 Additional Information

Routing precedence sanity check. Next.js prefers static segments over dynamic ones, so:

  • /api/agent/run → existing run/route.ts
  • /api/agent/stream → existing stream/route.ts
  • /api/agent/gateway/* → existing gateway/*/route.ts
  • /api/agent/webhooks/* → existing webhooks/*/route.ts
  • /api/agent → catch-all → Hono execAgent
  • /api/agent/tool-result → catch-all → Hono toolResult
  • /api/agent/finalize-abandoned → catch-all → Hono finalizeAbandoned
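The precedence rules above can be modelled with a small resolver. This is a deliberately simplified sketch, not Next.js's router: it matches only exact static paths, ignores nested or dynamic segments under gateway/ and webhooks/, and `resolveAgentRoute` / `STATIC_AGENT_ROUTES` are hypothetical names.

```typescript
// Simplified model (exact-path matching only): a static route.ts that exists for the
// exact pathname wins; any other path under /api/agent falls through to the Hono
// optional catch-all, which receives the remaining segments.
type Resolution = { kind: 'static'; path: string } | { kind: 'hono'; route: string[] };

const AGENT_BASE = '/api/agent';

export const resolveAgentRoute = (
  pathname: string,
  staticRoutes: ReadonlySet<string>,
): Resolution | undefined => {
  // Paths outside /api/agent are handled by neither layer.
  if (pathname !== AGENT_BASE && !pathname.startsWith(`${AGENT_BASE}/`)) return undefined;
  if (staticRoutes.has(pathname)) return { kind: 'static', path: pathname };
  const route = pathname.slice(AGENT_BASE.length).split('/').filter(Boolean);
  return { kind: 'hono', route };
};

// Static files still present after this PR (nested gateway/* and webhooks/* paths omitted).
export const STATIC_AGENT_ROUTES: ReadonlySet<string> = new Set(['/api/agent/run', '/api/agent/stream']);
```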

Pre-existing TS errors for `Cannot find module 'hono'` are the same hono-resolution issue that src/server/workflows-hono/* already produces. Next.js bundles hono correctly at build time, so production works; vitest can't resolve it because hono lives at node_modules/.pnpm/hono@4.12.10/ (no top-level hoist). Fixing repo-level resolution is out of scope for this PR.

Why factor OperationTraceRecorder first. Its finalize() already has a failedStep synthesis branch built specifically for LOBE-8533 — exactly what AbandonOperationService needs. Cherry-picking just commit 1 of #14441 keeps this PR's diff focused.

🤖 Generated with Claude Code

arvinxx and others added 4 commits May 7, 2026 01:01
The snapshot accumulation + finalize logic previously lived inline in
`AgentRuntimeService.executeStep` (per-step header init, message diff,
event stripping, tool delta, partial save) plus a separate helper for
finalize. Two distinct call sites for finalize, three places touching
`snapshotStore` directly, and ~120 lines of branching inside an already
overgrown method.

Pull all of it into `OperationTraceRecorder` with two methods:

- `appendStep(operationId, params)` — owns partial header init,
  incremental message diff, llm_stream / done-event finalState pruning,
  toolResults-from-payload stripping, and activatedStepTools delta.
- `finalize(operationId, params)` — owns success+error finalize,
  optional `failedStep` synthesis (LOBE-8533), and append-to-last-step
  enrichment for completion signal events.
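The `appendStep()` bookkeeping described in this list can be approximated as below. The message and event shapes, and the rule that a shorter history means compression happened, are assumptions made for illustration; the PR does not show the recorder's internals.

```typescript
// Hypothetical shapes: the PR does not show the recorder's message or event types.
interface Message {
  content: string;
  id: string;
  role: 'assistant' | 'tool' | 'user';
}

interface StepEvent {
  type: string;
  [key: string]: unknown;
}

interface MessagesDelta {
  messagesDelta: Message[]; // messages to persist for this step
  nextBaseline: number; // baseline to carry into the next step
  reset: boolean; // true when the history shrank (e.g. after compression)
}

// Persist only the messages added since the previous step. If the list is shorter
// than the baseline, assume the history was compressed and record it in full.
export const computeMessagesDelta = (messages: Message[], baseline: number): MessagesDelta =>
  messages.length < baseline
    ? { messagesDelta: [...messages], nextBaseline: messages.length, reset: true }
    : { messagesDelta: messages.slice(baseline), nextBaseline: messages.length, reset: false };

// Drop high-volume streaming chunks before the step is written to the partial snapshot.
export const stripStreamEvents = (events: StepEvent[]): StepEvent[] =>
  events.filter((event) => event.type !== 'llm_stream');
```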

The recorder always exists on the service; when the underlying store is
null, methods are no-ops, so the call sites no longer gate on
`if (this.snapshotStore)`. Public `AgentRuntimeServiceOptions.snapshotStore`
stays unchanged so existing tests keep injecting through the same surface.
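The null-store behaviour amounts to a null-object pattern. A minimal sketch, assuming a hypothetical two-method `SnapshotStore` interface that is not the real one:

```typescript
// Hypothetical minimal store surface; the real snapshot store API is not shown here.
interface SnapshotStore {
  finalize(operationId: string, summary: unknown): Promise<void>;
  savePartial(operationId: string, step: unknown): Promise<void>;
}

// The recorder always exists; a null store turns every method into a no-op, so call
// sites never need an `if (this.snapshotStore)` guard.
export class TraceRecorderSketch {
  private readonly store: SnapshotStore | null;

  constructor(store: SnapshotStore | null) {
    this.store = store;
  }

  async appendStep(operationId: string, step: unknown): Promise<void> {
    if (!this.store) return;
    await this.store.savePartial(operationId, step);
  }

  async finalize(operationId: string, summary: unknown): Promise<void> {
    if (!this.store) return;
    await this.store.finalize(operationId, summary);
  }
}
```

The design keeps the "is tracing enabled?" decision in one constructor argument instead of scattering it across every call site.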

`AgentRuntimeService.ts` shrinks from 2274 → 2084 lines (-190 net).

Tests: existing 70 agentRuntime tests + 11 new recorder unit tests
(partial header init, llm_stream stripping, done-event pruning,
messagesDelta vs baseline, compression reset, activatedStepTools delta,
success finalize, failed-step synthesis, no-partial skip,
appendEventsToLastStep, store=null no-op) — all 81 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…everse-trigger

When the agent-runtime Vercel function is killed mid-flight (LOBE-8533),
nothing reports the failure: no error reaches the gateway dashboard, the
`_partial/` snapshot is orphaned in S3, and the assistant message stays
dangling in DB. The agent-gateway DO is the only external observer that
can detect "operation went silent" — but it needs an endpoint to call so
finalization runs in a fresh function invocation.
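The watchdog itself lives in the agent-gateway repo and is not part of this diff. As a rough illustration of what "operation went silent" detection could look like, here is a hypothetical inactivity timer with an injectable clock; the class name, timeout, and callback are all assumptions.

```typescript
// Hypothetical inactivity watchdog. The real one lives in the agent-gateway DO
// (a separate repo); the timeout, method names and clock injection are assumptions.
export class InactivityWatchdog {
  private fired = false;
  private lastActivityAt: number;
  private readonly now: () => number;
  private readonly onSilent: () => void;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number, onSilent: () => void, now: () => number = Date.now) {
    this.timeoutMs = timeoutMs;
    this.onSilent = onSilent;
    this.now = now;
    this.lastActivityAt = now();
  }

  // Record activity (a step, a stream chunk, a heartbeat) from the operation.
  touch(): void {
    this.lastActivityAt = this.now();
  }

  // Run periodically, e.g. from a timer or alarm. Fires onSilent at most once.
  check(): boolean {
    if (this.fired || this.now() - this.lastActivityAt < this.timeoutMs) return false;
    this.fired = true;
    this.onSilent();
    return true;
  }
}
```

In the setup described here, `onSilent` is where the gateway would call `POST /api/agent/finalize-abandoned`.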

Adds `AbandonOperationService.finalizeAbandoned(operationId, reason)` that
loads agent state from the Redis coordinator, mutates it to errored,
runs `OperationTraceRecorder.finalize()` with a synthetic failedStep
record (matching the existing LOBE-8533 error path), updates the
dangling assistant message, and cleans Redis state. Idempotent.
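The finalization sequence above can be sketched with injected dependencies standing in for the Redis coordinator, the trace recorder, and the message model. All type and function names below are hypothetical; only the order of operations, the missing-state early return, and the best-effort cleanup follow the commit message.

```typescript
// Hypothetical shapes and names; only the sequence of steps follows the commit message.
interface OperationState {
  assistantMessageId?: string;
  status: 'done' | 'error' | 'running';
  stepCount: number;
}

interface FinalizeTraceParams {
  completionReason: 'error';
  error: { message: string; type: string };
  failedStep: { stepIndex: number };
  state: OperationState;
}

interface AbandonDeps {
  deleteState(operationId: string): Promise<void>; // Redis coordinator
  finalizeTrace(operationId: string, params: FinalizeTraceParams): Promise<void>; // trace recorder
  loadState(operationId: string): Promise<OperationState | null>; // Redis coordinator
  markMessageErrored(messageId: string, message: string): Promise<void>; // DB
}

export type FinalizeResult = { finalized: true } | { finalized: false; reason: 'missing-state' };

export const finalizeAbandoned = async (
  deps: AbandonDeps,
  operationId: string,
  reason: string,
): Promise<FinalizeResult> => {
  const state = await deps.loadState(operationId);
  // No state: already finalized or never started. Returning early keeps retries idempotent.
  if (!state) return { finalized: false, reason: 'missing-state' };

  const error = { message: `Operation abandoned: ${reason}`, type: 'OperationAbandoned' };
  const finalState: OperationState = { ...state, status: 'error' };

  // Synthesize a record for the step that never completed, then seal the trace.
  await deps.finalizeTrace(operationId, {
    completionReason: 'error',
    error,
    failedStep: { stepIndex: state.stepCount },
    state: finalState,
  });

  // Only touch the DB if the run had already created an assistant message.
  if (state.assistantMessageId) await deps.markMessageErrored(state.assistantMessageId, error.message);

  // Best-effort cleanup: a Redis failure here should not fail the request.
  try {
    await deps.deleteState(operationId);
  } catch {
    // Left in place; a retry would find the state and finalize again.
  }
  return { finalized: true };
};
```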

Exposed via `POST /api/agent/finalize-abandoned` with QStash signature
auth, body `{ operationId, reason }`. Mirrors the auth + DI pattern of
the existing `/api/agent/run` route. 6 unit tests cover the missing-state,
no-partial, no-assistantMessageId, and best-effort-cleanup paths.
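A minimal sketch of validating the `{ operationId, reason }` body before handing it to the service. The validator and handler names are hypothetical, auth is assumed to have run already, and the PR does not say whether the real handler validates by hand or with a schema library.

```typescript
export interface FinalizeAbandonedBody {
  operationId: string;
  reason: string;
}

export type ParseResult = { body: FinalizeAbandonedBody; ok: true } | { error: string; ok: false };

// Accept only { operationId: non-empty string, reason: string }.
export const parseFinalizeAbandonedBody = (raw: unknown): ParseResult => {
  if (typeof raw !== 'object' || raw === null) return { error: 'body must be a JSON object', ok: false };
  const { operationId, reason } = raw as Record<string, unknown>;
  if (typeof operationId !== 'string' || operationId === '') {
    return { error: 'operationId must be a non-empty string', ok: false };
  }
  if (typeof reason !== 'string') return { error: 'reason must be a string', ok: false };
  return { body: { operationId, reason }, ok: true };
};

const json = (data: unknown, status = 200): Response =>
  new Response(JSON.stringify(data), { headers: { 'content-type': 'application/json' }, status });

// Hypothetical handler body: parse, validate, then delegate to the injected service call.
export const handleFinalizeAbandoned = async (
  req: Request,
  finalize: (operationId: string, reason: string) => Promise<{ finalized: boolean }>,
): Promise<Response> => {
  let raw: unknown;
  try {
    raw = await req.json();
  } catch {
    return json({ error: 'body must be valid JSON' }, 400);
  }
  const parsed = parseFinalizeAbandonedBody(raw);
  if (!parsed.ok) return json({ error: parsed.error }, 400);
  return json(await finalize(parsed.body.operationId, parsed.body.reason));
};
```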

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors src/server/workflows-hono/ for QStash workflows. The finalize-abandoned
endpoint moves out of a dedicated Next.js route.ts into a Hono handler under
src/server/agent-hono/handlers/. URL is unchanged (POST /api/agent/finalize-abandoned).

The Hono app is mounted via a catch-all at src/app/(backend)/api/agent/[...route]/route.ts.
Next.js App Router prefers static segments over dynamic ones, so existing routes
(run/tool-result/stream/gateway/webhooks) continue to win — they can migrate to
Hono one at a time by deleting the static route.ts and adding a handler here.

Auth is now factored into a reusable serviceTokenAuth() middleware that mirrors
the per-route Bearer check in /api/agent/tool-result.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the two simplest /api/agent/* endpoints onto the Hono app added in
the previous commit:

- POST /api/agent (execAgent) — new handler at agent-hono/handlers/execAgent.ts;
  dual auth (QStash sig OR AGENT_EXEC_API_KEY) factored into a reusable
  qstashOrApiKeyAuth() middleware. URL unchanged.
- POST /api/agent/tool-result — new handler at agent-hono/handlers/toolResult.ts
  reusing serviceTokenAuth(). URL unchanged. Existing route test ported to a
  handler-direct unit test (5 tests, mirrors original 6 minus the auth
  middleware ones now covered by serviceTokenAuth's own contract).

Catch-all switched from required `[...route]` to optional `[[...route]]` so
the bare /api/agent path also falls through to Hono. Deleted the static
route.ts files for both endpoints. Routing precedence still puts surviving
static routes (run/stream/gateway/webhooks) in front of the Hono catch-all,
so they keep working unchanged until they're individually migrated.
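The two folder conventions differ only in the zero-segment case, which is why the bare /api/agent path could move to Hono only after the switch to `[[...route]]`. A simplified matcher (hypothetical `matchCatchAll`, not a Next.js API) makes that explicit:

```typescript
// Simplified model of Next.js catch-all folders mounted at a base path:
// [...route] needs at least one extra segment, [[...route]] also matches zero.
type CatchAllKind = 'optional' | 'required';

export const matchCatchAll = (
  pathname: string,
  base: string,
  kind: CatchAllKind,
): string[] | null => {
  if (pathname !== base && !pathname.startsWith(`${base}/`)) return null;
  const segments = pathname.slice(base.length).split('/').filter(Boolean);
  // The bare base path is the only case where the two conventions disagree.
  if (segments.length === 0 && kind === 'required') return null;
  return segments;
};
```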

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 6, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| :--- | :--- | :--- | :--- |
| lobehub | Ready | Preview, Comment | May 6, 2026 6:02pm |


@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 6, 2026

@sourcery-ai sourcery-ai Bot left a comment

Contributor


Sorry @arvinxx, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@dosubot dosubot Bot added feature:agent Assistant/Agent configuration and behavior feature:api API endpoint and backend issues labels May 6, 2026
@chatgpt-codex-connector


💡 Codex Review

```typescript
await this.traceRecorder.finalize(operationId, {
  completionReason: 'error',
  error: { message, type: String(error.type) },
  failedStep,
  state: finalState,
});
```

P2: Propagate abandoned trace finalization failures

When the watchdog hits a transient snapshot-store failure (for example S3 save or removePartial fails), OperationTraceRecorder.finalize() catches and only logs the error, so this call still returns normally; the service then reports finalized: true and proceeds to delete the Redis operation state, making a retry unable to recover the orphaned _partial/ trace. For this endpoint, finalize needs to signal success/failure (or avoid cleanup/reporting success when persistence failed) so the LOBE-8533 recovery path does not silently lose its only retry context.
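One way to act on this suggestion is to have finalization report its outcome and to gate the Redis cleanup on it. The sketch below uses hypothetical names; the thread does not show how, or whether, the PR addressed the comment before merge.

```typescript
// Hypothetical sketch of the review's suggestion; names are illustrative.
export type PersistOutcome = { ok: true } | { error: unknown; ok: false };

// Unlike a log-and-swallow finalize, this reports whether the trace was persisted.
export const persistTrace = async (save: () => Promise<void>): Promise<PersistOutcome> => {
  try {
    await save();
    return { ok: true };
  } catch (error) {
    return { error, ok: false };
  }
};

// Delete the Redis state (the only retry context) only after the trace is safely stored.
export const finalizeThenCleanup = async (
  save: () => Promise<void>,
  deleteState: () => Promise<void>,
): Promise<{ finalized: boolean }> => {
  const outcome = await persistTrace(save);
  if (!outcome.ok) return { finalized: false };
  await deleteState();
  return { finalized: true };
};
```

Returning `finalized: false` also lets the endpoint answer with an error status, so the watchdog knows to retry.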


@codecov

codecov Bot commented May 6, 2026


Codecov Report

❌ Patch coverage is 91.98813% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.70%. Comparing base (0c375e4) to head (97fb098).
⚠️ Report is 3 commits behind head on canary.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           canary   #14476      +/-   ##
==========================================
+ Coverage   68.67%   68.70%   +0.02%     
==========================================
  Files        2543     2545       +2     
  Lines      220888   221099     +211     
  Branches    22483    27989    +5506     
==========================================
+ Hits       151703   151900     +197     
- Misses      69042    69055      +13     
- Partials      143      144       +1     
```
| Flag | Coverage Δ |
| :--- | :--- |
| app | 63.10% <91.98%> (+0.02%) ⬆️ |
| database | 92.41% <ø> (ø) |
| packages/agent-runtime | 80.50% <ø> (ø) |
| packages/builtin-tool-lobe-agent | 83.41% <ø> (ø) |
| packages/context-engine | 83.88% <ø> (ø) |
| packages/conversation-flow | 92.43% <ø> (ø) |
| packages/file-loaders | 87.60% <ø> (ø) |
| packages/memory-user-memory | 74.74% <ø> (ø) |
| packages/model-bank | 99.94% <ø> (+<0.01%) ⬆️ |
| packages/model-runtime | 83.58% <ø> (+<0.01%) ⬆️ |
| packages/prompts | 69.59% <ø> (ø) |
| packages/python-interpreter | 92.90% <ø> (ø) |
| packages/ssrf-safe-fetch | 0.00% <ø> (ø) |
| packages/types | 5.02% <ø> (ø) |
| packages/utils | 88.02% <ø> (ø) |
| packages/web-crawler | 88.29% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| :--- | :--- |
| Store | 66.77% <ø> (ø) |
| Services | 53.78% <ø> (ø) |
| Server | 70.67% <91.98%> (+0.06%) ⬆️ |
| Libs | 53.81% <ø> (ø) |
| Utils | 79.95% <ø> (ø) |
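The figures in this report are internally consistent, which a quick recomputation confirms. The sketch assumes coverage = hits / lines and that Codecov rounds down to two decimals (its default `round: down`, `precision: 2` settings). Under those assumptions it reproduces 68.67% and 68.70%, the +0.02% delta (computed from the unrounded ratios, which is why it is not +0.03%), and a patch of 337 tracked lines, 310 of them covered.

```typescript
// Recomputes the Codecov figures above. Assumes coverage = hits / lines and
// Codecov's default rounding (down, two decimals).
interface Totals {
  hits: number;
  lines: number;
  misses: number;
  partials: number;
}

export const base: Totals = { hits: 151703, lines: 220888, misses: 69042, partials: 143 };
export const head: Totals = { hits: 151900, lines: 221099, misses: 69055, partials: 144 };

// Every tracked line is exactly one of hit, miss, or partial.
export const isConsistent = (t: Totals): boolean => t.hits + t.misses + t.partials === t.lines;

const roundDown2 = (pct: number): number => Math.floor(pct * 100) / 100;

export const coveragePct = (t: Totals): number => roundDown2((t.hits / t.lines) * 100);

// Delta between the unrounded ratios, then rounded down.
export const deltaPct = (from: Totals, to: Totals): number =>
  roundDown2((to.hits / to.lines - from.hits / from.lines) * 100);

// Patch size implied by "X% patch coverage with N lines missing coverage".
export const impliedPatchLines = (patchPct: number, missing: number): number =>
  Math.round(missing / (1 - patchPct / 100));

console.log(coveragePct(base), coveragePct(head), deltaPct(base, head)); // 68.67 68.7 0.02
console.log(impliedPatchLines(91.98813, 27)); // 337
```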

@arvinxx arvinxx merged commit 608498a into canary May 7, 2026
54 of 56 checks passed
@arvinxx arvinxx deleted the arvinxx/feat/finalize-abandoned-op branch May 7, 2026 01:54