✨ feat(task): wire QStash-driven heartbeat self-rescheduling #14199

Merged: arvinxx merged 6 commits into canary from feat/heartbeat-active-call on Apr 26, 2026
@arvinxx (Member) commented Apr 26, 2026

Summary

Implements LOBE-8233: heartbeat tasks now actually heartbeat. Previously, LocalTaskScheduler had no callers and QStashTaskScheduler was a TODO, so onTopicComplete ran once and stopped.

After this PR, every task with automationMode='heartbeat' self-arms its next run via a QStash delayed publish (or LocalTaskScheduler's setTimeout in dev). The DB is the state authority: every tick re-reads task state and may decide to skip rather than run.
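The two scheduling backends can be pictured as one small interface with a dev-mode implementation. This is a minimal sketch, not the PR's code: `TickPayload`, `TaskScheduler`, `InMemoryTaskScheduler`, the method names, and the choice of seconds as the delay unit are all assumptions made for illustration.

```typescript
// Hypothetical payload and interface shapes; the PR's real scheduler API may differ.
interface TickPayload {
  taskId: string;
}

interface TaskScheduler {
  /** Arms one delayed tick; resolves to an id that can cancel it later. */
  scheduleTick(payload: TickPayload, delaySeconds: number): Promise<string>;
  /** Cancels a pending tick; a no-op if it already fired or never existed. */
  cancelTick(messageId: string): Promise<void>;
}

// Dev-mode flavour: setTimeout stands in for a QStash delayed publish.
class InMemoryTaskScheduler implements TaskScheduler {
  private timers = new Map<string, ReturnType<typeof setTimeout>>();
  private nextId = 0;
  private onTick: (payload: TickPayload) => void;

  constructor(onTick: (payload: TickPayload) => void) {
    this.onTick = onTick;
  }

  async scheduleTick(payload: TickPayload, delaySeconds: number): Promise<string> {
    this.nextId += 1;
    const messageId = `local-${this.nextId}`;
    const timer = setTimeout(() => {
      // The tick is one-shot: forget it before handing control to the handler.
      this.timers.delete(messageId);
      this.onTick(payload);
    }, delaySeconds * 1000);
    this.timers.set(messageId, timer);
    return messageId;
  }

  async cancelTick(messageId: string): Promise<void> {
    const timer = this.timers.get(messageId);
    if (timer === undefined) return;
    clearTimeout(timer);
    this.timers.delete(messageId);
  }

  /** Number of ticks that are armed but have not fired yet. */
  pendingCount(): number {
    return this.timers.size;
  }
}
```

Because each tick is one-shot, the handler has to re-arm explicitly after every run, which is what gives it the chance to re-read DB state and skip.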

Mechanism (per design doc)

  • Re-arm in TaskLifecycleService.onTopicComplete: after the existing terminal/pause logic, schedule the next tick with delay = task.heartbeatInterval. Persist tickMessageId / scheduledAt / consecutiveFailures under tasks.context.scheduler.* (JSONB pocket — no schema migration).
  • Failure fuse: 3 consecutive error reasons → stop re-arming and let the urgent error brief surface for human action.
  • Skip-when-human-waiting: any unresolved priority='urgent' brief blocks re-arm (covers review max-iter and fuse cases without a new schema column).
  • QStashTaskScheduler: thin wrapper over qstashClient.publishJSON({ delay }) + messages.delete(messageId). LocalTaskScheduler already existed, now actually invoked.
  • Watchdog cron: /api/workflows/task/watchdog handler reuses TaskModel.findStuckTasks. Cloud schedule registration is left to a one-time runbook (intentionally not auto-registered to avoid duplicate schedules.create).
  • Signature verification added to /on-topic-complete, /heartbeat-tick, /watchdog (skipped when QSTASH_CURRENT_SIGNING_KEY is unset, matching existing verifyQStashSignature behavior).
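The re-arm rules listed above can be read as one pure decision function. The sketch below is an illustration under stated assumptions: every type and function name is hypothetical, the ordering of the checks is a guess, and resetting the failure streak on a successful run is inferred from the word "consecutive". It also applies the exemption for type='error' briefs that a follow-up commit in this PR introduces.

```typescript
// All names below are illustrative; only the rules come from the PR description.
type TopicEndReason = 'done' | 'error';

interface BriefLike {
  priority: string; // 'urgent' is the only value the rule cares about
  type: string; // a follow-up commit in this PR exempts type === 'error'
  resolved: boolean;
}

interface RearmInput {
  automationMode: string | null;
  terminalOrPaused: boolean;
  heartbeatInterval: number | null;
  reason: TopicEndReason;
  consecutiveFailures: number;
  briefs: BriefLike[];
}

type SkipReason = 'not-heartbeat' | 'terminal-or-paused' | 'fuse-blown' | 'human-waiting';

interface RearmDecision {
  rearm: boolean;
  skip?: SkipReason;
  delay?: number;
  consecutiveFailures: number;
}

const FUSE_THRESHOLD = 3; // hardcoded in the PR; a tunable threshold is out of scope

function decideRearm(input: RearmInput): RearmDecision {
  const prev = input.consecutiveFailures;
  if (input.automationMode !== 'heartbeat' || input.heartbeatInterval === null) {
    return { rearm: false, skip: 'not-heartbeat', consecutiveFailures: prev };
  }
  if (input.terminalOrPaused) {
    return { rearm: false, skip: 'terminal-or-paused', consecutiveFailures: prev };
  }
  // Assumption: a successful run resets the streak; an error extends it.
  const failures = input.reason === 'error' ? prev + 1 : 0;
  if (failures >= FUSE_THRESHOLD) {
    return { rearm: false, skip: 'fuse-blown', consecutiveFailures: failures };
  }
  // Unresolved urgent briefs mean a human is expected to act before the next run.
  const humanWaiting = input.briefs.some(
    (b) => b.priority === 'urgent' && !b.resolved && b.type !== 'error',
  );
  if (humanWaiting) {
    return { rearm: false, skip: 'human-waiting', consecutiveFailures: failures };
  }
  return { rearm: true, delay: input.heartbeatInterval, consecutiveFailures: failures };
}
```

Keeping the decision free of I/O makes each branch (done, error, fuse, urgent-skip, terminal-skip, non-heartbeat) easy to cover with a table of inputs.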

Refactors

  • Extracted TaskRunnerService.runTask from the task.run mutation (~180 lines → 14-line wrapper). Both router and tick handler share one runner.
  • Moved buildTaskPrompt into @lobechat/prompts with structurally-typed model deps so the prompts package stays free of @lobechat/database imports.

Out of scope (deferred per design doc)

  • Tunable per-task fuse threshold (hardcoded 3).
  • start/pause/resume/cancel extraction to TaskRunnerService.
  • Local-mode auto watchdog setInterval.
  • Cloud watchdog cron auto-registration.

Test plan

  • bunx vitest run src/server/services/taskScheduler (qstash + local, 19 tests)
  • bunx vitest run src/server/services/taskLifecycle (10 re-arm tests covering done / error / fuse / urgent-skip / terminal-skip / non-heartbeat)
  • bunx vitest run src/server/routers/lambda/__tests__/integration/task.integration.test.ts (19 tests still pass after task.run thin-wrapper refactor)
  • bunx vitest run packages/database src/models/__tests__/{brief,task}.test.ts (77 tests, no regressions)
  • bun run type-check clean (only pre-existing .next/dev/types artifact error)
  • Cloud (queue-mode) manual verification — set AGENT_RUNTIME_MODE=queue, run a heartbeat task, confirm messageId appears in QStash dashboard and tick fires after heartbeatInterval.
  • Cloud watchdog cron registered manually:
    qstashClient.schedules.create({
      destination: '<APP_URL>/api/workflows/task/watchdog',
      cron: '*/5 * * * *',
    });

🤖 Generated with Claude Code

arvinxx and others added 3 commits April 26, 2026 12:19
…or heterogeneous agents

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements LOBE-8233: heartbeat tasks now self-arm via QStash delayed
publish (or LocalScheduler setTimeout in dev). After each topic completes,
TaskLifecycleService re-arms the next tick based on current DB state, with
a 3-strike fuse on consecutive errors and a skip-when-urgent-brief guard.
Adds /heartbeat-tick + /watchdog workflow handlers (signed) and extracts
TaskRunnerService from the task.run mutation so both router and tick
handler share one runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vercel Bot commented Apr 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| lobehub | Ready | Preview, Comment | Apr 26, 2026 0:40am |

@sourcery-ai sourcery-ai Bot (Contributor) left a comment

Sorry @arvinxx, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

@dosubot dosubot Bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and feature:schedule-task (Schedule task) labels Apr 26, 2026
codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 68.00000% with 152 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.86%. Comparing base (196c0a7) to head (c237c34).
⚠️ Report is 8 commits behind head on canary.

Additional details and impacted files
@@            Coverage Diff             @@
##           canary   #14199      +/-   ##
==========================================
+ Coverage   67.85%   67.86%   +0.01%     
==========================================
  Files        2227     2232       +5     
  Lines      191364   191574     +210     
  Branches    23747    23777      +30     
==========================================
+ Hits       129847   130019     +172     
- Misses      61388    61426      +38     
  Partials      129      129              
| Flag | Coverage Δ |
| --- | --- |
| app | 61.14% <71.74%> (+0.04%) ⬆️ |
| database | 92.04% <10.34%> (-0.18%) ⬇️ |
| packages/agent-runtime | 79.82% <ø> (ø) |
| packages/context-engine | 83.25% <ø> (ø) |
| packages/conversation-flow | 92.40% <ø> (ø) |
| packages/file-loaders | 87.02% <ø> (ø) |
| packages/memory-user-memory | 74.74% <ø> (ø) |
| packages/model-bank | 99.89% <ø> (ø) |
| packages/model-runtime | 84.28% <ø> (ø) |
| packages/prompts | 70.14% <ø> (ø) |
| packages/python-interpreter | 92.90% <ø> (ø) |
| packages/ssrf-safe-fetch | 0.00% <ø> (ø) |
| packages/utils | 88.41% <ø> (ø) |
| packages/web-crawler | 88.66% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
| --- | --- |
| Store | 67.21% <ø> (ø) |
| Services | 53.36% <ø> (ø) |
| Server | 67.70% <71.68%> (+0.08%) ⬆️ |
| Libs | 53.30% <ø> (ø) |
| Utils | 80.04% <ø> (ø) |
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6132bc6c2

Comment thread src/server/services/taskLifecycle/index.ts Outdated
Comment thread src/server/services/taskRunner/heartbeatTick.ts Outdated
…m typing

- TaskLifecycle re-arm now excludes type='error' urgent briefs from the
  human-waiting check; the fresh error brief from onTopicComplete was
  always present and stalled retries after the very first failure,
  making the 3-strike fuse unreachable.
- TaskRunner only rolls back running→paused when *this* invocation
  set the running state; heartbeatTick treats CONFLICT as a graceful
  'in-flight' skip so overlapping ticks don't 500 or clobber the
  in-flight run's status.
- buildTaskPrompt now types its task arg + getReviewConfig as TaskItem
  (the prompts package already depends on @lobechat/types) so server
  TaskModel methods are assignable without parameter contravariance
  errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
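The second fix in the commit above (roll back only a state this invocation set, and treat CONFLICT as an in-flight skip) can be sketched as follows. This is an assumption-laden illustration: `TaskRow`, `runTaskOnce`, `heartbeatTick`, and the error helpers are hypothetical names, the error shape is modelled on an error object carrying `code: 'CONFLICT'`, and the functions are synchronous only to keep the sketch short.

```typescript
// Hypothetical stand-ins for the real task row and the CONFLICT error.
type TaskStatus = 'paused' | 'running';

interface TaskRow {
  status: TaskStatus;
}

function conflictError(message: string): Error & { code: 'CONFLICT' } {
  return Object.assign(new Error(message), { code: 'CONFLICT' as const });
}

function isConflict(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { code?: unknown }).code === 'CONFLICT'
  );
}

// Kept synchronous for brevity; the real runner is presumably async.
function runTaskOnce(task: TaskRow, execute: () => void): void {
  let setRunningHere = false;
  try {
    if (task.status === 'running') {
      throw conflictError('task is already running');
    }
    task.status = 'running';
    setRunningHere = true;
    execute();
  } catch (error) {
    // Roll back only a state this invocation created, never another run's.
    if (setRunningHere) task.status = 'paused';
    throw error;
  }
}

type TickOutcome = 'ran' | 'skipped-in-flight';

function heartbeatTick(task: TaskRow, execute: () => void): TickOutcome {
  try {
    runTaskOnce(task, execute);
    return 'ran';
  } catch (error) {
    // An overlapping tick is expected, so it is reported as a skip, not a failure.
    if (isConflict(error)) return 'skipped-in-flight';
    throw error;
  }
}
```

Without the `setRunningHere` flag, the CONFLICT thrown by a second tick would reach the rollback branch and flip the first run's status to paused while it is still executing.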
…nature verification

Three handlers (on-topic-complete, heartbeat-tick, watchdog) duplicated the
same `c.req.text() → verifyQStashSignature → 401` boilerplate. Extracted to
src/server/workflows-hono/middlewares/qstashAuth.ts and mounted on the
routes; handlers now just `c.req.json()` (Hono cross-converts the cached
body so the middleware reading text() doesn't break json() in the handler).

Note: this is for one-shot QStash webhook receivers. Upstash *Workflow*
endpoints (memory-user-memory) keep using `serve()` from
`@upstash/workflow/hono`, which has its own built-in verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
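The shared middleware described in the commit above might look roughly like this. It is a sketch under assumptions, not the contents of qstashAuth.ts: `MiddlewareContext` is a structural stand-in for the few Hono context members used (`c.req.text()`, `c.req.header()`, `c.json()`), and the `signingKey` and `verify` parameters are injected placeholders for the environment lookup and for verifyQStashSignature, whose real signature is not shown in this PR page.

```typescript
// Minimal structural stand-in for the parts of Hono's Context this sketch touches.
interface MiddlewareContext {
  req: {
    text(): Promise<string>;
    header(name: string): string | undefined;
  };
  json(body: unknown, status?: number): unknown;
}

type Next = () => Promise<void>;
type Verify = (rawBody: string, signature: string) => Promise<boolean>;

// `signingKey` and `verify` are hypothetical injection points so the sketch
// stays library-free; the PR delegates the check to verifyQStashSignature.
function qstashAuth(signingKey: string | undefined, verify: Verify) {
  return async (c: MiddlewareContext, next: Next): Promise<unknown> => {
    // Matches the described behaviour: no signing key means no verification.
    if (!signingKey) return next();

    const signature = c.req.header('Upstash-Signature');
    if (!signature) return c.json({ error: 'Unauthorized' }, 401);

    // The signature covers the raw body, so read text here rather than parsed JSON.
    const rawBody = await c.req.text();
    const valid = await verify(rawBody, signature);
    if (!valid) return c.json({ error: 'Unauthorized' }, 401);

    return next();
  };
}
```

Mounting one such middleware on /on-topic-complete, /heartbeat-tick, and /watchdog leaves each handler free to call `c.req.json()` without repeating the 401 branch.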
…hestrator, not a renderer)

Putting buildTaskPrompt under @lobechat/prompts was a layering mistake:
the function does ~10 DB calls (briefs / topics / subtasks / dep
identifier resolution / parent task assembly) and just maps the rows
through to buildTaskRunPrompt at the end.

The prompts package should stay pure rendering — buildTaskRunPrompt
already lives there as the actual renderer. Moving the orchestrator
back to src/server/services/taskRunner/ also lets it import model
classes directly instead of structurally-typed deps, dropping the
TaskPromptDeps abstraction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arvinxx arvinxx merged commit 35c3d5e into canary Apr 26, 2026
34 of 35 checks passed
@arvinxx arvinxx deleted the feat/heartbeat-active-call branch April 26, 2026 12:53
@arvinxx arvinxx mentioned this pull request Apr 27, 2026
Labels

feature:schedule-task (Schedule task), size:XXL (This PR changes 1000+ lines, ignoring generated files.)
