🗃️ feat(database): add verify system tables for agent run delivery checker by arvinxx · Pull Request #15480 · lobehub/lobehub

arvinxx · 2026-06-04T16:39:45Z

💻 Change Type

✨ feat
👷 build

🔗 Related Issue

Part of LOBE-10019 — 实现 Agent Run 交付检查器 (Verify System) 后端业务逻辑

This PR delivers subtask 1: 数据库迁移 + Schema 定义 — the foundational database layer the remaining service/integration subtasks build on. Scoped to schema + migration so it merges first (lower layer before its callers).

🔀 Description of Change

Adds the database layer for the Agent Run delivery checker ("Verify System").

Reuse / definition layer (the part worth persisting & reusing):

verify_criteria — a single reusable pass/fail standard (the atomic unit). Carries its verifier_type / verifier_config / on_fail defaults and binds to a document for judging guidance (iteration history reuses document_history; no version columns).
verify_rubrics — a named group that aggregates criteria — the reusable unit (a rubric is a set of criteria).
verify_rubric_criteria — junction: which criteria a rubric aggregates (criteria are reusable across rubrics, with sort_order).

Mounted onto an agent via the existing agency-config jsonb (no dedicated bindings table):

agencyConfig.verifyRubricId — a reusable rubric (criteria template).
agencyConfig.verifyCriteriaIds — ad-hoc one-off criteria that don't warrant a rubric.

A run's plan instantiates the union of the rubric's criteria + the ad-hoc criteria. Topic-level mounting is intentionally out of scope for now.

Snapshot + result layer:

agent_operations.verify_plan (jsonb) + verify_plan_confirmed_at — the per-run immutable check-item snapshot lives on the operation (1:1 — auto_repair spawns a new operation, so there's never >1 plan per op). No separate plans table. Each item carries a stable uuid + sourceCriterionId / sourceRubricId provenance, with resolved title/config frozen in (editing a criterion never drifts a historical plan).
agent_operations.verify_status — denormalized rollup (unverified → planned → verifying → passed/failed → repairing → delivered) so the operation list page renders badges / filters without joining the verify tables.
verify_check_results — per-criterion result, Toulmin-modeled: verdict (Claim) + confidence (Qualifier) are columns; narrative lives in a typed toulmin jsonb; verifier_tracing_id is N:1 (a batch generateObject shares one tracing row); verifier_operation_id links an agent verifier's sub-operation; repair_operation_id links an auto-repair operation; is_false_positive/is_false_negative precomputed for the data flywheel. Relates to the plan via operation_id + stable check_item_id (never the array index).

user_id is kept on every table for ownership / access-control WHERE user_id = ?. Redundant agent_id/topic_id were intentionally not added to results (reachable via operation_id).

Migration: packages/database/migrations/0110_add_verify_tables.sql (idempotent — IF NOT EXISTS / DROP CONSTRAINT IF EXISTS + ADD). The agent mount fields need no agents-table migration since agency_config is already jsonb.

Design note (vs the issue draft): the draft modeled a standalone verify_check_plans table and a single verify_rubrics "one rule" concept. After review we (a) collapsed the per-run plan onto agent_operations since it's strictly 1:1; (b) split the definition layer into criterion (single item) + rubric (group); and (c) mount onto an agent through agencyConfig (verifyRubricId + verifyCriteriaIds) rather than a bindings table. Migration number is 0110 (draft's 0105 predates current trunk @ 0109).

🧪 How to Test

Tested locally
Added/updated tests
No tests needed
bun run db:generate reports "No schema changes" (migration & snapshot in sync).
bun run type-check passes with no verify-related errors.
Migration SQL is idempotent and safe to re-run.

📝 Additional Information

Schema/migration only — no service, TRPC, or UI logic yet. Follow-up PRs will add criteria/rubric CRUD, plan instantiation + confirm, the verifier integrations (program / agent / llm), verify_status service-layer sync, feedback, and the frontend Plan-confirm UI (subtasks 2–11).

🤖 Generated with Claude Code

vercel · 2026-06-04T16:39:53Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lobehub	Ready	Preview, Comment	Jun 5, 2026 2:35am

sourcery-ai

We've reviewed this pull request using the Sourcery rules engine

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e4dd230cd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-04T16:42:51Z

    completionReason: text('completion_reason', { enum: completionReasons }),

+    /** Denormalized rollup of the verify (delivery checker) pipeline state. */
+    verifyStatus: text('verify_status', { enum: verifyStatuses }),


Set a default verify status for operations

When operations are recorded through AgentOperationModel.recordStart, the inserted values do not include verifyStatus, and this new column has no database default/backfill, so both existing rows and newly created operations remain NULL instead of the defined unverified state. In the operation-list/filter scenario described by this column’s comment, those rows will be omitted from status-based filters or require every caller to special-case null; add a default/backfill or set the field at insert time.

Useful? React with 👍 / 👎.

codecov · 2026-06-04T16:46:56Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.85%. Comparing base (65ba086) to head (e0f2199).

Additional details and impacted files

@@            Coverage Diff            @@
##           canary   #15480     +/-   ##
=========================================
  Coverage   70.85%   70.85%             
=========================================
  Files        3256     3256             
  Lines      321442   321442             
  Branches    29378    35044   +5666     
=========================================
  Hits       227754   227754             
  Misses      93506    93506             
  Partials      182      182

Flag	Coverage Δ
app	`61.55% <ø> (ø)`
database	`92.54% <ø> (ø)`
packages/agent-manager-runtime	`49.69% <ø> (ø)`
packages/agent-runtime	`81.08% <ø> (ø)`
packages/builtin-tool-lobe-agent	`18.52% <ø> (ø)`
packages/context-engine	`84.19% <ø> (ø)`
packages/conversation-flow	`91.29% <ø> (ø)`
packages/device-gateway-client	`90.51% <ø> (ø)`
packages/eval-dataset-parser	`95.15% <ø> (ø)`
packages/eval-rubric	`76.11% <ø> (ø)`
packages/fetch-sse	`85.57% <ø> (ø)`
packages/file-loaders	`87.89% <ø> (ø)`
packages/memory-user-memory	`74.99% <ø> (ø)`
packages/model-bank	`99.99% <ø> (ø)`
packages/model-runtime	`84.22% <ø> (ø)`
packages/prompts	`72.51% <ø> (ø)`
packages/python-interpreter	`92.90% <ø> (ø)`
packages/ssrf-safe-fetch	`0.00% <ø> (ø)`
packages/types	`35.38% <ø> (ø)`
packages/utils	`84.98% <ø> (ø)`
packages/web-crawler	`88.08% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Components	Coverage Δ
Store	`68.53% <ø> (ø)`
Services	`54.77% <ø> (ø)`
Server	`71.85% <ø> (ø)`
Libs	`57.03% <ø> (ø)`
Utils	`81.71% <ø> (ø)`

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…ecker Implement the database layer for the Agent Run delivery checker (Verify System). Reuse / definition layer: - verify_criteria: a single reusable pass/fail standard (atomic unit), carrying its verifier config + onFail default and bound to a document for judging guidance (iteration history reuses document_history; no version columns) - verify_rubrics: a named group that aggregates criteria — the reusable unit - verify_rubric_criteria: junction, which criteria a rubric aggregates (criteria are reusable across rubrics) Mounted onto an agent via the existing agency config jsonb: - agencyConfig.verifyRubricId: a reusable rubric (criteria template) - agencyConfig.verifyCriteriaIds: ad-hoc one-off criteria A run's plan instantiates the union of both. No dedicated bindings table. Snapshot + result layer: - agent_operations.verify_plan (jsonb) + verify_plan_confirmed_at: the per-run immutable check-item snapshot lives ON the operation (1:1 — auto-repair spawns a new operation), instead of a separate plans table - agent_operations.verify_status: denormalized rollup for list-page badges - verify_check_results: per-criterion result with the Toulmin model (verdict/confidence as columns, narrative in a typed toulmin jsonb), N:1 verifier_tracing_id for batch judging, FP/FN flags for the data flywheel; relates to the plan via operation_id + stable check_item_id Ref: LOBE-10019 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Implements the verify system on top of the schema (PR #15480): - models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult; agentOperation verify plan/status methods - services/verify: AI plan generation (auto-create criteria), executor with LLM Toulmin judge (per-criterion + batch), program placeholder, agent & auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion lifecycle bridge - lambda verify router (criteria/rubric CRUD, plan, results, feedback) - frontend feature module: service, SWR hooks, CheckerDock state machine, RunArtifact, verify i18n namespace - tracing scenarios: VerifyPlanGen / VerifyJudge Live UI mount (dock/artifact into chat) pending server operationId source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🗃️ feat(database): add verify system tables for agent run delivery checker Implement the database layer for the Agent Run delivery checker (Verify System). Reuse / definition layer: - verify_criteria: a single reusable pass/fail standard (atomic unit), carrying its verifier config + onFail default and bound to a document for judging guidance (iteration history reuses document_history; no version columns) - verify_rubrics: a named group that aggregates criteria — the reusable unit - verify_rubric_criteria: junction, which criteria a rubric aggregates (criteria are reusable across rubrics) Mounted onto an agent via the existing agency config jsonb: - agencyConfig.verifyRubricId: a reusable rubric (criteria template) - agencyConfig.verifyCriteriaIds: ad-hoc one-off criteria A run's plan instantiates the union of both. No dedicated bindings table. Snapshot + result layer: - agent_operations.verify_plan (jsonb) + verify_plan_confirmed_at: the per-run immutable check-item snapshot lives ON the operation (1:1 — auto-repair spawns a new operation), instead of a separate plans table - agent_operations.verify_status: denormalized rollup for list-page badges - verify_check_results: per-criterion result with the Toulmin model (verdict/confidence as columns, narrative in a typed toulmin jsonb), N:1 verifier_tracing_id for batch judging, FP/FN flags for the data flywheel; relates to the plan via operation_id + stable check_item_id Ref: LOBE-10019 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ✨ feat(verify): add Agent Run delivery checker backend + frontend module Implements the verify system on top of the schema (PR #15480): - models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult; agentOperation verify plan/status methods - services/verify: AI plan generation (auto-create criteria), executor with LLM Toulmin judge (per-criterion + batch), program placeholder, agent & auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion lifecycle bridge - lambda verify router (criteria/rubric CRUD, plan, results, feedback) - frontend feature module: service, SWR hooks, CheckerDock state machine, RunArtifact, verify i18n namespace - tracing scenarios: VerifyPlanGen / VerifyJudge Live UI mount (dock/artifact into chat) pending server operationId source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * 🐛 fix(verify): persist delivery-checker verdicts via async tracing backfill The LLM judge produced valid verdicts but they were never persisted, leaving every run stuck at `verifying`. Two root causes: 1. FK ordering: `writeVerdict` stamped `verifier_tracing_id` synchronously, but the `llm_generation_tracing` row is written asynchronously (best-effort, after the response) — so the hard FK was violated every time and the verdict write was rolled back. Now the verdict is written with a null link, and the tracing id is backfilled by an `onPersisted` callback that fires only after the tracing row commits (still non-blocking). If tracing is disabled the link simply stays null. 2. Verdict parse: the judge JSON schema is non-strict, so the provider returns optional Toulmin fields as explicit `null`. The Zod validator used `.optional()` (accepts undefined, not null), so any null failed the whole `safeParse` and discarded the batch. Switched to `.nullish()`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(cli): add `verify` command for the delivery checker Adds `lh verify` covering the full delivery-checker chain — criteria & rubric CRUD, per-run plan (generate/state/confirm/skip), execute (LLM judge), results, and feedback — calling the `verify` lambda router. Enables end-to-end backend testing of the verify system. Also adds the missing `tool-runtime` / `prompts` / `const` workspace entries to the CLI's `pnpm-workspace.yaml` so the standalone package installs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 feat(verify): add verify message role + delivery-checker card UI Make the delivery-checker renderable in chat: - Fix the `features/Verify` components so they compile: flatten the `verify` locale to the repo's flat-dotted-key convention (keySeparator: false), import `Flexbox`/`TextArea` from `@lobehub/ui` (react-layout-kit is no longer a dep), and the token cast. - Add a `verify` UI message role + a `VerifyMessage` card that renders the Run Artifact + checker dock from `metadata.verifyOperationId`, wired into the message renderer switch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): add lobe-agent `generateVerifyPlan` tool (server runtime) Lets an agent set up the delivery checker for its run: the agent calls `generateVerifyPlan` early (per the new `<delivery_checker>` system-role guidance), which instantiates the rubric / ad-hoc criteria into a frozen plan on the current `agent_operations` row. Executed server-side only — the executor is dispatched via `runtime[apiName]` with `operationId` threaded through the tool execution context; the client `BaseExecutor` gracefully no-ops it. Also registers the metadata fields (`verifyOperationId`/`verifyRound`) on the message metadata zod schema so the role='verify' card can carry its operation id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): surface role=verify card on run completion (LOBE-10051) Connect the delivery checker to the conversation: when an Agent Run with a verify plan completes, `CompletionLifecycle` inserts a persisted `role='verify'` message (parented to the assistant, carrying `metadata.verifyOperationId`) that renders the checker card. Self-guarded — no plan → no card, failures never affect the run. `role='verify'` behaves like a `user` leaf message everywhere it flows (persistence + conversation-flow pass it through unchanged); only the context-engine treats it specially: a new `VerifyMessageProcessor` drops it from the model context (UI-only card, not a valid model role). Adds `verify` to `CreateMessageRoleType`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 feat(verify): merge run-artifact + checker into one card The role=verify message rendered two stacked cards (Run Artifact summary + Delivery Checker) that duplicated the check-item list. Merge into a single card: the `Run Artifact · Round N` header, then the checker results + actions, then the snapshot note. RunArtifact/CheckerDock gain an `embedded` prop (header-only / body-only, no card chrome) and VerifyMessage composes them under one border. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): derive generateVerifyPlan rubric from agencyConfig A real agent calls `generateVerifyPlan` with just a `goal` and doesn't know rubric ids. When `rubricId`/`criteriaIds` params are absent, derive the mounted rubric + ad-hoc criteria from the executing agent's `agencyConfig.verifyRubricId / verifyCriteriaIds`. Params still win when given. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🐛 fix(cli): surface agent gateway WebSocket close code + reason The `onclose` handler logged `String(event)` → the useless "[object CloseEvent]". Surface `event.code` (+ `event.reason` when present) so a gateway disconnect before completion is actually diagnosable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 fix(verify): rename "Run Artifact" → "Verification", drop failed red border - The kicker said "Run Artifact" — it's automated verification, not an artifact. Renamed to "Verification · Round N". - Removed the red error border on a failed check — a normal card reads better. - Fixes a render crash (`useVerifyState is not defined`): the border removal left a dangling reference after the import was dropped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(cli): poll run status when the agent stream drops When the live stream (gateway WebSocket / SSE) closes before the run finishes, the run is still executing server-side — so instead of hard-exiting, fall back to polling `aiAgent.getOperationStatus` every 10s until the run reaches a terminal state (or is no longer tracked). Pairs with surfacing the WS close code/reason. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 feat(verify): add Render for generateVerifyPlan tool call The generateVerifyPlan tool call rendered as the default param/result dump. Add a Render that lists the generated delivery checks (title + gate/auto-fill tag), and surface the items on the tool state so the Render can read them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): auto-confirm generated plan so checks run on completion The agent generated a plan but it stayed `planned`/unconfirmed, so the completion hook (which gates on a confirmed plan) never ran the checks — the card was stuck at "awaiting confirmation" with no pass/fail. In the headless agent flow there's no one to click Confirm, so `generateVerifyPlan` now auto-confirms the plan it generates; the checks then run automatically on completion. (An interactive "review before run" gate is a future enhancement.) Also: the verify card header disappeared in the draft/planned phase (`phaseToArtifact.draft` was null). Give it a header so the card always shows its "Verification · Round N" heading. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🐛 fix(agent-tracing): only count opaque/presentational attrs as structural noise The first structuralNoiseRatio charged ALL markup (every <...> tag) as noise, which over-penalized legitimately structured results 3x. Grounding against real web-search output (`<item title="…" url="…">snippet</item>`) showed the tags and the title=/url= attributes ARE the signal the model reads. Now only opaque/presentational attribute names (id, class, style, data-*, aria-*, role, on*) count as noise; semantic element tags and content-bearing attributes (title, url, href, name…) are kept. On a 57-op user-interrupted sample this drops web-search noise 42%→0% and overall estimated waste 16%→5%, leaving large-payload (readDocument) and high error-rate tools as the real signal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): model-authored criteria with name/description/instruction-in-document + agent verifier Restructure the generateVerifyPlan tool to a createDocument-style full-create flow and wire up the agent verifier path: - criteria now = title + description (required one-liner) + instruction (required detailed rubric); instruction lives in a linked document (verify_criteria.documentId), description is a new verify_criteria column (migration 0111). verifierConfig no longer holds description/instruction. - generateVerifyPlan creates verify_criteria + a rubric, snapshots the plan onto the operation and confirms it; judge resolves the instruction from the document. - agent-type checks run as verifier sub-agents (execAgent + isolated thread) whose onComplete hook parses a VERDICT and writes it back to verify_check_results (renamed AgentVerifierSpawner → VerifierAgentRunner). - UI: custom Inspector for the tool header; check list shows per-verifier-type icons (llm/agent/program) + description + required/optional tag; i18n en/zh. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ⚡️ perf(verify): run program/llm/agent checks concurrently on completion The three verifier kinds are independent; previously the agent spawn waited for the batched LLM judge to finish. Run them via Promise.all so agent sub-agents start immediately alongside the LLM batch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): dedicated builtin verify-agent + writeback tool, role=verify message, portal check editor - Add `@lobechat/builtin-tool-verify` (submitVerifyResult) + builtin `verify-agent`; agent-type checks now run as the dedicated verify agent (not the user's agent), which investigates and writes its verdict back via the tool during its run. - Verifier inherits the parent run's model/provider (builtin default may be unconfigured locally). - role=verify completion message no longer requires an assistantMessageId, so the delivery-checker card always surfaces when a plan exists. - Portal editor for verify checks (title/description/instruction/verifier/onFail). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🐛 fix(verify): restrict verify-agent to its writeback tool; fix running loader icon Root cause of stuck `running` agent checks: the verify-agent ran in agent mode and inherited all default tools (web-browsing, cloud-sandbox, skills, activator), so it went off web-searching/crawling to "investigate" and never called submitVerifyResult. - Run the verify-agent in chat mode (enableAgentMode: false, searchMode: off) — the strict whitelist — and whitelist `lobe-verify` for chat mode so the verifier gets ONLY its writeback tool. - Sharpen the verify systemRole: judge from the provided deliverable/instruction (no external tools), always reach a verdict, and always call submitVerifyResult. - CheckerDock: running check now uses the standard RingLoadingIcon (warning ring), matching the app's loader instead of a blue spinner. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): auto-repair loop — re-run the agent with failure feedback on failed checks When required checks fail with onFail=auto_repair, automatically run a second iteration instead of ending at `failed`: - createRepairRunner: re-runs the SAME agent in the same topic with the failure feedback as the prompt, re-snapshots the plan onto the repair operation and confirms it so it re-verifies on completion (the next round). Capped at MAX_REPAIR_ROUNDS via parent-chain depth to prevent runaway loops. - maybeAutoRepair: fires only once every required check has a terminal result, so it works for inline LLM checks (triggered from lifecycle) and async agent checks (triggered from the verify tool's writeback path). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): open check result detail in portal & rename artifact→result - add a VerifyResult portal view: clicking any check row opens that result's detail (verdict, confidence, Toulmin sections, suggestion) on the right; agent checks expose their execution trace from inside the panel - CheckerDock rows are all clickable now (chevron affordance), status shown by icon only; verify card uses colorBgElevated - rename the run-result surface from "artifact" to "result" everywhere: RunArtifact → RunResult, phaseToArtifact → phaseToResult, and all `artifact.*` i18n keys → `result.*` - ship verify namespace zh-CN / en-US locales Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): enrich check result portal — criterion stepper, richer detail view Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): rubric run-policy config + repair feedback on the verify card Auto-repair feedback now lives on the failed round's role=verify message (content), and the VerifyMessageProcessor surfaces it into the repair run's context as a tagged user turn — so the repair op runs off history via a new execAgent `suppressUserMessage` path instead of injecting a synthetic user message. createVerifyMessage is awaited before verification to avoid a race. maxRepairRounds becomes a rubric-level config: new `verify_rubrics.config` jsonb column, read live at repair time via the plan's sourceRubricId. Adds a RubricConfig portal panel (reachable from the plan card's settings affordance) to view/edit it, wired through the verify store + TRPC. Verify domain types/vocab/config are extracted from the DB schema into @lobechat/types as the single source of truth; schema and consumers import from there. Tests: VerifyMessageProcessor dual behavior; VerifyRubricModel config round-trip; MessageModel.findVerifyMessageByOperationId. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🗃️ refactor(verify): squash the 3 verify migrations into one Collapse 0110 (tables) + 0111 (criteria.description) + 0112 (rubrics.config) into a single regenerated 0110_add_verify_tables so the PR ships one clean, idempotent migration. No schema change vs the three combined. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(cli): verify rubric run-policy config commands + shrink judging-rule editor font CLI: `verify rubric create --max-repair-rounds`, `verify rubric view`, and `verify rubric update` exercise the rubric config endpoints end-to-end; adds a mocked command test. UI: judging-rule editor font 16px → 14px. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): editable rubric name in the config panel + default 3 repair rounds Add a name (title) field to the RubricConfig portal, persisted via a new updateRubricTitle store action + service (optimistic + debounced, alongside the config write-back). Bump DEFAULT_MAX_REPAIR_ROUNDS 2 → 3. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ♻️ refactor(verify): extract generateVerifyPlan into installable lobe-delivery-checker tool Move the delivery-checker plan-creation flow out of the always-on lobe-agent tool into a new standalone, installable builtin tool `lobe-delivery-checker` (Skill Store, opt-in per agent — not loaded by default). lobe-agent no longer ships generateVerifyPlan. - new packages/builtin-tool-lobe-delivery-checker (manifest/types/systemRole + client Render/Inspector/Portal moved wholesale from lobe-agent) - new serverRuntimes/lobeDeliveryChecker.ts (generateVerifyPlan moved out of lobeAgent.ts), registered alongside verifyResult - registered installable in builtin-tools (no hidden/discoverable:false, not in defaultToolIds/alwaysOnToolIds/runtimeManagedToolIds); renders/inspectors/ portals/identifiers wired; lobe-agent portal entries removed - i18n keys moved builtins.lobe-agent.verifyPlan.* → builtins.lobe-delivery-checker.* Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(agent): add `custom` tool mode; verify agent uses it instead of chat-mode Chat mode's contract is to strip ALL user/agent plugins (strict KB/memory/web allow-list) — so the verify sub-agent couldn't get its writeback tool without a leaky blanket rule. Introduce a third tool mode `custom` where the toolset is EXACTLY the agent's declared plugins (no always-on, no defaults, no activator), for focused builtin sub-agents. - chatConfig.toolMode: 'agent' | 'chat' | 'custom' (overrides enableAgentMode) - AgentToolsEngine: custom branch (defaultToolIds = plugins, rules = plugins-on, allowExplicitActivation only in agent mode); chatModeRules restored to strict - verify agent → toolMode: 'custom'; lobe-verify dropped from chatModeAllowedToolIds - test: custom mode enables exactly the declared plugin, no always-on / defaults Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

arvinxx requested review from nekomeowww and tjx666 as code owners June 4, 2026 16:39

dosubot Bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 4, 2026

sourcery-ai Bot reviewed Jun 4, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 4, 2026

View reviewed changes

vercel Bot deployed to Preview June 4, 2026 16:56 View deployment

arvinxx force-pushed the feat/lobe-10019-verify-system-schema branch 4 times, most recently from 66f47c6 to d7bf2e2 Compare June 4, 2026 17:31

vercel Bot deployed to Preview June 4, 2026 18:02 View deployment

arvinxx force-pushed the feat/lobe-10019-verify-system-schema branch from d7bf2e2 to e0f2199 Compare June 5, 2026 02:24

vercel Bot deployed to Preview June 5, 2026 02:35 View deployment

arvinxx closed this Jun 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🗃️ feat(database): add verify system tables for agent run delivery checker#15480

🗃️ feat(database): add verify system tables for agent run delivery checker#15480
arvinxx wants to merge 1 commit into
canaryfrom
feat/lobe-10019-verify-system-schema

arvinxx commented Jun 4, 2026 •

edited

Loading

Uh oh!

vercel Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 4, 2026

Uh oh!

codecov Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

arvinxx commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💻 Change Type

🔗 Related Issue

🔀 Description of Change

🧪 How to Test

📝 Additional Information

Uh oh!

vercel Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

arvinxx commented Jun 4, 2026 •

edited

Loading

vercel Bot commented Jun 4, 2026 •

edited

Loading

codecov Bot commented Jun 4, 2026 •

edited

Loading