🗃️ feat(database): add verify system tables for agent run delivery checker#15480
🗃️ feat(database): add verify system tables for agent run delivery checker#15480arvinxx wants to merge 1 commit into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e4dd230cd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| completionReason: text('completion_reason', { enum: completionReasons }), | ||
|
|
||
| /** Denormalized rollup of the verify (delivery checker) pipeline state. */ | ||
| verifyStatus: text('verify_status', { enum: verifyStatuses }), |
There was a problem hiding this comment.
Set a default verify status for operations
When operations are recorded through AgentOperationModel.recordStart, the inserted values do not include verifyStatus, and this new column has no database default/backfill, so both existing rows and newly created operations remain NULL instead of the defined unverified state. In the operation-list/filter scenario described by this column’s comment, those rows will be omitted from status-based filters or require every caller to special-case null; add a default/backfill or set the field at insert time.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## canary #15480 +/- ##
=========================================
Coverage 70.85% 70.85%
=========================================
Files 3256 3256
Lines 321442 321442
Branches 29378 35044 +5666
=========================================
Hits 227754 227754
Misses 93506 93506
Partials 182 182
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
66f47c6 to
d7bf2e2
Compare
…ecker Implement the database layer for the Agent Run delivery checker (Verify System). Reuse / definition layer: - verify_criteria: a single reusable pass/fail standard (atomic unit), carrying its verifier config + onFail default and bound to a document for judging guidance (iteration history reuses document_history; no version columns) - verify_rubrics: a named group that aggregates criteria — the reusable unit - verify_rubric_criteria: junction, which criteria a rubric aggregates (criteria are reusable across rubrics) Mounted onto an agent via the existing agency config jsonb: - agencyConfig.verifyRubricId: a reusable rubric (criteria template) - agencyConfig.verifyCriteriaIds: ad-hoc one-off criteria A run's plan instantiates the union of both. No dedicated bindings table. Snapshot + result layer: - agent_operations.verify_plan (jsonb) + verify_plan_confirmed_at: the per-run immutable check-item snapshot lives ON the operation (1:1 — auto-repair spawns a new operation), instead of a separate plans table - agent_operations.verify_status: denormalized rollup for list-page badges - verify_check_results: per-criterion result with the Toulmin model (verdict/confidence as columns, narrative in a typed toulmin jsonb), N:1 verifier_tracing_id for batch judging, FP/FN flags for the data flywheel; relates to the plan via operation_id + stable check_item_id Ref: LOBE-10019 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
d7bf2e2 to
e0f2199
Compare
Implements the verify system on top of the schema (PR #15480): - models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult; agentOperation verify plan/status methods - services/verify: AI plan generation (auto-create criteria), executor with LLM Toulmin judge (per-criterion + batch), program placeholder, agent & auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion lifecycle bridge - lambda verify router (criteria/rubric CRUD, plan, results, feedback) - frontend feature module: service, SWR hooks, CheckerDock state machine, RunArtifact, verify i18n namespace - tracing scenarios: VerifyPlanGen / VerifyJudge Live UI mount (dock/artifact into chat) pending server operationId source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the verify system on top of the schema (PR #15480): - models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult; agentOperation verify plan/status methods - services/verify: AI plan generation (auto-create criteria), executor with LLM Toulmin judge (per-criterion + batch), program placeholder, agent & auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion lifecycle bridge - lambda verify router (criteria/rubric CRUD, plan, results, feedback) - frontend feature module: service, SWR hooks, CheckerDock state machine, RunArtifact, verify i18n namespace - tracing scenarios: VerifyPlanGen / VerifyJudge Live UI mount (dock/artifact into chat) pending server operationId source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the verify system on top of the schema (PR #15480): - models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult; agentOperation verify plan/status methods - services/verify: AI plan generation (auto-create criteria), executor with LLM Toulmin judge (per-criterion + batch), program placeholder, agent & auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion lifecycle bridge - lambda verify router (criteria/rubric CRUD, plan, results, feedback) - frontend feature module: service, SWR hooks, CheckerDock state machine, RunArtifact, verify i18n namespace - tracing scenarios: VerifyPlanGen / VerifyJudge Live UI mount (dock/artifact into chat) pending server operationId source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* 🗃️ feat(database): add verify system tables for agent run delivery checker Implement the database layer for the Agent Run delivery checker (Verify System). Reuse / definition layer: - verify_criteria: a single reusable pass/fail standard (atomic unit), carrying its verifier config + onFail default and bound to a document for judging guidance (iteration history reuses document_history; no version columns) - verify_rubrics: a named group that aggregates criteria — the reusable unit - verify_rubric_criteria: junction, which criteria a rubric aggregates (criteria are reusable across rubrics) Mounted onto an agent via the existing agency config jsonb: - agencyConfig.verifyRubricId: a reusable rubric (criteria template) - agencyConfig.verifyCriteriaIds: ad-hoc one-off criteria A run's plan instantiates the union of both. No dedicated bindings table. Snapshot + result layer: - agent_operations.verify_plan (jsonb) + verify_plan_confirmed_at: the per-run immutable check-item snapshot lives ON the operation (1:1 — auto-repair spawns a new operation), instead of a separate plans table - agent_operations.verify_status: denormalized rollup for list-page badges - verify_check_results: per-criterion result with the Toulmin model (verdict/confidence as columns, narrative in a typed toulmin jsonb), N:1 verifier_tracing_id for batch judging, FP/FN flags for the data flywheel; relates to the plan via operation_id + stable check_item_id Ref: LOBE-10019 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ✨ feat(verify): add Agent Run delivery checker backend + frontend module Implements the verify system on top of the schema (PR #15480): - models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult; agentOperation verify plan/status methods - services/verify: AI plan generation (auto-create criteria), executor with LLM Toulmin judge (per-criterion + batch), program placeholder, agent & auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion lifecycle bridge - lambda verify router (criteria/rubric CRUD, plan, results, feedback) - frontend feature module: service, SWR hooks, CheckerDock state machine, RunArtifact, verify i18n namespace - tracing scenarios: VerifyPlanGen / VerifyJudge Live UI mount (dock/artifact into chat) pending server operationId source. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * 🐛 fix(verify): persist delivery-checker verdicts via async tracing backfill The LLM judge produced valid verdicts but they were never persisted, leaving every run stuck at `verifying`. Two root causes: 1. FK ordering: `writeVerdict` stamped `verifier_tracing_id` synchronously, but the `llm_generation_tracing` row is written asynchronously (best-effort, after the response) — so the hard FK was violated every time and the verdict write was rolled back. Now the verdict is written with a null link, and the tracing id is backfilled by an `onPersisted` callback that fires only after the tracing row commits (still non-blocking). If tracing is disabled the link simply stays null. 2. Verdict parse: the judge JSON schema is non-strict, so the provider returns optional Toulmin fields as explicit `null`. The Zod validator used `.optional()` (accepts undefined, not null), so any null failed the whole `safeParse` and discarded the batch. Switched to `.nullish()`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(cli): add `verify` command for the delivery checker Adds `lh verify` covering the full delivery-checker chain — criteria & rubric CRUD, per-run plan (generate/state/confirm/skip), execute (LLM judge), results, and feedback — calling the `verify` lambda router. Enables end-to-end backend testing of the verify system. Also adds the missing `tool-runtime` / `prompts` / `const` workspace entries to the CLI's `pnpm-workspace.yaml` so the standalone package installs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 feat(verify): add verify message role + delivery-checker card UI Make the delivery-checker renderable in chat: - Fix the `features/Verify` components so they compile: flatten the `verify` locale to the repo's flat-dotted-key convention (keySeparator: false), import `Flexbox`/`TextArea` from `@lobehub/ui` (react-layout-kit is no longer a dep), and the token cast. - Add a `verify` UI message role + a `VerifyMessage` card that renders the Run Artifact + checker dock from `metadata.verifyOperationId`, wired into the message renderer switch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): add lobe-agent `generateVerifyPlan` tool (server runtime) Lets an agent set up the delivery checker for its run: the agent calls `generateVerifyPlan` early (per the new `<delivery_checker>` system-role guidance), which instantiates the rubric / ad-hoc criteria into a frozen plan on the current `agent_operations` row. Executed server-side only — the executor is dispatched via `runtime[apiName]` with `operationId` threaded through the tool execution context; the client `BaseExecutor` gracefully no-ops it. Also registers the metadata fields (`verifyOperationId`/`verifyRound`) on the message metadata zod schema so the role='verify' card can carry its operation id. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): surface role=verify card on run completion (LOBE-10051) Connect the delivery checker to the conversation: when an Agent Run with a verify plan completes, `CompletionLifecycle` inserts a persisted `role='verify'` message (parented to the assistant, carrying `metadata.verifyOperationId`) that renders the checker card. Self-guarded — no plan → no card, failures never affect the run. `role='verify'` behaves like a `user` leaf message everywhere it flows (persistence + conversation-flow pass it through unchanged); only the context-engine treats it specially: a new `VerifyMessageProcessor` drops it from the model context (UI-only card, not a valid model role). Adds `verify` to `CreateMessageRoleType`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 feat(verify): merge run-artifact + checker into one card The role=verify message rendered two stacked cards (Run Artifact summary + Delivery Checker) that duplicated the check-item list. Merge into a single card: the `Run Artifact · Round N` header, then the checker results + actions, then the snapshot note. RunArtifact/CheckerDock gain an `embedded` prop (header-only / body-only, no card chrome) and VerifyMessage composes them under one border. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): derive generateVerifyPlan rubric from agencyConfig A real agent calls `generateVerifyPlan` with just a `goal` and doesn't know rubric ids. When `rubricId`/`criteriaIds` params are absent, derive the mounted rubric + ad-hoc criteria from the executing agent's `agencyConfig.verifyRubricId / verifyCriteriaIds`. Params still win when given. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🐛 fix(cli): surface agent gateway WebSocket close code + reason The `onclose` handler logged `String(event)` → the useless "[object CloseEvent]". Surface `event.code` (+ `event.reason` when present) so a gateway disconnect before completion is actually diagnosable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 fix(verify): rename "Run Artifact" → "Verification", drop failed red border - The kicker said "Run Artifact" — it's automated verification, not an artifact. Renamed to "Verification · Round N". - Removed the red error border on a failed check — a normal card reads better. - Fixes a render crash (`useVerifyState is not defined`): the border removal left a dangling reference after the import was dropped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(cli): poll run status when the agent stream drops When the live stream (gateway WebSocket / SSE) closes before the run finishes, the run is still executing server-side — so instead of hard-exiting, fall back to polling `aiAgent.getOperationStatus` every 10s until the run reaches a terminal state (or is no longer tracked). Pairs with surfacing the WS close code/reason. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 💄 feat(verify): add Render for generateVerifyPlan tool call The generateVerifyPlan tool call rendered as the default param/result dump. Add a Render that lists the generated delivery checks (title + gate/auto-fill tag), and surface the items on the tool state so the Render can read them. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): auto-confirm generated plan so checks run on completion The agent generated a plan but it stayed `planned`/unconfirmed, so the completion hook (which gates on a confirmed plan) never ran the checks — the card was stuck at "awaiting confirmation" with no pass/fail. In the headless agent flow there's no one to click Confirm, so `generateVerifyPlan` now auto-confirms the plan it generates; the checks then run automatically on completion. (An interactive "review before run" gate is a future enhancement.) Also: the verify card header disappeared in the draft/planned phase (`phaseToArtifact.draft` was null). Give it a header so the card always shows its "Verification · Round N" heading. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🐛 fix(agent-tracing): only count opaque/presentational attrs as structural noise The first structuralNoiseRatio charged ALL markup (every <...> tag) as noise, which over-penalized legitimately structured results 3x. Grounding against real web-search output (`<item title="…" url="…">snippet</item>`) showed the tags and the title=/url= attributes ARE the signal the model reads. Now only opaque/presentational attribute names (id, class, style, data-*, aria-*, role, on*) count as noise; semantic element tags and content-bearing attributes (title, url, href, name…) are kept. On a 57-op user-interrupted sample this drops web-search noise 42%→0% and overall estimated waste 16%→5%, leaving large-payload (readDocument) and high error-rate tools as the real signal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): model-authored criteria with name/description/instruction-in-document + agent verifier Restructure the generateVerifyPlan tool to a createDocument-style full-create flow and wire up the agent verifier path: - criteria now = title + description (required one-liner) + instruction (required detailed rubric); instruction lives in a linked document (verify_criteria.documentId), description is a new verify_criteria column (migration 0111). verifierConfig no longer holds description/instruction. - generateVerifyPlan creates verify_criteria + a rubric, snapshots the plan onto the operation and confirms it; judge resolves the instruction from the document. - agent-type checks run as verifier sub-agents (execAgent + isolated thread) whose onComplete hook parses a VERDICT and writes it back to verify_check_results (renamed AgentVerifierSpawner → VerifierAgentRunner). - UI: custom Inspector for the tool header; check list shows per-verifier-type icons (llm/agent/program) + description + required/optional tag; i18n en/zh. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ⚡️ perf(verify): run program/llm/agent checks concurrently on completion The three verifier kinds are independent; previously the agent spawn waited for the batched LLM judge to finish. Run them via Promise.all so agent sub-agents start immediately alongside the LLM batch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): dedicated builtin verify-agent + writeback tool, role=verify message, portal check editor - Add `@lobechat/builtin-tool-verify` (submitVerifyResult) + builtin `verify-agent`; agent-type checks now run as the dedicated verify agent (not the user's agent), which investigates and writes its verdict back via the tool during its run. - Verifier inherits the parent run's model/provider (builtin default may be unconfigured locally). - role=verify completion message no longer requires an assistantMessageId, so the delivery-checker card always surfaces when a plan exists. - Portal editor for verify checks (title/description/instruction/verifier/onFail). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🐛 fix(verify): restrict verify-agent to its writeback tool; fix running loader icon Root cause of stuck `running` agent checks: the verify-agent ran in agent mode and inherited all default tools (web-browsing, cloud-sandbox, skills, activator), so it went off web-searching/crawling to "investigate" and never called submitVerifyResult. - Run the verify-agent in chat mode (enableAgentMode: false, searchMode: off) — the strict whitelist — and whitelist `lobe-verify` for chat mode so the verifier gets ONLY its writeback tool. - Sharpen the verify systemRole: judge from the provided deliverable/instruction (no external tools), always reach a verdict, and always call submitVerifyResult. - CheckerDock: running check now uses the standard RingLoadingIcon (warning ring), matching the app's loader instead of a blue spinner. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): auto-repair loop — re-run the agent with failure feedback on failed checks When required checks fail with onFail=auto_repair, automatically run a second iteration instead of ending at `failed`: - createRepairRunner: re-runs the SAME agent in the same topic with the failure feedback as the prompt, re-snapshots the plan onto the repair operation and confirms it so it re-verifies on completion (the next round). Capped at MAX_REPAIR_ROUNDS via parent-chain depth to prevent runaway loops. - maybeAutoRepair: fires only once every required check has a terminal result, so it works for inline LLM checks (triggered from lifecycle) and async agent checks (triggered from the verify tool's writeback path). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): open check result detail in portal & rename artifact→result - add a VerifyResult portal view: clicking any check row opens that result's detail (verdict, confidence, Toulmin sections, suggestion) on the right; agent checks expose their execution trace from inside the panel - CheckerDock rows are all clickable now (chevron affordance), status shown by icon only; verify card uses colorBgElevated - rename the run-result surface from "artifact" to "result" everywhere: RunArtifact → RunResult, phaseToArtifact → phaseToResult, and all `artifact.*` i18n keys → `result.*` - ship verify namespace zh-CN / en-US locales Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): enrich check result portal — criterion stepper, richer detail view Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): rubric run-policy config + repair feedback on the verify card Auto-repair feedback now lives on the failed round's role=verify message (content), and the VerifyMessageProcessor surfaces it into the repair run's context as a tagged user turn — so the repair op runs off history via a new execAgent `suppressUserMessage` path instead of injecting a synthetic user message. createVerifyMessage is awaited before verification to avoid a race. maxRepairRounds becomes a rubric-level config: new `verify_rubrics.config` jsonb column, read live at repair time via the plan's sourceRubricId. Adds a RubricConfig portal panel (reachable from the plan card's settings affordance) to view/edit it, wired through the verify store + TRPC. Verify domain types/vocab/config are extracted from the DB schema into @lobechat/types as the single source of truth; schema and consumers import from there. Tests: VerifyMessageProcessor dual behavior; VerifyRubricModel config round-trip; MessageModel.findVerifyMessageByOperationId. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * 🗃️ refactor(verify): squash the 3 verify migrations into one Collapse 0110 (tables) + 0111 (criteria.description) + 0112 (rubrics.config) into a single regenerated 0110_add_verify_tables so the PR ships one clean, idempotent migration. No schema change vs the three combined. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(cli): verify rubric run-policy config commands + shrink judging-rule editor font CLI: `verify rubric create --max-repair-rounds`, `verify rubric view`, and `verify rubric update` exercise the rubric config endpoints end-to-end; adds a mocked command test. UI: judging-rule editor font 16px → 14px. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(verify): editable rubric name in the config panel + default 3 repair rounds Add a name (title) field to the RubricConfig portal, persisted via a new updateRubricTitle store action + service (optimistic + debounced, alongside the config write-back). Bump DEFAULT_MAX_REPAIR_ROUNDS 2 → 3. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ♻️ refactor(verify): extract generateVerifyPlan into installable lobe-delivery-checker tool Move the delivery-checker plan-creation flow out of the always-on lobe-agent tool into a new standalone, installable builtin tool `lobe-delivery-checker` (Skill Store, opt-in per agent — not loaded by default). lobe-agent no longer ships generateVerifyPlan. - new packages/builtin-tool-lobe-delivery-checker (manifest/types/systemRole + client Render/Inspector/Portal moved wholesale from lobe-agent) - new serverRuntimes/lobeDeliveryChecker.ts (generateVerifyPlan moved out of lobeAgent.ts), registered alongside verifyResult - registered installable in builtin-tools (no hidden/discoverable:false, not in defaultToolIds/alwaysOnToolIds/runtimeManagedToolIds); renders/inspectors/ portals/identifiers wired; lobe-agent portal entries removed - i18n keys moved builtins.lobe-agent.verifyPlan.* → builtins.lobe-delivery-checker.* Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * ✨ feat(agent): add `custom` tool mode; verify agent uses it instead of chat-mode Chat mode's contract is to strip ALL user/agent plugins (strict KB/memory/web allow-list) — so the verify sub-agent couldn't get its writeback tool without a leaky blanket rule. Introduce a third tool mode `custom` where the toolset is EXACTLY the agent's declared plugins (no always-on, no defaults, no activator), for focused builtin sub-agents. - chatConfig.toolMode: 'agent' | 'chat' | 'custom' (overrides enableAgentMode) - AgentToolsEngine: custom branch (defaultToolIds = plugins, rules = plugins-on, allowExplicitActivation only in agent mode); chatModeRules restored to strict - verify agent → toolMode: 'custom'; lobe-verify dropped from chatModeAllowedToolIds - test: custom mode enables exactly the declared plugin, no always-on / defaults Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💻 Change Type
🔗 Related Issue
Part of LOBE-10019 — 实现 Agent Run 交付检查器 (Verify System) 后端业务逻辑
This PR delivers subtask 1: 数据库迁移 + Schema 定义 — the foundational database layer the remaining service/integration subtasks build on. Scoped to schema + migration so it merges first (lower layer before its callers).
🔀 Description of Change
Adds the database layer for the Agent Run delivery checker ("Verify System").
Reuse / definition layer (the part worth persisting & reusing):
verify_criteria— a single reusable pass/fail standard (the atomic unit). Carries itsverifier_type/verifier_config/on_faildefaults and binds to adocumentfor judging guidance (iteration history reusesdocument_history; no version columns).verify_rubrics— a named group that aggregates criteria — the reusable unit (a rubric is a set of criteria).verify_rubric_criteria— junction: which criteria a rubric aggregates (criteria are reusable across rubrics, withsort_order).Mounted onto an agent via the existing agency-config jsonb (no dedicated bindings table):
agencyConfig.verifyRubricId— a reusable rubric (criteria template).agencyConfig.verifyCriteriaIds— ad-hoc one-off criteria that don't warrant a rubric.A run's plan instantiates the union of the rubric's criteria + the ad-hoc criteria. Topic-level mounting is intentionally out of scope for now.
Snapshot + result layer:
agent_operations.verify_plan(jsonb) +verify_plan_confirmed_at— the per-run immutable check-item snapshot lives on the operation (1:1 —auto_repairspawns a new operation, so there's never >1 plan per op). No separate plans table. Each item carries a stable uuid +sourceCriterionId/sourceRubricIdprovenance, with resolved title/config frozen in (editing a criterion never drifts a historical plan).agent_operations.verify_status— denormalized rollup (unverified → planned → verifying → passed/failed → repairing → delivered) so the operation list page renders badges / filters without joining the verify tables.verify_check_results— per-criterion result, Toulmin-modeled:verdict(Claim) +confidence(Qualifier) are columns; narrative lives in a typedtoulminjsonb;verifier_tracing_idis N:1 (a batchgenerateObjectshares one tracing row);verifier_operation_idlinks an agent verifier's sub-operation;repair_operation_idlinks an auto-repair operation;is_false_positive/is_false_negativeprecomputed for the data flywheel. Relates to the plan viaoperation_id+ stablecheck_item_id(never the array index).user_idis kept on every table for ownership / access-controlWHERE user_id = ?. Redundantagent_id/topic_idwere intentionally not added to results (reachable viaoperation_id).Migration:
packages/database/migrations/0110_add_verify_tables.sql(idempotent —IF NOT EXISTS/DROP CONSTRAINT IF EXISTS+ADD). The agent mount fields need no agents-table migration sinceagency_configis already jsonb.🧪 How to Test
Tested locally
Added/updated tests
No tests needed
bun run db:generatereports "No schema changes" (migration & snapshot in sync).bun run type-checkpasses with no verify-related errors.Migration SQL is idempotent and safe to re-run.
📝 Additional Information
Schema/migration only — no service, TRPC, or UI logic yet. Follow-up PRs will add criteria/rubric CRUD, plan instantiation + confirm, the verifier integrations (program / agent / llm),
verify_statusservice-layer sync, feedback, and the frontend Plan-confirm UI (subtasks 2–11).🤖 Generated with Claude Code