
🐛 fix: should record unique case id in eval dataset#13129

Merged
arvinxx merged 2 commits into lobehub:canary from cy948:fix/dataset-item-id
Mar 20, 2026

@cy948 cy948 commented Mar 19, 2026

Contributor

💻 Change Type

  • ✨ feat
  • 🐛 fix
  • ♻️ refactor
  • 💄 style
  • 👷 build
  • ⚡️ perf
  • ✅ test
  • 📝 docs
  • 🔨 chore

🔗 Related Issue

🔀 Description of Change

Add the missing id so that external suites and internal cases can be tracked.

🧪 How to Test

  • Tested locally
  • Added/updated tests
  • No tests needed

📸 Screenshots / Videos

Before After
... ...

📝 Additional Information

Summary by Sourcery

New Features:

  • Add support for an optional case_id field across multiple eval dataset presets to carry case identifiers when present.

vercel Bot commented Mar 19, 2026


@cy948 is attempting to deploy a commit to the LobeHub OSS Team on Vercel.

A member of the Team first needs to authorize it.

sourcery-ai Bot commented Mar 19, 2026

Contributor

Reviewer's Guide

This PR updates multiple dataset presets so that an optional case_id field is recognized and preserved during dataset ingestion, enabling external tools and internal case tracking to reliably associate records by ID.

Sequence diagram for dataset ingestion with optional case_id capture

sequenceDiagram
  actor User
  participant UI as DatasetUploadUI
  participant Importer as DatasetImporter
  participant Presets as DatasetPresetConfig
  participant Store as DatasetStorage
  participant Tracker as CaseTrackingService

  User->>UI: Upload dataset file
  UI->>Importer: sendFile(file)
  Importer->>Presets: getPreset(formatKey)
  Presets-->>Importer: DatasetPreset(requiredFields, optionalFields)
  Importer->>Importer: validate requiredFields
  Importer->>Importer: map fields including optional case_id
  alt row has case_id
    Importer->>Store: saveRecord(input, expected, case_id)
    Store-->>Importer: recordId
    Importer->>Tracker: linkCase(recordId, case_id)
  else row missing case_id
    Importer->>Store: saveRecord(input, expected)
    Store-->>Importer: recordId
  end
  Importer-->>UI: ImportResult(total, successes, failures)
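
The import loop in the sequence diagram can be sketched roughly as follows. This is only an illustration of the diagram, not the project's importer: `InMemoryStore`, `InMemoryTracker`, `importRows`, and the row shape are hypothetical stand-ins for the `DatasetStorage`, `CaseTrackingService`, and `DatasetImporter` participants.

```typescript
// Hypothetical types; the real importer in the repository may look different.
type Row = Record<string, string | undefined>;

interface DatasetPreset {
  requiredFields: string[];
  optionalFields: string[];
}

interface ImportResult {
  total: number;
  successes: number;
  failures: number;
}

// Stand-in for DatasetStorage: keeps rows in memory and returns a 1-based id.
class InMemoryStore {
  records: Row[] = [];
  saveRecord(row: Row): number {
    this.records.push(row);
    return this.records.length;
  }
}

// Stand-in for CaseTrackingService: remembers which record maps to which case.
class InMemoryTracker {
  links: Record<number, string> = {};
  linkCase(recordId: number, caseId: string): void {
    this.links[recordId] = caseId;
  }
}

function importRows(
  preset: DatasetPreset,
  rows: Row[],
  store: InMemoryStore,
  tracker: InMemoryTracker,
): ImportResult {
  const result: ImportResult = { total: rows.length, successes: 0, failures: 0 };
  for (const row of rows) {
    // Validate required fields first; a missing or empty value fails the row.
    let valid = true;
    for (const field of preset.requiredFields) {
      const value = row[field];
      if (value === undefined || value === '') valid = false;
    }
    if (!valid) {
      result.failures += 1;
      continue;
    }
    const recordId = store.saveRecord(row);
    // Link the case only when the row actually carries a case_id.
    const caseId = row['case_id'];
    if (caseId !== undefined && caseId !== '') tracker.linkCase(recordId, caseId);
    result.successes += 1;
  }
  return result;
}

const demoStore = new InMemoryStore();
const demoTracker = new InMemoryTracker();
const demoResult = importRows(
  { requiredFields: ['question', 'answer'], optionalFields: ['case_id'] },
  [
    { question: 'q1', answer: 'a1', case_id: 'case-001' },
    { question: 'q2', answer: 'a2' },
    { question: 'q3' }, // missing required "answer"
  ],
  demoStore,
  demoTracker,
);
```

In this sketch a row without `case_id` is still stored; it just never reaches the tracker, which matches the `else` branch of the diagram.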

Flow diagram for dataset presets recognizing optional case_id

flowchart LR
  A[Start dataset ingestion] --> B[Select dataset preset]
  B --> C{Preset supports case_id in optionalFields?}
  C -->|Yes| D[Check row for case_id column]
  C -->|No| G[Ignore case_id in incoming rows]
  D --> E{Row has case_id value?}
  E -->|Yes| F[Attach case_id to internal record for tracking]
  E -->|No| H[Create record without case_id]
  G --> H
  H --> I[Finish ingestion with created records]
  F --> I
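
The decision in the flowchart comes down to two checks: does the preset list `case_id` in `optionalFields`, and does the row carry a non-empty value for it. A minimal sketch, assuming a hypothetical `toRecord` helper and `EvalRecord` shape that are not taken from the repository:

```typescript
// Hypothetical shapes used only for this illustration.
interface DatasetPreset {
  requiredFields: string[];
  optionalFields: string[];
}

type Row = Record<string, string | undefined>;

interface EvalRecord {
  fields: Record<string, string>;
  caseId?: string;
}

function toRecord(preset: DatasetPreset, row: Row): EvalRecord {
  // Copy the required columns; missing values become empty strings here.
  const fields: Record<string, string> = {};
  for (const key of preset.requiredFields) {
    fields[key] = row[key] ?? '';
  }
  // Branch C: a preset without case_id in optionalFields ignores the column.
  const supportsCaseId = preset.optionalFields.indexOf('case_id') !== -1;
  const raw = supportsCaseId ? row['case_id'] : undefined;
  // Branch E: blank or whitespace-only values count as "no case_id".
  const caseId = raw !== undefined ? raw.trim() : '';
  return caseId.length > 0 ? { fields, caseId } : { fields };
}

const presetWithCaseId: DatasetPreset = {
  requiredFields: ['question', 'answer'],
  optionalFields: ['case_id'],
};
const presetWithoutCaseId: DatasetPreset = {
  requiredFields: ['question', 'answer'],
  optionalFields: [],
};

const tracked = toRecord(presetWithCaseId, { question: 'q', answer: 'a', case_id: 'c-1' });
const ignored = toRecord(presetWithoutCaseId, { question: 'q', answer: 'a', case_id: 'c-1' });
const blank = toRecord(presetWithCaseId, { question: 'q', answer: 'a', case_id: '  ' });
```

Treating whitespace-only values as missing is a choice made for this sketch; the flowchart itself only says "Row has case_id value?".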

File-Level Changes

Change: Enable optional case ID tracking in general QA-style presets that previously had no optional fields.
Details:
  • Add case_id to the optionalFields array for the basic question/answer preset with required question, answer, problem_topic, canary.
  • Add case_id to the optionalFields for the preset with required Question, Answer and optional Topic, canary.
  • Add case_id to the optionalFields for the preset with required instance_id, query, evaluation, language.
  • Add case_id to the optionalFields for the preset with required problem, answer, problem_category, answer_type.
  • Add case_id to the optionalFields for the preset with required question, answer, topic, canary.
Files: src/routes/(main)/eval/config/datasetPresets.ts

Change: Extend classification-style presets to include optional case IDs alongside existing metadata fields.
Details:
  • Add case_id to the optionalFields for the preset with required question, answer, raw_subject, category and optional canary.
  • Add case_id to the optionalFields for the preset with required question, answer, raw_subject, category, Verified_Classes and optional canary.
Files: src/routes/(main)/eval/config/datasetPresets.ts
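
For readers who have not opened the diff, the kind of change listed above might look like the sketch below. The `DatasetPreset` interface, the `withCaseId` helper, and the preset name are assumptions made for illustration; only the `requiredFields` / `optionalFields` keys and the field names come from the guide, and the actual PR presumably edits the arrays in `datasetPresets.ts` directly.

```typescript
// Hypothetical shape; the real type in datasetPresets.ts may carry more keys
// (the review comments mention formatDescription and fieldInference).
interface DatasetPreset {
  presetName: string;
  requiredFields: string[];
  optionalFields: string[];
}

// Return a copy of the preset that also accepts `case_id` as an optional column.
// Presets that already list it are returned unchanged.
function withCaseId(preset: DatasetPreset): DatasetPreset {
  if (preset.optionalFields.indexOf('case_id') !== -1) return preset;
  return { ...preset, optionalFields: [...preset.optionalFields, 'case_id'] };
}

const basicQa: DatasetPreset = {
  presetName: 'basic-qa', // illustrative name, not from the repository
  requiredFields: ['question', 'answer', 'problem_topic', 'canary'],
  optionalFields: [],
};

const updatedPreset = withCaseId(basicQa);
```

Because `case_id` goes into `optionalFields` rather than `requiredFields`, existing datasets without that column should keep validating as before.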


@sourcery-ai sourcery-ai Bot left a comment

Contributor


Hey - I've left some high level feedback:

  • For presets where id is now accepted, consider updating the corresponding formatDescription strings and any fieldInference/identifier mapping so the presence and role of id is clear and consistently handled across all presets.

@lobehubbot

Member

@ONLY-yours - This is a bug fix in the eval dataset presets config. Please take a look.

codecov Bot commented Mar 19, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.56%. Comparing base (d3ea4a4) to head (04a8f89).
⚠️ Report is 12 commits behind head on canary.

Additional details and impacted files
@@            Coverage Diff             @@
##           canary   #13129      +/-   ##
==========================================
- Coverage   74.56%   74.56%   -0.01%     
==========================================
  Files        1516     1516              
  Lines      124941   124941              
  Branches    14539    16569    +2030     
==========================================
- Hits        93166    93164       -2     
- Misses      31664    31666       +2     
  Partials      111      111              
| Flag | Coverage Δ |
| --- | --- |
| app | 67.67% <ø> (-0.01%) ⬇️ |
| database | 97.89% <ø> (ø) |
| packages/agent-runtime | 89.60% <ø> (ø) |
| packages/context-engine | 83.53% <ø> (ø) |
| packages/conversation-flow | 92.37% <ø> (ø) |
| packages/file-loaders | 87.02% <ø> (ø) |
| packages/memory-user-memory | 66.68% <ø> (ø) |
| packages/model-bank | 99.84% <ø> (ø) |
| packages/model-runtime | 84.72% <ø> (ø) |
| packages/prompts | 74.60% <ø> (ø) |
| packages/python-interpreter | 92.90% <ø> (ø) |
| packages/ssrf-safe-fetch | 0.00% <ø> (ø) |
| packages/utils | 90.09% <ø> (ø) |
| packages/web-crawler | 88.81% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| Store | 66.25% <ø> (ø) |
| Services | 49.91% <ø> (ø) |
| Server | 69.82% <ø> (-0.01%) ⬇️ |
| Libs | 42.20% <ø> (ø) |
| Utils | 91.09% <ø> (ø) |

@cy948 cy948 force-pushed the fix/dataset-item-id branch from 4a446ec to 8661642 Compare March 20, 2026 05:02
@cy948 cy948 marked this pull request as draft March 20, 2026 05:16
@cy948 cy948 force-pushed the fix/dataset-item-id branch from 1cd92ff to 04a8f89 Compare March 20, 2026 05:22
@cy948 cy948 marked this pull request as ready for review March 20, 2026 05:22
@cy948 cy948 changed the title from "fix: should capture id if dataset has" to "fix: should record unique case id in eval dataset" Mar 20, 2026

@sourcery-ai sourcery-ai Bot left a comment

Contributor


Hey - I've left some high level feedback:

  • If case_id is meant to be consumed downstream (e.g., used as a primary identifier), consider adding it to any relevant fieldInference mappings so it can be auto-detected even when column names vary (e.g., id, caseId).
  • Double-check that all dataset presets that should support external case tracking now include case_id in optionalFields to avoid inconsistent behavior between different preset types.
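
The first suggestion, detecting the case identifier even when the column name varies, could be approached along these lines. This is a sketch under assumptions: `CASE_ID_ALIASES`, `normalizeHeader`, and `inferCaseIdColumn` are hypothetical names, and the shape of the project's actual `fieldInference` mapping is not shown in this thread.

```typescript
// Hypothetical alias list, in priority order. The review only names `id` and
// `caseId` as examples of variants; the rest is an assumption.
const CASE_ID_ALIASES: string[] = ['case_id', 'caseid', 'id'];

// Lower-case the header and collapse spaces or hyphens into underscores,
// so "Case ID", "case-id", and "case_id" all compare equal.
function normalizeHeader(header: string): string {
  return header.trim().toLowerCase().replace(/[\s-]+/g, '_');
}

// Return the original header that should be treated as the case id column,
// or undefined when none of the aliases is present.
function inferCaseIdColumn(headers: string[]): string | undefined {
  for (const alias of CASE_ID_ALIASES) {
    for (const header of headers) {
      if (normalizeHeader(header) === alias) return header;
    }
  }
  return undefined;
}

const camelCaseMatch = inferCaseIdColumn(['question', 'caseId', 'answer']);
const preferredMatch = inferCaseIdColumn(['id', 'case_id', 'question']);
const noMatch = inferCaseIdColumn(['question', 'answer']);
```

Iterating over aliases in the outer loop makes the explicit `case_id` column win over a generic `id` column when a dataset has both.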

@arvinxx arvinxx changed the title from "fix: should record unique case id in eval dataset" to "🐛 fix: should record unique case id in eval dataset" Mar 20, 2026
@arvinxx arvinxx merged commit e577c95 into lobehub:canary Mar 20, 2026
22 of 23 checks passed
@lobehubbot

Member

❤️ Great PR @cy948 ❤️

The growth of the project is inseparable from user feedback and contributions. Thank you for your contribution! If you are interested in the LobeHub developer community, please join our Discord and then DM @arvinxx or @canisminor1990. They will invite you to our private developer channel, where we discuss lobe-chat development and share AI news from around the world.
