fix(clp-mcp-server): Update stale dataset key to datasets in search job config (fixes #2061). #2062
Merged
No actionable comments were generated in the recent review.
Walkthrough: The change updates the job configuration payload submitted to MariaDB, restructuring the dataset field from a single string value to an array. The "dataset" key is replaced with "datasets" containing a list with the default dataset name, aligning with an updated API specification.
Estimated code review effort: 1 (Trivial), ~2 minutes
Pre-merge checks: 3 passed
sitaowang1998 approved these changes on Mar 3, 2026
junhaoliao added a commit to junhaoliao/clp that referenced this pull request on May 17, 2026:
…ch job config (fixes y-scope#2061). (y-scope#2062)
Description
#1992 changed `SearchJobConfig.dataset` (singular `str`) to `SearchJobConfig.datasets` (plural `list[str]`) to support multi-dataset queries. The MCP server's `ClpConnector.submit_query()` was not updated and still writes `"dataset": "default"` into the msgpacked job config.
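To make the mismatch concrete, here is a minimal sketch of the two payload shapes. The helper names and the `query_string` field are hypothetical placeholders for illustration; only the `dataset` / `datasets` keys and the `"default"` value come from this PR.

```python
from typing import Any, Optional


def stale_job_config(query: str) -> dict[str, Any]:
    """Payload shape before the fix: singular key, plain string value."""
    # "query_string" is a placeholder field, not taken from the real schema.
    return {"query_string": query, "dataset": "default"}


def fixed_job_config(query: str, datasets: Optional[list[str]] = None) -> dict[str, Any]:
    """Payload shape after the fix: plural key, list-of-strings value."""
    return {
        "query_string": query,
        "datasets": list(datasets) if datasets else ["default"],
    }


stale = stale_job_config("error")
fixed = fixed_job_config("error")
print(stale)
print(fixed)
```

The only difference that matters to the scheduler is the key name and the list wrapping; everything else in the payload is unchanged by this PR.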
When the query scheduler deserializes this with `SearchJobConfig.model_validate()`, Pydantic silently ignores the unknown `"dataset"` key and sets `datasets = None`. On a clp-s deployment, this causes the scheduler to fall through to the `datasets is None` branch at:
clp/components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
Lines 1250 to 1252 in c48968d
which calls `_get_archives_for_search_without_datasets()`, a codepath intended for clp-text (which has no dataset concept). On clp-s this may search across all datasets rather than just the intended `"default"` dataset.

This fix updates the job config key from `"dataset"` to `"datasets"` and wraps the value in a list, matching the `SearchJobConfig` schema.

Checklist
breaking change.
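The failure mode from the Description can be reproduced in a few lines. This is a stdlib-only sketch, not the real code: `SearchJobConfigSketch` is a hypothetical stand-in whose `model_validate()` imitates Pydantic's default behavior of dropping unknown keys, and `choose_archive_lookup()` is a hypothetical simplification of the scheduler's `datasets is None` branch.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SearchJobConfigSketch:
    """Hypothetical stand-in for SearchJobConfig; only the relevant field."""

    datasets: Optional[list[str]] = None

    @classmethod
    def model_validate(cls, payload: dict[str, Any]) -> "SearchJobConfigSketch":
        # Imitate Pydantic's default handling of extra fields: unknown keys
        # are dropped without raising, so missing fields keep their defaults.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in payload.items() if k in known})


def choose_archive_lookup(config: SearchJobConfigSketch) -> str:
    """Hypothetical simplification of the scheduler's branch on `datasets`."""
    if config.datasets is None:
        return "without_datasets"  # path intended for clp-text
    return "dataset_scoped"


stale = SearchJobConfigSketch.model_validate({"dataset": "default"})
fixed = SearchJobConfigSketch.model_validate({"datasets": ["default"]})
print(choose_archive_lookup(stale))
print(choose_archive_lookup(fixed))
```

Because the stale key is dropped rather than rejected, nothing fails loudly; the query simply takes the unscoped path. A stricter model configuration that rejects unknown keys would surface this kind of drift as a validation error, at the cost of making schema changes less forgiving.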
Validation performed
1. Unit tests
Task: Verify all existing MCP server tests pass with the fix.
Command:
Output:
2. End-to-end MCP server search
Task: Verify that MCP server search queries work end-to-end with the fixed `datasets` key.
Setup:
Output:
Test: Execute MCP Streamable HTTP protocol calls using `curl` (initialize → get_instructions → search_by_kql).
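For readers unfamiliar with the protocol, the three calls are JSON-RPC 2.0 messages POSTed to the server's MCP endpoint. The sketch below only builds the request bodies; it is not the exact set of commands used in this validation. The protocol version string, the client name, and the `kql_query` argument name are assumptions for illustration.

```python
import json
from typing import Any

# Assumed protocol revision; use whatever the server actually negotiates.
MCP_PROTOCOL_VERSION = "2025-03-26"


def initialize_request(request_id: int) -> dict[str, Any]:
    """Build the JSON-RPC body that opens an MCP session."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": MCP_PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client identity.
            "clientInfo": {"name": "curl-sketch", "version": "0.0.0"},
        },
    }


def tool_call_request(request_id: int, tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build the JSON-RPC body for a `tools/call` invocation."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


bodies = [
    initialize_request(1),
    tool_call_request(2, "get_instructions", {}),
    # "kql_query" is an assumed argument name for illustration only.
    tool_call_request(3, "search_by_kql", {"kql_query": "*"}),
]
for body in bodies:
    print(json.dumps(body))
```

With Streamable HTTP, the session ID returned in the `Mcp-Session-Id` response header of the initialize call is sent back as a request header on the later calls.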
Step 1: Initialize MCP session
Command:
Output (response headers + body):
Step 2: Call `get_instructions`
Command:
Output:
Step 3: Call `search_by_kql`
Command:
Output (truncated):
Explanation: The MCP server successfully submitted a search query, the query scheduler processed it (scoped to the `"default"` dataset), and 1000 results were returned across 100 pages. Before this fix, the `"dataset"` key was silently ignored by Pydantic, resulting in `datasets=None`, which caused the scheduler to use the `_get_archives_for_search_without_datasets()` codepath instead of the dataset-scoped one.
Summary by CodeRabbit