fix(clp-mcp-server): Update stale `dataset` key to `datasets` in search job config (fixes #2061). by junhaoliao · Pull Request #2062 · y-scope/clp

junhaoliao · 2026-03-03T20:15:00Z

Description

#1992 changed SearchJobConfig.dataset (singular str) to SearchJobConfig.datasets
(plural list[str]) to support multi-dataset queries. The MCP server's
ClpConnector.submit_query() was not updated and still writes "dataset": "default" into
the msgpacked job config.

When the query scheduler deserializes this with SearchJobConfig.model_validate(), Pydantic
silently ignores the unknown "dataset" key and sets datasets = None. On a clp-s
deployment, this causes the scheduler to fall through to the datasets is None branch at:

clp/components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py

Lines 1250 to 1252 in c48968d

    
    if datasets is None:
        # CLP-Text does not support datasets.
        archives_for_search = _get_archives_for_search_without_datasets(

which calls _get_archives_for_search_without_datasets(), a codepath intended for
clp-text (which has no dataset concept). On clp-s, this may search across all datasets
rather than just the intended "default" dataset.
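The silent fallback described above can be reproduced in isolation. The sketch below is a minimal illustration, assuming Pydantic v2 with its default behaviour of ignoring unknown keys. `SketchSearchJobConfig` and its `query_string` field are hypothetical stand-ins, not the real `SearchJobConfig`, and the `None` default for `datasets` is inferred from the description.

```python
from typing import Optional

from pydantic import BaseModel


class SketchSearchJobConfig(BaseModel):
    """Hypothetical, cut-down stand-in for SearchJobConfig (assumption)."""

    query_string: str
    # Assumed to default to None, as the description implies.
    datasets: Optional[list[str]] = None


# Stale payload: still uses the old singular key.
stale = SketchSearchJobConfig.model_validate(
    {"query_string": "*", "dataset": "default"}
)

# Fixed payload: plural key, value wrapped in a list.
fixed = SketchSearchJobConfig.model_validate(
    {"query_string": "*", "datasets": ["default"]}
)

print(stale.datasets)  # None: the unknown "dataset" key was dropped without error
print(fixed.datasets)  # ['default']
```

No validation error is raised for the stale payload, which is why the mismatch can go unnoticed until the scheduler takes the wrong branch.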

This fix updates the job config key from "dataset" to "datasets" and wraps the value
in a list, matching the SearchJobConfig schema.
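The shape of the change can be sketched as a before/after pair of payload builders. This is only an illustration: the function names and the `query_string` key are hypothetical, and the actual `ClpConnector.submit_query()` builds a larger config than shown here.

```python
CLP_DEFAULT_DATASET_NAME = "default"  # value quoted in the description above


def build_job_config_before(query: str) -> dict:
    """Stale shape (hypothetical helper): singular key, plain string value."""
    return {"query_string": query, "dataset": CLP_DEFAULT_DATASET_NAME}


def build_job_config_after(query: str) -> dict:
    """Fixed shape (hypothetical helper): plural key, value wrapped in a list."""
    return {"query_string": query, "datasets": [CLP_DEFAULT_DATASET_NAME]}


def uses_current_schema(job_config: dict) -> bool:
    """Check that the payload carries `datasets` as a list and no stale key."""
    return "dataset" not in job_config and isinstance(
        job_config.get("datasets"), list
    )


print(uses_current_schema(build_job_config_before("*")))  # False
print(uses_current_schema(build_job_config_after("*")))   # True
```

A check along the lines of `uses_current_schema` could serve as a regression guard that does not need a database connection, though whether it fits the existing test layout is a judgment call.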

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

1. Unit tests

Task: Verify all existing MCP server tests pass with the fix.

Command:

$ cd components/clp-mcp-server
$ uv run pytest tests/ -v

Output:

tests/server/test_utils.py::TestUtils::test_convert_date_string_to_epoch PASSED
tests/server/test_utils.py::TestUtils::test_convert_date_string_to_epoch_invalid_date_string PASSED
tests/server/test_utils.py::TestUtils::test_parse_timestamp_range_invalid_values PASSED
tests/server/test_utils.py::TestUtils::test_invalid_timestamp_type PASSED
tests/server/test_utils.py::TestUtils::test_invalid_timestamp_value PASSED
tests/server/test_utils.py::TestUtils::test_missing_timestamp_and_message PASSED
tests/server/test_utils.py::TestUtils::test_sort_and_format_query_results PASSED
tests/test_clp_connector.py::test_submit_query SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_read_job_status SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_wait_query_completion SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_read_results SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_submit_query_invalid_timestamps PASSED
tests/test_clp_connector.py::test_read_job_status_not_found PASSED
tests/test_clp_connector.py::test_wait_query_completion_succeeded PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[QueryJobStatus.FAILED-RuntimeError] PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[QueryJobStatus.CANCELLED-RuntimeError] PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[QueryJobStatus.KILLED-RuntimeError] PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[999-RuntimeError] PASSED
tests/test_clp_connector.py::test_read_results_returns_docs PASSED
tests/test_clp_connector.py::test_read_results_adds_link_field PASSED
tests/test_session_manager.py::TestPaginatedQueryResult::test_get_page PASSED
tests/test_session_manager.py::TestPaginatedQueryResult::test_query_result_initialization PASSED
tests/test_session_manager.py::TestSessionState::test_error_handling PASSED
tests/test_session_manager.py::TestSessionState::test_get_page_data PASSED
tests/test_session_manager.py::TestSessionState::test_session_expiration PASSED
tests/test_session_manager.py::TestSessionManager::test_get_or_create_session PASSED
tests/test_session_manager.py::TestSessionManager::test_cached_query_result PASSED
tests/test_session_manager.py::TestSessionManager::test_get_nth_page PASSED
tests/test_session_manager.py::TestSessionManager::test_async_expiration_for_cleanup_loop PASSED

======================== 25 passed, 4 skipped in 2.17s =========================

2. End-to-end MCP server search

Task: Verify that MCP server search queries work end-to-end with the fixed
datasets key.

Setup:

$ cd build/clp-package
$ ./sbin/start-clp.sh   # with mcp_server enabled in clp-config.yaml
$ ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql.jsonl

Output:

2026-03-03T20:10:18.555 INFO [controller] Started CLP.

2026-03-03T20:10:23.537 INFO [compress] Compression job 1 submitted.
2026-03-03T20:10:26.042 INFO [compress] Compression finished.
2026-03-03T20:10:26.042 INFO [compress] Compressed 385.21MB into 10.06MB (38.31x). Speed: 176.62MB/s.

Test: Execute MCP Streamable HTTP protocol calls using curl
(initialize → get_instructions → search_by_kql).

Step 1: Initialize MCP session

Command:

$ curl -s -D /dev/stderr \
    -X POST http://localhost:8000/mcp \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d '{
      "jsonrpc": "2.0",
      "id": 1,
      "method": "initialize",
      "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "curl-test", "version": "1.0"}
      }
    }'

Output (response headers + body):

HTTP/1.1 200 OK
content-type: text/event-stream
mcp-session-id: 0f38ce5376a54db1a8cb6701fa71e489
...

event: message
data: {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-03-26",
  "capabilities":{"tools":{"listChanged":true},...},
  "serverInfo":{"name":"clp-mcp-server","version":"3.1.0"}}}
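The response body above arrives as a server-sent event rather than plain JSON, so a script driving these calls has to unwrap the `data:` field first. A minimal sketch using only the standard library follows; `parse_sse_data` is a hypothetical helper, and the sample body is a shortened version of the output shown above. The session ID itself comes from the `mcp-session-id` response header, not from the body.

```python
import json


def parse_sse_data(body: str) -> list[dict]:
    """Return the decoded JSON payload of each event in an SSE response body."""
    events = []
    data_lines = []
    # A trailing empty line guarantees the last event is flushed.
    for line in body.splitlines() + [""]:
        if line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space after the colon is dropped.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line ends the event; multi-line data is joined with newlines.
            events.append(json.loads("\n".join(data_lines)))
            data_lines = []
    return events


sample_body = (
    "event: message\n"
    'data: {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-03-26"}}\n'
    "\n"
)
messages = parse_sse_data(sample_body)
print(messages[0]["result"]["protocolVersion"])  # 2025-03-26
```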

Step 2: Call `get_instructions`

Command:

$ SESSION_ID="0f38ce5376a54db1a8cb6701fa71e489"
$ curl -s -X POST http://localhost:8000/mcp \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d '{
      "jsonrpc": "2.0",
      "id": 2,
      "method": "tools/call",
      "params": {"name": "get_instructions", "arguments": {}}
    }'
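For scripted validation, the same `tools/call` request can be assembled in Python instead of curl. The sketch below only builds the request object and does not send it; `build_tool_call` is a hypothetical helper, and the URL, headers, and payload mirror the curl command above.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8000/mcp"  # endpoint used in the curl commands above


def build_tool_call(
    session_id: str, request_id: int, name: str, arguments: dict
) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC tools/call request for the MCP server."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
            "Mcp-Session-Id": session_id,
        },
        method="POST",
    )


request = build_tool_call(
    "0f38ce5376a54db1a8cb6701fa71e489", 2, "get_instructions", {}
)
# Sending requires a running server, e.g. urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```

The same helper covers the `search_by_kql` call by passing `{"kql_query": "*"}` as the arguments.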

Output:

event: message
data: {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"text",
  "text":"You are an AI assistant for querying the CLP log database using CLP-KQL (CKQL). ..."}],
  "isError":false}}

Step 3: Call `search_by_kql`

Command:

$ curl -s -X POST http://localhost:8000/mcp \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d '{
      "jsonrpc": "2.0",
      "id": 3,
      "method": "tools/call",
      "params": {"name": "search_by_kql", "arguments": {"kql_query": "*"}}
    }'

Output (truncated):

event: message
data: {"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"{
  \"items\":[
    \"timestamp: 2023-03-27T00:32:15.936Z, message: {\\\"timestamp\\\":\\\"2023-03-27 00:32:15.936\\\", ...},
      link: http://localhost:4000/streamFile?type=json&streamId=...&dataset=default&logEventIdx=999993\",
    ...
  ],
  \"num_total_pages\":100,
  \"num_total_items\":1000,
  \"num_items_per_page\":10,
  \"has_next\":true,
  \"has_previous\":false}"}],
  "isError":false}}
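The tool result above nests a JSON document as a string inside `content[0].text`, so it needs a second decode before the pagination fields can be inspected. The sketch below assumes that text is valid JSON with the field names shown in the output. `summarize_search_result` is a hypothetical helper, the sample items are placeholders rather than real log lines, and the ceiling-division check is an assumption about how pages are counted.

```python
import json
import math


def summarize_search_result(message: dict) -> dict:
    """Decode the JSON string in content[0].text and sanity-check the page count."""
    page = json.loads(message["result"]["content"][0]["text"])
    expected_pages = math.ceil(page["num_total_items"] / page["num_items_per_page"])
    return {
        "num_items_on_page": len(page["items"]),
        "num_total_pages": page["num_total_pages"],
        "pages_consistent": page["num_total_pages"] == expected_pages,
        "has_next": page["has_next"],
    }


# Placeholder page with the pagination values reported in the output above.
inner = {
    "items": [f"placeholder item {i}" for i in range(10)],
    "num_total_pages": 100,
    "num_total_items": 1000,
    "num_items_per_page": 10,
    "has_next": True,
    "has_previous": False,
}
sample_message = {
    "jsonrpc": "2.0",
    "id": 3,
    "result": {
        "content": [{"type": "text", "text": json.dumps(inner)}],
        "isError": False,
    },
}
summary = summarize_search_result(sample_message)
print(summary["pages_consistent"])  # True: ceil(1000 / 10) == 100
```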

Explanation: The MCP server successfully submitted a search query, the query scheduler
processed it (scoped to the "default" dataset), and 1000 results were returned across
100 pages. Before this fix, the "dataset" key was silently ignored by Pydantic,
resulting in datasets=None, which caused the scheduler to use the
_get_archives_for_search_without_datasets() codepath instead of the dataset-scoped one.

Summary by CodeRabbit

Refactor
- Improved internal data structure handling for job configuration submissions to enhance system compatibility.

coderabbitai · 2026-03-03T20:15:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c48968d and 1143983.

📒 Files selected for processing (1)

components/clp-mcp-server/clp_mcp_server/clp_connector.py

Walkthrough

The change updates the job configuration payload submitted to MariaDB, restructuring the dataset field from a single string value to an array. The "dataset" key is replaced with "datasets" containing a list with the default dataset name, aligning with an updated API specification.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Job Configuration Payload Structure `components/clp-mcp-server/clp_mcp_server/clp_connector.py` | Updated job configuration to send `"datasets": [CLP_DEFAULT_DATASET_NAME]` instead of `"dataset": CLP_DEFAULT_DATASET_NAME`, changing the payload field from singular to plural array format. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related issues

bug(clp-mcp-server): Search queries use stale dataset key instead of datasets after #1992 #2061: Directly addresses the same job-config field update, replacing the "dataset" key with "datasets" as an array to match the updated API specification.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: updating a stale `dataset` key to `datasets` in the search job configuration, which is the primary modification shown in the raw summary. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


fix(clp-mcp-server): Update stale dataset key to datasets in search job config (fixes y-scope#2061).

1143983

junhaoliao requested a review from a team as a code owner March 3, 2026 20:15

junhaoliao requested a review from hoophalab March 3, 2026 20:15

sitaowang1998 approved these changes Mar 3, 2026

junhaoliao merged commit fab4b8d into y-scope:main Mar 3, 2026
23 checks passed

junhaoliao added this to the February 2026 milestone Mar 7, 2026

junhaoliao deleted the mcp-dataset branch May 7, 2026 19:46

junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026

fix(clp-mcp-server): Update stale dataset key to datasets in search job config (fixes y-scope#2061). (y-scope#2062)

a5c555a

junhaoliao merged 1 commit into y-scope:main from junhaoliao:mcp-dataset
