
fix(clp-mcp-server): Update stale dataset key to datasets in search job config (fixes #2061). #2062

Merged
junhaoliao merged 1 commit into y-scope:main from junhaoliao:mcp-dataset on Mar 3, 2026

junhaoliao (Member) commented Mar 3, 2026

Description

#1992 changed SearchJobConfig.dataset (singular str) to SearchJobConfig.datasets
(plural list[str]) to support multi-dataset queries. The MCP server's
ClpConnector.submit_query() was not updated and still writes "dataset": "default" into
the msgpacked job config.

When the query scheduler deserializes this with SearchJobConfig.model_validate(), Pydantic
silently ignores the unknown "dataset" key (its default handling of extra fields), so datasets
is left as None. On a clp-s deployment, this causes the scheduler to fall through to the
datasets is None branch at:

if datasets is None:
    # CLP-Text does not support datasets.
    archives_for_search = _get_archives_for_search_without_datasets(

which calls _get_archives_for_search_without_datasets(), a codepath intended for
clp-text (which has no concept of datasets). On clp-s, this may search across all datasets
rather than only the intended "default" dataset.
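To make the failure mode concrete, here is a stdlib-only sketch that mimics how the stale payload ends up on the clp-text path. SearchJobConfigSketch, validate_ignoring_extras and select_archive_lookup are hypothetical stand-ins, not the real SearchJobConfig or scheduler code; only the datasets field is modeled, and the key filtering imitates Pydantic's default extra="ignore" behavior rather than calling Pydantic itself.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class SearchJobConfigSketch:
    """Hypothetical stand-in for SearchJobConfig; only `datasets` is modeled."""

    datasets: list[str] | None = None


def validate_ignoring_extras(raw: dict[str, Any]) -> SearchJobConfigSketch:
    """Mimic Pydantic's default extra="ignore": unknown keys are dropped silently."""
    known = {f.name for f in fields(SearchJobConfigSketch)}
    return SearchJobConfigSketch(**{k: v for k, v in raw.items() if k in known})


def select_archive_lookup(config: SearchJobConfigSketch) -> str:
    """Hypothetical mirror of the scheduler's branch on `datasets`."""
    if config.datasets is None:
        # clp-text path: archives are looked up without any dataset scoping.
        return "without_datasets"
    return "dataset_scoped"


stale = validate_ignoring_extras({"dataset": "default"})  # payload before the fix
fixed = validate_ignoring_extras({"datasets": ["default"]})  # payload after the fix
print(stale.datasets, select_archive_lookup(stale))  # None without_datasets
print(fixed.datasets, select_archive_lookup(fixed))  # ['default'] dataset_scoped
```

With the real Pydantic model, setting model_config = ConfigDict(extra="forbid") would make model_validate() raise on the stale key instead of silently falling back; whether that trade-off suits the scheduler is a separate design decision.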

This fix updates the job config key from "dataset" to "datasets" and wraps the value
in a list, matching the SearchJobConfig schema.
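On the producer side, the change amounts to renaming the key and wrapping the value in a list. A minimal sketch, assuming the connector assembles the job config as a plain dict before msgpacking it; build_job_config and its stale-key guard are hypothetical, and the other job-config fields are left to the caller because this PR does not list them.

```python
from typing import Any, Dict

# From this PR: the MCP server targets the "default" dataset.
CLP_DEFAULT_DATASET_NAME = "default"


def build_job_config(other_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: add the dataset selector to the other job-config fields."""
    if "dataset" in other_fields:
        # Refuse the stale singular key so it cannot silently reappear.
        raise ValueError('stale key "dataset"; use "datasets" (a list of names)')
    # Before the fix this was {"dataset": CLP_DEFAULT_DATASET_NAME}.
    return {**other_fields, "datasets": [CLP_DEFAULT_DATASET_NAME]}


print(build_job_config({}))  # {'datasets': ['default']}
```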

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

1. Unit tests

Task: Verify all existing MCP server tests pass with the fix.

Command:

$ cd components/clp-mcp-server
$ uv run pytest tests/ -v

Output:

tests/server/test_utils.py::TestUtils::test_convert_date_string_to_epoch PASSED
tests/server/test_utils.py::TestUtils::test_convert_date_string_to_epoch_invalid_date_string PASSED
tests/server/test_utils.py::TestUtils::test_parse_timestamp_range_invalid_values PASSED
tests/server/test_utils.py::TestUtils::test_invalid_timestamp_type PASSED
tests/server/test_utils.py::TestUtils::test_invalid_timestamp_value PASSED
tests/server/test_utils.py::TestUtils::test_missing_timestamp_and_message PASSED
tests/server/test_utils.py::TestUtils::test_sort_and_format_query_results PASSED
tests/test_clp_connector.py::test_submit_query SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_read_job_status SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_wait_query_completion SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_read_results SKIPPED (requires actual DB connections)
tests/test_clp_connector.py::test_submit_query_invalid_timestamps PASSED
tests/test_clp_connector.py::test_read_job_status_not_found PASSED
tests/test_clp_connector.py::test_wait_query_completion_succeeded PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[QueryJobStatus.FAILED-RuntimeError] PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[QueryJobStatus.CANCELLED-RuntimeError] PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[QueryJobStatus.KILLED-RuntimeError] PASSED
tests/test_clp_connector.py::test_wait_query_completion_failure_cases[999-RuntimeError] PASSED
tests/test_clp_connector.py::test_read_results_returns_docs PASSED
tests/test_clp_connector.py::test_read_results_adds_link_field PASSED
tests/test_session_manager.py::TestPaginatedQueryResult::test_get_page PASSED
tests/test_session_manager.py::TestPaginatedQueryResult::test_query_result_initialization PASSED
tests/test_session_manager.py::TestSessionState::test_error_handling PASSED
tests/test_session_manager.py::TestSessionState::test_get_page_data PASSED
tests/test_session_manager.py::TestSessionState::test_session_expiration PASSED
tests/test_session_manager.py::TestSessionManager::test_get_or_create_session PASSED
tests/test_session_manager.py::TestSessionManager::test_cached_query_result PASSED
tests/test_session_manager.py::TestSessionManager::test_get_nth_page PASSED
tests/test_session_manager.py::TestSessionManager::test_async_expiration_for_cleanup_loop PASSED

======================== 25 passed, 4 skipped in 2.17s =========================

2. End-to-end MCP server search

Task: Verify that MCP server search queries work end-to-end with the fixed
datasets key.

Setup:

$ cd build/clp-package
$ ./sbin/start-clp.sh   # with mcp_server enabled in clp-config.yaml
$ ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql.jsonl

Output:

2026-03-03T20:10:18.555 INFO [controller] Started CLP.

2026-03-03T20:10:23.537 INFO [compress] Compression job 1 submitted.
2026-03-03T20:10:26.042 INFO [compress] Compression finished.
2026-03-03T20:10:26.042 INFO [compress] Compressed 385.21MB into 10.06MB (38.31x). Speed: 176.62MB/s.

Test: Execute MCP Streamable HTTP protocol calls using curl
(initialize → get_instructions → search_by_kql).

Step 1: Initialize MCP session

Command:

$ curl -s -D /dev/stderr \
    -X POST http://localhost:8000/mcp \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d '{
      "jsonrpc": "2.0",
      "id": 1,
      "method": "initialize",
      "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "curl-test", "version": "1.0"}
      }
    }'

Output (response headers + body):

HTTP/1.1 200 OK
content-type: text/event-stream
mcp-session-id: 0f38ce5376a54db1a8cb6701fa71e489
...

event: message
data: {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-03-26",
  "capabilities":{"tools":{"listChanged":true},...},
  "serverInfo":{"name":"clp-mcp-server","version":"3.1.0"}}}
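For scripting the same handshake without curl, a stdlib-only Python sketch follows. It assumes the server at http://localhost:8000/mcp used in these steps and handles only simple SSE events made of `data:` lines; parse_sse_messages, build_request and post_jsonrpc are hypothetical helpers, not part of clp-mcp-server or any MCP SDK.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# Hypothetical helpers; MCP_URL is the endpoint used in the curl steps.
MCP_URL = "http://localhost:8000/mcp"


def parse_sse_messages(body: str) -> list[dict[str, Any]]:
    """Decode the `data:` lines of each SSE event as one JSON-RPC message (simplified)."""
    messages = []
    for event in body.replace("\r\n", "\n").split("\n\n"):
        data_lines = []
        for line in event.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # SSE strips a single leading space after the colon.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            messages.append(json.loads("\n".join(data_lines)))
    return messages


def build_request(payload: dict[str, Any], session_id: str | None = None) -> urllib.request.Request:
    """Build a POST with the same headers the curl commands send."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        # Requests after `initialize` carry the session ID the server returned.
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        MCP_URL, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )


def post_jsonrpc(
    payload: dict[str, Any], session_id: str | None = None
) -> tuple[str | None, list[dict[str, Any]]]:
    """Send one JSON-RPC request; return (session ID header, decoded messages)."""
    with urllib.request.urlopen(build_request(payload, session_id)) as resp:
        body = resp.read().decode()
        # The Accept header allows either a plain JSON body or an SSE stream.
        if resp.headers.get_content_type() == "application/json":
            messages = [json.loads(body)]
        else:
            messages = parse_sse_messages(body)
        return resp.headers.get("mcp-session-id"), messages
```

Step 1 would then be `session_id, msgs = post_jsonrpc(initialize_payload)` with the JSON body from the curl command above, and Steps 2 and 3 pass `session_id` as the second argument.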

Step 2: Call get_instructions

Command:

$ SESSION_ID="0f38ce5376a54db1a8cb6701fa71e489"
$ curl -s -X POST http://localhost:8000/mcp \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d '{
      "jsonrpc": "2.0",
      "id": 2,
      "method": "tools/call",
      "params": {"name": "get_instructions", "arguments": {}}
    }'

Output:

event: message
data: {"jsonrpc":"2.0","id":2,"result":{"content":[{"type":"text",
  "text":"You are an AI assistant for querying the CLP log database using CLP-KQL (CKQL). ..."}],
  "isError":false}}

Step 3: Call search_by_kql

Command:

$ curl -s -X POST http://localhost:8000/mcp \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -H "Mcp-Session-Id: $SESSION_ID" \
    -d '{
      "jsonrpc": "2.0",
      "id": 3,
      "method": "tools/call",
      "params": {"name": "search_by_kql", "arguments": {"kql_query": "*"}}
    }'

Output (truncated):

event: message
data: {"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"{
  \"items\":[
    \"timestamp: 2023-03-27T00:32:15.936Z, message: {\\\"timestamp\\\":\\\"2023-03-27 00:32:15.936\\\", ...},
      link: http://localhost:4000/streamFile?type=json&streamId=...&dataset=default&logEventIdx=999993\",
    ...
  ],
  \"num_total_pages\":100,
  \"num_total_items\":1000,
  \"num_items_per_page\":10,
  \"has_next\":true,
  \"has_previous\":false}"}],
  "isError":false}}

Explanation: The MCP server successfully submitted a search query, the query scheduler
processed it (scoped to the "default" dataset), and 1000 results were returned across
100 pages. Before this fix, the "dataset" key was silently ignored by Pydantic,
resulting in datasets=None, which caused the scheduler to use the
_get_archives_for_search_without_datasets() codepath instead of the dataset-scoped one.
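The search_by_kql result nests JSON inside JSON: result.content[0].text is itself a serialized object holding items and the pagination fields. The sketch below unwraps it, assuming the response shape shown in the truncated output above; unpack_search_result and expected_num_pages are hypothetical helpers, and the round-up relation between the pagination fields is an assumption that happens to be consistent with 1000 items at 10 per page giving 100 pages.

```python
import json
import math
from typing import Any, Dict


def unpack_search_result(message: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: unwrap a search_by_kql tools/call response into its page."""
    result = message["result"]
    text = result["content"][0]["text"]
    if result.get("isError"):
        raise RuntimeError(f"search_by_kql failed: {text}")
    # The page (items plus pagination fields) is a JSON string inside the text item.
    return json.loads(text)


def expected_num_pages(num_total_items: int, num_items_per_page: int) -> int:
    """Assumed relation between the pagination fields: round up to whole pages."""
    return math.ceil(num_total_items / num_items_per_page)
```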

Summary by CodeRabbit

  • Refactor
    • Updated the search job configuration submitted by the MCP server to use the "datasets" list key expected by SearchJobConfig.

junhaoliao requested a review from a team as a code owner March 3, 2026 20:15
junhaoliao requested a review from hoophalab March 3, 2026 20:15

coderabbitai (Bot, Contributor) commented Mar 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c48968d and 1143983.

📒 Files selected for processing (1)
  • components/clp-mcp-server/clp_mcp_server/clp_connector.py

Walkthrough

The change updates the job configuration payload submitted to MariaDB, restructuring the dataset field from a single string value to a list. The "dataset" key is replaced with "datasets", whose value is a list containing the default dataset name, aligning the payload with the updated SearchJobConfig schema.

Changes

Cohort: Job Configuration Payload Structure
File(s): components/clp-mcp-server/clp_mcp_server/clp_connector.py
Summary: Updated the job configuration to send "datasets": [CLP_DEFAULT_DATASET_NAME] instead of "dataset": CLP_DEFAULT_DATASET_NAME, changing the payload field from a singular string to a plural list.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: updating a stale dataset key to datasets in the search job configuration, which is the primary modification shown in the raw summary.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


junhaoliao merged commit fab4b8d into y-scope:main Mar 3, 2026 (23 checks passed)
junhaoliao added this to the February 2026 milestone Mar 7, 2026
junhaoliao deleted the mcp-dataset branch May 7, 2026 19:46
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026