
bug(clp-mcp-server): Search queries use stale dataset key instead of datasets after #1992 #2061

@junhaoliao

Description


Bug

The MCP server's ClpConnector.submit_query() writes the search job config with "dataset": "default" (a singular string), but #1992 renamed this field to "datasets" (a plural list[str]) in SearchJobConfig. Since Pydantic's model_validate silently ignores unknown keys by default, the stale "dataset" key is dropped and datasets falls back to its default of None. As a result, MCP-initiated searches run without any dataset filter instead of being scoped to the "default" dataset.

```python
job_config = msgpack.packb(
    {
        "begin_timestamp": begin_ts,
        # Stale key: SearchJobConfig now expects "datasets" (list[str]) since #1992.
        "dataset": CLP_DEFAULT_DATASET_NAME,
        "end_timestamp": end_ts,
        "ignore_case": True,
        "max_num_results": SEARCH_MAX_NUM_RESULTS,
        "query_string": query,
    }
)
```

The job config should use "datasets": ["default"] to match the updated SearchJobConfig schema:

```python
class SearchJobConfig(QueryJobConfig):
    datasets: list[str] | None = None
    query_string: str
    max_num_results: int
    begin_timestamp: int | None = None
    end_timestamp: int | None = None
    ignore_case: bool = False
```

The test at line 146 also asserts the old field name, but only in the link URL parameter (&dataset=default). That assertion is correct as-is, because dataset there is a webui URL parameter rather than the job config field, so only the job config built in submit_query needs updating.

Expected: submit_query should use "datasets": ["default"] so that the query scheduler correctly scopes searches to the default dataset on clp-s deployments.

CLP version

c48968d (main, introduced in #1992)

Environment

  • Ubuntu 22.04.5 LTS
  • Docker 29.2.1
  • CLP package version 0.9.1-dev

Reproduction steps

  1. Start CLP with the default clp-s config
  2. Compress data: ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql.jsonl
  3. Use the MCP server to submit a search query
  4. Inspect the query_jobs table: the job_config blob contains "dataset" (singular) instead of "datasets" (a plural list)
  5. The query scheduler deserializes datasets as None, meaning the search may not be scoped to the intended dataset
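For step 4, a small check like the one below can flag the stale key. It is a sketch that assumes the job_config blob has already been read from the query_jobs table and decoded into a dict (for example with msgpack.unpackb); check_dataset_keys is a hypothetical helper, not part of CLP:

```python
def check_dataset_keys(job_config: dict) -> list[str]:
    """Hypothetical helper: list problems with a decoded job config's dataset keys."""
    problems = []
    # The singular key is silently ignored by SearchJobConfig after #1992.
    if "dataset" in job_config:
        problems.append('stale key "dataset" present')
    datasets = job_config.get("datasets")
    if datasets is None:
        # Missing or null "datasets" means the scheduler sees datasets=None.
        problems.append('"datasets" missing or None')
    elif not (isinstance(datasets, list) and all(isinstance(d, str) for d in datasets)):
        problems.append('"datasets" is not a list[str]')
    return problems


stale = {"dataset": "default", "query_string": "*error*", "max_num_results": 1000}
for problem in check_dataset_keys(stale):
    print(problem)
```

A config produced after the fix should yield an empty list.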
