Bug
The MCP server's ClpConnector.submit_query() writes the search job config with "dataset": "default" (singular string), but #1992 renamed this field to "datasets" (plural, list[str]) in SearchJobConfig. Since Pydantic's model_validate silently ignores unknown keys, the datasets field is deserialized as None, causing MCP-initiated searches to run without any dataset filter instead of scoping to the "default" dataset.
clp/components/clp-mcp-server/clp_mcp_server/clp_connector.py, lines 60 to 68 in c48968d:

    job_config = msgpack.packb(
        {
            "begin_timestamp": begin_ts,
            "dataset": CLP_DEFAULT_DATASET_NAME,
            "end_timestamp": end_ts,
            "ignore_case": True,
            "max_num_results": SEARCH_MAX_NUM_RESULTS,
            "query_string": query,
        }

The job config should use "datasets": ["default"] to match the updated SearchJobConfig schema:
clp/components/job-orchestration/job_orchestration/scheduler/job_config.py, lines 85 to 91 in c48968d:

    class SearchJobConfig(QueryJobConfig):
        datasets: list[str] | None = None
        query_string: str
        max_num_results: int
        begin_timestamp: int | None = None
        end_timestamp: int | None = None
        ignore_case: bool = False

The test at line 146 also asserts the old field name in the link URL parameter, but the link URL (&dataset=default) is actually correct since it's a webui URL param (not the API field), so only the submit_query job config needs updating.
Expected: submit_query should use "datasets": ["default"] so that the query scheduler correctly scopes searches to the default dataset on clp-s deployments.
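The failure mode can be reproduced in isolation. The sketch below uses a hypothetical stand-in for SearchJobConfig (the name SearchJobConfigSketch, the omission of the QueryJobConfig base class, and the sample values are assumptions for illustration). It assumes Pydantic v2 with its default handling of extra keys, and it is not the actual class from job_config.py.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class SearchJobConfigSketch(BaseModel):
    """Hypothetical stand-in for SearchJobConfig; fields copied from the issue."""

    datasets: Optional[List[str]] = None
    query_string: str
    max_num_results: int
    begin_timestamp: Optional[int] = None
    end_timestamp: Optional[int] = None
    ignore_case: bool = False


def build_payload(dataset_field: Dict[str, Any]) -> Dict[str, Any]:
    """Build a job config dict; only the dataset field differs between cases."""
    payload: Dict[str, Any] = {
        "begin_timestamp": None,
        "end_timestamp": None,
        "ignore_case": True,
        "max_num_results": 1000,  # placeholder for SEARCH_MAX_NUM_RESULTS
        "query_string": "error",
    }
    payload.update(dataset_field)
    return payload


# Old shape: the singular key is unknown to the model, so it is dropped
# and "datasets" falls back to its default of None.
old_config = SearchJobConfigSketch.model_validate(
    build_payload({"dataset": "default"})
)

# Proposed shape: the plural key with a list value is picked up.
new_config = SearchJobConfigSketch.model_validate(
    build_payload({"datasets": ["default"]})
)

print(old_config.datasets)
print(new_config.datasets)
```

With the singular key, validation succeeds without any error and the dataset information is simply lost, which is why the mismatch is easy to miss.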
CLP version
c48968d (main, introduced in #1992)
Environment
- Ubuntu 22.04.5 LTS
- Docker 29.2.1
- CLP package version 0.9.1-dev
Reproduction steps
- Start CLP with the default clp-s config
- Compress data:
  ./sbin/compress.sh --timestamp-key timestamp ~/samples/postgresql.jsonl
- Use the MCP server to submit a search query
- Inspect the query_jobs table: the job_config blob will contain "dataset" (singular) instead of "datasets" (plural list)
- The query scheduler deserializes datasets as None, meaning the search may not be scoped to the intended dataset
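The inspection step can be sketched as follows. This assumes the job_config column holds plain msgpack bytes, as the msgpack.packb call in submit_query suggests, and that msgpack 1.0 or later is installed so that keys decode as str. Fetching the row from the database is left out, and the helper name dataset_keys is hypothetical.

```python
import msgpack


def dataset_keys(job_config_blob: bytes) -> dict:
    """Decode a job_config blob and return only its dataset-related keys."""
    config = msgpack.unpackb(job_config_blob)
    return {
        key: config[key] for key in ("dataset", "datasets") if key in config
    }


# Stand-ins for blobs read from the query_jobs table.
buggy_blob = msgpack.packb({"dataset": "default", "query_string": "error"})
fixed_blob = msgpack.packb({"datasets": ["default"], "query_string": "error"})

print(dataset_keys(buggy_blob))
print(dataset_keys(fixed_blob))
```

A blob written by the affected submit_query would show only the singular key, while a corrected one would show the plural key with a list value.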