Bug
When searching archives containing large log events (>16 MiB per event), the ResultsCacheOutputHandler fails to write those results to MongoDB because they exceed MongoDB's 16 MiB BSON document size limit. The insert_many call throws a mongocxx::exception, and the handler returns ErrorCodeFailureDbBulkWrite / ErrorCode_Failure_DB_Bulk_Write. The callers do log the error:
- clp engine: clo.cpp:565-568 logs "Failed to flush output handler, error={}"
- clp_s engine: Output.cpp:124-127 (per-table flush) and Output.cpp:133-136 (finish) log the same message.
However, the following issues remain:
- If a large document is batched with smaller documents, the entire insert_many batch fails, losing valid results alongside the oversized one.
- The error message only includes the numeric error code -- it does not indicate which specific result caused the failure or that the root cause is MongoDB's BSON document size limit, making debugging difficult.
- After a batch failure, the handler returns immediately, discarding any remaining results that have not yet been flushed.
Each search result is stored as a single BSON document that embeds the full decompressed log event message. If a single log event exceeds ~16 MiB (the exact threshold is 16 MiB minus the BSON overhead for metadata fields like orig_file_path, timestamp, archive_id, log_event_ix, and dataset), the MongoDB insert will fail.
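The threshold described above can be approximated with a few lines of arithmetic. The following is a minimal sketch, not the handler's actual code: the field types (strings for orig_file_path, archive_id, and dataset; 64-bit integers for timestamp and log_event_ix), the field name message, and the helper names are all assumptions made for illustration. It also assumes the driver adds an _id ObjectId on insert, and it treats the 16 MiB limit as inclusive.

```python
# Rough estimate of the BSON size of one search-result document.
# Field names come from the issue text; field TYPES and the "message"
# key are assumptions, so the exact overhead may differ in practice.

MAX_BSON_DOCUMENT_SIZE = 16 * 1024 * 1024  # 16,777,216 bytes


def _key_size(key: str) -> int:
    # One type byte, then the key as a NUL-terminated cstring.
    return 1 + len(key.encode("utf-8")) + 1


def string_element_size(key: str, num_value_bytes: int) -> int:
    # int32 length prefix + UTF-8 bytes + trailing NUL.
    return _key_size(key) + 4 + num_value_bytes + 1


def int64_element_size(key: str) -> int:
    return _key_size(key) + 8


def object_id_element_size(key: str = "_id") -> int:
    # ObjectId values are 12 bytes.
    return _key_size(key) + 12


def estimate_result_document_size(
    orig_file_path: str, archive_id: str, dataset: str, message_num_bytes: int
) -> int:
    size = 4 + 1  # int32 document length + terminating NUL
    size += object_id_element_size()
    size += string_element_size("orig_file_path", len(orig_file_path.encode("utf-8")))
    size += int64_element_size("timestamp")
    size += string_element_size("archive_id", len(archive_id.encode("utf-8")))
    size += int64_element_size("log_event_ix")
    size += string_element_size("dataset", len(dataset.encode("utf-8")))
    size += string_element_size("message", message_num_bytes)
    return size


def partition_by_size(results):
    """Split (log_event_ix, estimated_size) pairs into insertable and oversized ids."""
    insertable, oversized = [], []
    for log_event_ix, size in results:
        if size <= MAX_BSON_DOCUMENT_SIZE:
            insertable.append(log_event_ix)
        else:
            oversized.append(log_event_ix)
    return insertable, oversized
```

Under these assumptions, the fixed metadata overhead is on the order of a hundred bytes plus the lengths of the path, archive ID, and dataset strings, so the largest message that fits is only slightly smaller than 16 MiB. A check like partition_by_size also shows how the specific oversized result could be identified before the batch is sent.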
Affected paths:
This affects any search path where results are written to MongoDB:
- WebUI search path -- The WebUI submits jobs directly to the DB via QueryJobDbManager and does not set write_to_file in its SearchJobConfig (see components/webui/server/src/routes/api/search/index.ts:75-84), so write_to_file defaults to false (components/job-orchestration/job_orchestration/scheduler/job_config.py:115), which routes results through the ResultsCacheOutputHandler (MongoDB).
- API server with buffer_results_in_mongodb: true -- Separately, the API server accepts a buffer_results_in_mongodb flag in the query config (components/api-server/src/client.rs:48-52). When set to true, it maps to write_to_file: false (client.rs:64), again routing results to MongoDB. By default (buffer_results_in_mongodb: false), results go to files and are not affected.
Affected code paths (both clp and clp_s engines):
- components/core/src/clp/clo/OutputHandler.cpp -- ResultsCacheOutputHandler::flush() (lines 98-152)
- components/core/src/clp_s/OutputHandlerImpl.cpp -- ResultsCacheOutputHandler::flush() (lines 91-149)
Size limit chain:
| Limit | Value | Source |
| --- | --- | --- |
| MongoDB BSON document max | 16 MiB (16,777,216 bytes) | MongoDB protocol hard limit |
| Default max JSON record (ingestion) | 512 MiB | CommandLineArguments.hpp:210 |
| simdjson hard max | ~4 GiB (0xFFFFFFFF bytes) | simdjson/base.h:22 |
Since the ingestion limit (512 MiB default) far exceeds the MongoDB document limit (16 MiB), it is possible to ingest log events that can never be returned via any search path that uses MongoDB for results.
CLP version
3b4d13f
Environment
Any environment using either:
- The WebUI search path (always uses MongoDB results cache), or
- The API server with buffer_results_in_mongodb: true.
Reproduction steps
- Prepare a JSONL file containing at least one JSON record larger than 16 MiB (e.g., a single JSON object with a 20 MiB string field).
- Ingest the file using clp-s with default settings (the 512 MiB --max-document-size allows the record to be ingested).
- Search for a query that matches the large log event using the WebUI, or via the API server with buffer_results_in_mongodb: true.
- Observe that the search result for the large event does not appear in the results. The search worker logs should show a bulk write failure.
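For the first reproduction step, an input file could be generated with a short script like the one below. This is only a sketch: the function name, the output path, and the record fields (timestamp, level, message) are hypothetical, and any JSON record whose serialized size exceeds 16 MiB should serve the same purpose.

```python
import json


def write_oversized_jsonl(path: str, payload_mib: int = 20) -> str:
    """Write a JSONL file with one small record and one record above 16 MiB.

    The record layout is illustrative only; it is not a schema required by clp-s.
    """
    records = [
        # A normal-sized event that would be batched alongside the large one.
        {"timestamp": 0, "level": "INFO", "message": "small event"},
        # A single string field of payload_mib MiB pushes this record over
        # MongoDB's 16 MiB BSON document limit once stored as a search result.
        {"timestamp": 1, "level": "INFO", "message": "x" * (payload_mib * 1024 * 1024)},
    ]
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record))
            f.write("\n")
    return path


if __name__ == "__main__":
    write_oversized_jsonl("oversized.jsonl")
```

Including a small record next to the large one makes it easier to observe whether valid results from the same batch are also lost.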