Silent data truncation in orchestration DB causes compression scheduler crash — VARBINARY(60000) too small for clp_config blobs #2151

@goynam

Description

Bug

The compression_jobs.clp_config, compression_tasks.clp_paths_to_compress, and query_jobs.job_config columns are defined as VARBINARY(60000) in initialize-orchestration-db.py. When a brotli-compressed msgpack config exceeds 60,000 bytes, MySQL/MariaDB silently truncates the data on INSERT. The truncated blob is a valid-length binary value but an incomplete brotli stream, causing brotli.decompress() to fail with brotli.error: decoder failed.
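The failure mode can be reproduced in isolation. The sketch below is only an illustration, not CLP code: it uses the standard-library zlib and json modules as stand-ins for brotli and msgpack so that it runs without third-party packages, and COLUMN_LIMIT, simulate_truncating_insert, and the payload are hypothetical. Cutting a compressed stream at the column limit leaves a value of valid length that the decompressor rejects.

```python
import json
import os
import zlib

COLUMN_LIMIT = 60_000  # bytes; mirrors VARBINARY(60000)


def simulate_truncating_insert(blob: bytes, limit: int = COLUMN_LIMIT) -> bytes:
    """Mimic a column that keeps only the first `limit` bytes of a value."""
    return blob[:limit]


# Hypothetical config: many random object IDs, so the payload compresses poorly.
config = {"object_ids": [os.urandom(16).hex() for _ in range(5_000)]}
compressed = zlib.compress(json.dumps(config).encode())
stored = simulate_truncating_insert(compressed)

try:
    zlib.decompress(stored)
    decode_ok = True
except zlib.error:
    # The stored value is a prefix of a valid stream, so decoding fails.
    decode_ok = False
```

The stored value is exactly COLUMN_LIMIT bytes long, which matches the signature described in the reproduction steps.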

Proposed Fix

  1. Change column types from VARBINARY(60000) to MEDIUMBLOB (16MB max) in initialize-orchestration-db.py. MEDIUMBLOB adds only 1 byte of storage overhead per row (3-byte length prefix vs 2-byte) and is compatible with all existing application code — Python mysql-connector, Node.js mysql2, and Rust sqlx all handle blob types identically to varbinary.

  2. Add error handling in the compression scheduler's job processing loop so a single corrupted job doesn't crash the entire scheduler. The bad job should be marked as FAILED and skipped:

    for job_row in jobs:
        job_id = job_row["id"]
        try:
            clp_io_config = ClpIoConfig.model_validate(
                msgpack.unpackb(brotli.decompress(job_row["clp_config"]))
            )
        except Exception:
            logger.error("Failed to decode clp_config for job %s, marking as failed.", job_id)
            # mark job as FAILED and continue
            continue

  3. Apply the same resilience fix in the WebUI's mapCompressionMetadataRows: skip rows with corrupted blobs instead of crashing the entire endpoint.
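The per-row isolation described in items 2 and 3 can be sketched as a small helper that decodes each row independently and reports the failures instead of raising. This is a minimal sketch under stated assumptions, not the scheduler's actual code: split_decodable_jobs and stand_in_decode are hypothetical names, and zlib plus json stand in for brotli plus msgpack so the example runs with only the standard library.

```python
import json
import logging
import zlib
from typing import Any, Callable, Iterable

logger = logging.getLogger(__name__)


def split_decodable_jobs(
    rows: Iterable[dict[str, Any]],
    decode: Callable[[bytes], Any],
) -> tuple[dict[int, Any], list[int]]:
    """Decode each row's clp_config; collect IDs of rows that fail to decode."""
    decoded: dict[int, Any] = {}
    failed_ids: list[int] = []
    for row in rows:
        job_id = row["id"]
        try:
            decoded[job_id] = decode(row["clp_config"])
        except Exception:
            # One bad blob must not abort the whole batch.
            logger.exception("Failed to decode clp_config for job %s", job_id)
            failed_ids.append(job_id)
    return decoded, failed_ids


def stand_in_decode(blob: bytes) -> Any:
    # Stand-in for msgpack.unpackb(brotli.decompress(blob)).
    return json.loads(zlib.decompress(blob))


good = zlib.compress(json.dumps({"paths": ["a", "b"]}).encode())
rows = [
    {"id": 1, "clp_config": good},
    {"id": 2, "clp_config": good[:-4]},  # truncated stream
    {"id": 3, "clp_config": good},
]
decoded, failed_ids = split_decodable_jobs(rows, stand_in_decode)
```

The caller would then mark every ID in failed_ids as FAILED in the database and continue scheduling the jobs in decoded.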

CLP version

0.10.0

Environment

EKS (should occur in any environment)

Reproduction steps

This occurs in high-throughput S3 ingestion scenarios where the log-ingestor's Buffer accumulates many S3 object metadata IDs before flushing. With buffer_flush_threshold set to 4 GB (default) and many small S3 objects (1-10 MB each), a single compression job can reference thousands of object IDs. The resulting ClpIoConfig, serialized as msgpack and compressed with brotli, then exceeds 60,000 bytes.
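A rough back-of-the-envelope check of those numbers, assuming decimal units (1 GB = 10^9 bytes) and that one buffer flush produces one compression job; both are assumptions of this sketch, and objects_per_flush is a hypothetical helper:

```python
BUFFER_FLUSH_THRESHOLD = 4 * 10**9  # 4 GB default, assuming decimal units
MB = 10**6
COLUMN_LIMIT = 60_000  # bytes available in VARBINARY(60000)


def objects_per_flush(threshold_bytes: int, object_size_bytes: int) -> int:
    """Number of equally sized objects that fit under the flush threshold."""
    return threshold_bytes // object_size_bytes


fewest = objects_per_flush(BUFFER_FLUSH_THRESHOLD, 10 * MB)  # 10 MB objects
most = objects_per_flush(BUFFER_FLUSH_THRESHOLD, 1 * MB)     # 1 MB objects

# Compressed bytes the column can hold per referenced object in the worst case.
budget_per_object = COLUMN_LIMIT / most
```

With 1 MB objects a job references about 4,000 objects, which leaves only 15 bytes of compressed config per object before the column limit is reached.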

In our environment (4 TB/hr ingestion, ~100 compression workers), we observed 24 out of ~1,700 pending jobs with exactly 60,000-byte clp_config blobs — all failing to decompress. Valid jobs in the same table had configs ranging from 2,000-5,500 bytes.
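Because a truncated blob is exactly as long as the column limit, affected rows can be flagged by length alone. The sketch below is a hedged illustration: the query text and the find_suspect_job_ids helper are hypothetical, and a blob at the limit is only a suspect until a decompression attempt confirms that it is corrupt.

```python
COLUMN_LIMIT = 60_000  # bytes; mirrors VARBINARY(60000)

# Hypothetical query; LENGTH() returns the byte length of a binary value.
SUSPECT_JOBS_QUERY = (
    "SELECT id FROM compression_jobs "
    f"WHERE LENGTH(clp_config) = {COLUMN_LIMIT}"
)


def find_suspect_job_ids(rows, limit=COLUMN_LIMIT):
    """Return IDs of rows whose clp_config fills the column completely."""
    return [row["id"] for row in rows if len(row["clp_config"]) >= limit]


sample_rows = [
    {"id": 10, "clp_config": b"\x00" * 2_000},
    {"id": 11, "clp_config": b"\x00" * COLUMN_LIMIT},
    {"id": 12, "clp_config": b"\x00" * 5_500},
]
suspects = find_suspect_job_ids(sample_rows)
```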

Metadata

Labels

bug: Something isn't working
