fix(webui): Cap compression metadata query results to prevent OOM crash (fixes #2040).#2041
Conversation
WalkthroughA new module-scoped constant Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| @@ -36,7 +38,8 @@ const getCompressionMetadataQuery = () => ` | |||
| ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.CLP_CONFIG} | |||
| FROM ${settings.SqlDbCompressionJobsTableName} | |||
| WHERE ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME} >= NOW() - INTERVAL 30 DAY | |||
There was a problem hiding this comment.
thoughts on removing this 30 day check? actually, is there a reason we previously prefer it over the numeric count?
There was a problem hiding this comment.
There probably isn't a real reason. It's an arbitrary decision from the initial commit.
Description
Add a
LIMIT 1000clause and remove theWHERE ... INTERVAL 30 DAYfilterfrom the compression metadata SQL query to prevent the webui Node.js server
from running out of memory when the
compression_jobstable contains a largenumber of rows.
Previously,
getCompressionMetadataQuery()fetched all compression jobsupdated within the last 30 days with no row limit:
clp/components/webui/server/src/routes/api/compress-metadata/sql.ts
Lines 26 to 40 in 8efbadd
When a user navigates to the Ingest page, the client calls
GET /api/compress-metadata, which executes this query, loads every row intomemory, and brotli-decompresses + msgpack-decodes the
clp_configVARBINARYblob for each row. With a large number of jobs (e.g., 5 million), this exhausts
the V8 heap (~4 GiB default) and crashes the server with:
The fix:
COMPRESSION_METADATA_QUERY_LIMIT = 1000and appendsLIMIT 1000to the query.WHERE update_time >= NOW() - INTERVAL 30 DAYclause since theLIMITalready caps the result set and the time filter is redundant whenordering by
id DESC.Since the query orders by
id DESC, the 1000 most recent jobs are returnedregardless of their update time.
Impact Assessment
webuiserver -- compression metadata API route.SELECTquery on thecompression_jobstable withno
LIMITclause.server were audited -- they are either aggregation queries (returning 1 row),
filtered by primary key, or return small metadata (dataset names, column
names).
ORDER BY id DESCalready exists, so the 1000 most recentjobs are always returned. Users with more than 1000 jobs will see only the
latest 1000 in the Ingest page's Compression Jobs table.
Checklist
breaking change.
Validation performed
1. Build the CLP package
Task: Build the CLP package (including the Docker image) with the fix
applied.
Command:
Output (last lines):
2. Verify the LIMIT clause is present and the WHERE clause is absent in the Docker image
Task: Confirm the compiled JavaScript inside the freshly built Docker image
includes the LIMIT clause and does not include the WHERE time filter.
Command:
Output (relevant portion):
Explanation: The
WHERE ... INTERVAL 30 DAYclause is absent andLIMIT ${COMPRESSION_METADATA_QUERY_LIMIT}(1000) is present.3. Start CLP and insert 5,000,000 compression job records
Task: Reproduce the conditions that trigger the OOM crash.
Commands:
cd build/clp-package ./sbin/start-clp.shOutput (last line):
Then insert 5,000,000 rows with
status = 2(SUCCEEDED) and validbrotli-compressed msgpack
clp_configblobs (matchingClpIoPartialConfigSchema)into
compression_jobs:Output:
4. Verify
GET /api/compress-metadata/returns 1000 rows without crashTask: Confirm the API returns a bounded result set and the server stays
healthy.
Command:
Output:
Explanation: The API returned HTTP 200 with exactly 1000 rows (the LIMIT).
Memory usage stayed at ~68 MiB. No
FATAL ERRORin the container logs.5. Verify the Ingest page loads successfully
Task: Confirm the Ingest page loads without crashing the server.
Command:
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:4000/ingestOutput:
Explanation: The Ingest page returned HTTP 200. The webui server remained
healthy throughout with no OOM errors in the container logs:
6. Confirm the OOM crash is reproducible on
main(before the fix)Task: Verify that the same 5,000,000-row scenario crashes the server on the
unfixed
mainbranch.Output (before fix, same 5M rows, no
NODE_OPTIONShack):Explanation: Without the fix, the V8 heap climbed to ~4 GiB (the default
limit) and crashed after ~4 minutes. Memory growth was observed at
2.3 -> 3.6 -> 4.0 -> 4.6 -> 5.0 -> 5.4 GiB (container RSS) before the fatal
error.