Skip to content

fix(webui): Cap compression metadata query results to prevent OOM crash (fixes #2040).#2041

Merged
junhaoliao merged 3 commits into
y-scope:mainfrom
junhaoliao:webui-oom
Feb 26, 2026
Merged

fix(webui): Cap compression metadata query results to prevent OOM crash (fixes #2040).#2041
junhaoliao merged 3 commits into
y-scope:mainfrom
junhaoliao:webui-oom

Conversation

@junhaoliao

@junhaoliao junhaoliao commented Feb 26, 2026

Copy link
Copy Markdown
Member

Description

Add a LIMIT 1000 clause and remove the WHERE ... INTERVAL 30 DAY filter
from the compression metadata SQL query to prevent the webui Node.js server
from running out of memory when the compression_jobs table contains a large
number of rows.

Previously, getCompressionMetadataQuery() fetched all compression jobs
updated within the last 30 days with no row limit:

const getCompressionMetadataQuery = () => `
SELECT
id as ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.ID},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.STATUS},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.STATUS_MSG},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.START_TIME},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.DURATION},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UNCOMPRESSED_SIZE},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.COMPRESSED_SIZE},
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.CLP_CONFIG}
FROM ${settings.SqlDbCompressionJobsTableName}
WHERE ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME} >= NOW() - INTERVAL 30 DAY
ORDER BY ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.ID} DESC;
`;

When a user navigates to the Ingest page, the client calls
GET /api/compress-metadata, which executes this query, loads every row into
memory, and brotli-decompresses + msgpack-decodes the clp_config VARBINARY
blob for each row. With a large number of jobs (e.g., 5 million), this exhausts
the V8 heap (~4 GiB default) and crashes the server with:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

The fix:

  1. Introduces COMPRESSION_METADATA_QUERY_LIMIT = 1000 and appends
    LIMIT 1000 to the query.
  2. Removes the WHERE update_time >= NOW() - INTERVAL 30 DAY clause since the
    LIMIT already caps the result set and the time filter is redundant when
    ordering by id DESC.

Since the query orders by id DESC, the 1000 most recent jobs are returned
regardless of their update time.

Impact Assessment

  • Affected component: webui server -- compression metadata API route.
  • Root cause: Unbounded SELECT query on the compression_jobs table with
    no LIMIT clause.
  • Fix: Cap the result set to 1000 rows and remove the redundant time filter.
  • Scope: Only one query is affected. All other SQL queries in the webui
    server were audited -- they are either aggregation queries (returning 1 row),
    filtered by primary key, or return small metadata (dataset names, column
    names).
  • Risk: Low. The ORDER BY id DESC already exists, so the 1000 most recent
    jobs are always returned. Users with more than 1000 jobs will see only the
    latest 1000 in the Ingest page's Compression Jobs table.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

1. Build the CLP package

Task: Build the CLP package (including the Docker image) with the fix
applied.

Command:

task package

Output (last lines):

task: [package] rsync --archive "tools/deployment/package/" "/home/junhao/workspace/2-clp/build/clp-package"
task: [package] rsync --archive "/home/junhao/workspace/2-clp/build/clp-package-image.id" "/home/junhao/workspace/2-clp/build/clp-package"
task: [package] echo '0.9.1-dev' > '/home/junhao/workspace/2-clp/build/clp-package/VERSION'

2. Verify the LIMIT clause is present and the WHERE clause is absent in the Docker image

Task: Confirm the compiled JavaScript inside the freshly built Docker image
includes the LIMIT clause and does not include the WHERE time filter.

Command:

IMAGE_ID=$(cat build/clp-package/clp-package-image.id)
docker run --rm --entrypoint cat "$IMAGE_ID" \
    /opt/clp/var/www/webui/server/dist/src/routes/api/compress-metadata/sql.js

Output (relevant portion):

const COMPRESSION_METADATA_QUERY_LIMIT = 1000;
const getCompressionMetadataQuery = () => `
    SELECT
        ...
    FROM ${settings.SqlDbCompressionJobsTableName}
    ORDER BY ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.ID} DESC
    LIMIT ${COMPRESSION_METADATA_QUERY_LIMIT};
`;

Explanation: The WHERE ... INTERVAL 30 DAY clause is absent and
LIMIT ${COMPRESSION_METADATA_QUERY_LIMIT} (1000) is present.

3. Start CLP and insert 5,000,000 compression job records

Task: Reproduce the conditions that trigger the OOM crash.

Commands:

cd build/clp-package
./sbin/start-clp.sh

Output (last line):

2026-02-26T20:38:52.630 INFO [controller] Started CLP.

Then insert 5,000,000 rows with status = 2 (SUCCEEDED) and valid
brotli-compressed msgpack clp_config blobs (matching ClpIoPartialConfigSchema)
into compression_jobs:

# (abbreviated -- batch inserts of 10,000 rows at a time)
total = 5000000
# ...
cursor.executemany(insert_sql, batch_data)

Output:

Inserted 500000/5000000 rows (11.6s)
...
Inserted 5000000/5000000 rows (105.7s)
Done: inserted 5000000 rows in 105.7s
Total rows in compression_jobs: 5000000

4. Verify GET /api/compress-metadata/ returns 1000 rows without crash

Task: Confirm the API returns a bounded result set and the server stays
healthy.

Command:

curl -s -o /tmp/resp.json -w "%{http_code}" http://127.0.0.1:4000/api/compress-metadata/
python3 -c "import json; data=json.load(open('/tmp/resp.json')); print(len(data))"
docker stats --no-stream --format "{{.MemUsage}}" clp-package-04a1-webui-1

Output:

200
1000
67.7MiB / 58.64GiB

Explanation: The API returned HTTP 200 with exactly 1000 rows (the LIMIT).
Memory usage stayed at ~68 MiB. No FATAL ERROR in the container logs.

5. Verify the Ingest page loads successfully

Task: Confirm the Ingest page loads without crashing the server.

Command:

curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:4000/ingest

Output:

200

Explanation: The Ingest page returned HTTP 200. The webui server remained
healthy throughout with no OOM errors in the container logs:

docker logs clp-package-04a1-webui-1 2>&1 | grep -i "FATAL\|heap out of memory\|OOM"
# (no output -- no OOM errors)

6. Confirm the OOM crash is reproducible on main (before the fix)

Task: Verify that the same 5,000,000-row scenario crashes the server on the
unfixed main branch.

Output (before fix, same 5M rows, no NODE_OPTIONS hack):

[1:0x19c7b000]   242039 ms: Mark-Compact 4021.4 (4129.2) -> 4005.8 (4129.4) MB, pooled: 3 MB,
    868.66 / 0.76 ms  (average mu = 0.093, current mu = 0.040) allocation failure;
    scavenge might not succeed
[1:0x19c7b000]   242929 ms: Mark-Compact 4021.8 (4129.7) -> 4006.2 (4129.9) MB, pooled: 2 MB,
    853.99 / 0.83 ms  (average mu = 0.068, current mu = 0.040) allocation failure;
    scavenge might not succeed

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Explanation: Without the fix, the V8 heap climbed to ~4 GiB (the default
limit) and crashed after ~4 minutes. Memory growth was observed at
2.3 -> 3.6 -> 4.0 -> 4.6 -> 5.0 -> 5.4 GiB (container RSS) before the fatal
error.

@junhaoliao junhaoliao requested a review from a team as a code owner February 26, 2026 16:19
@junhaoliao junhaoliao requested a review from hoophalab February 26, 2026 16:19
@coderabbitai

coderabbitai Bot commented Feb 26, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

A new module-scoped constant COMPRESSION_METADATA_QUERY_LIMIT set to 1000 is introduced, and the SQL builder getCompressionMetadataQuery is modified to remove the 30-day UPDATE_TIME WHERE filter and instead apply a LIMIT of 1000 after the ORDER BY clause.

Changes

Cohort / File(s) Summary
Query Limit & Query Adjustment
components/webui/server/src/routes/api/compress-metadata/sql.ts
Added COMPRESSION_METADATA_QUERY_LIMIT = 1000. Removed the 30-day UPDATE_TIME WHERE filter from getCompressionMetadataQuery; moved query termination to place LIMIT COMPRESSION_METADATA_QUERY_LIMIT after ORDER BY id DESC.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: capping compression metadata query results to prevent OOM crashes, directly addressing the changeset's primary purpose of adding a LIMIT to the SQL query.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@junhaoliao junhaoliao added this to the February 2026 milestone Feb 26, 2026
@@ -36,7 +38,8 @@ const getCompressionMetadataQuery = () => `
${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.CLP_CONFIG}
FROM ${settings.SqlDbCompressionJobsTableName}
WHERE ${COMPRESSION_JOBS_TABLE_COLUMN_NAMES.UPDATE_TIME} >= NOW() - INTERVAL 30 DAY

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on removing this 30 day check? actually, is there a reason we previously prefer it over the numeric count?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There probably isn't a real reason. It's an arbitrary decision from the initial commit.

hoophalab
hoophalab previously approved these changes Feb 26, 2026
@junhaoliao junhaoliao merged commit ad6e7b5 into y-scope:main Feb 26, 2026
23 checks passed
@junhaoliao junhaoliao deleted the webui-oom branch May 7, 2026 19:46
junhaoliao added a commit to junhaoliao/clp that referenced this pull request May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants