Reduce binary size: byte-plane (stream-split) compression for region pair indices by Boshen · Pull Request #690 · oxc-project/oxc-browserslist

Boshen · 2026-05-29T10:42:20Z

Summary

Applies byte-plane (stream-split) compression to the region (browser, version) pair-index blob — the single largest bundled data blob.

Each region's pair indices are u16 values, but there are only ~557 distinct pairs, so the high byte is almost always 0. Instead of postcard-varint-encoding interleaved u16s per region, the codegen now writes all the low bytes, then all the high bytes, then deflates. The high-byte plane collapses to near nothing, and isolating it from the high-entropy low byte lets deflate model each stream far better than the interleaved varint stream did.
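Here is a minimal Rust sketch of the codegen-side layout described above. The name `split_planes` is hypothetical and is not the crate's actual codegen code. The deflate step is left out because it needs an external crate, so the sketch only shows the low-plane-then-high-plane ordering.

```rust
/// Hypothetical codegen helper: lay out u16 pair indices as two byte planes,
/// every low byte first, then every high byte. The real codegen then deflates
/// this buffer; that step is omitted here because it needs an external crate.
fn split_planes(indices: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(indices.len() * 2);
    // Low-byte plane: carries almost all of the information.
    out.extend(indices.iter().map(|&v| (v & 0xFF) as u8));
    // High-byte plane: mostly zeros when indices are small, so it compresses well.
    out.extend(indices.iter().map(|&v| (v >> 8) as u8));
    out
}
```

For example, `split_planes(&[3, 300, 17, 556])` yields `[3, 44, 17, 44, 0, 1, 0, 2]`. The trailing high-byte plane holds only small values and zeros, and that is the stream deflate then squeezes.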

The reader splits the decompressed blob at `len / 2` and recombines `lo | hi << 8`. This blob no longer goes through postcard deserialization. `PAIR_RANGES` switches from byte offsets to element offsets (cumulative datum counts).
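A matching sketch of the read side follows. It assumes the blob has already been inflated into a byte slice. `join_planes` is a hypothetical name. The recombination uses the `lo | hi << 8` arithmetic the PR describes.

```rust
/// Hypothetical reader helper: rebuild u16 pair indices from an already-inflated
/// byte-plane buffer (low plane followed by high plane, equal lengths).
fn join_planes(planes: &[u8]) -> Vec<u16> {
    assert_eq!(planes.len() % 2, 0, "byte-plane buffer must have even length");
    // First half is the low-byte plane, second half the high-byte plane.
    let (lo, hi) = planes.split_at(planes.len() / 2);
    lo.iter()
        .zip(hi)
        .map(|(&l, &h)| u16::from(l) | (u16::from(h) << 8))
        .collect()
}
```

If you feed it the buffer `[3, 44, 17, 44, 0, 1, 0, 2]`, you get back `[3, 300, 17, 556]`.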

Results (lossless)

| Size (bytes) | Before | After | Δ |
|---|---|---|---|
| pair-index blob | 47,602 | 44,567 | −3,035 |
| Linux musl example binary | 782,048 | 778,528 | −3,520 |

The binary shrinks slightly more than the blob because the u16 postcard decode path is dropped. macOS file size is unchanged (16 KB page quantization swallows the sub-page win), but the .rodata is 3 KB smaller — which helps consumers that link this crate into a larger binary.
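The following simplified model shows why a ~3 KB cut can leave the macOS file size unchanged. It assumes the relevant segment is padded up to a whole 16 KB page. The 110,000-byte segment size is made up for illustration, and `page_rounded` is a hypothetical helper.

```rust
/// Simplified model: a segment occupies whole pages, so its on-disk size is
/// its content size rounded up to the next multiple of `page`.
fn page_rounded(size: u64, page: u64) -> u64 {
    size.div_ceil(page) * page
}
```

With 16,384-byte pages, `page_rounded(110_000, 16_384)` and `page_rounded(110_000 - 3_035, 16_384)` both come out to 114,688 bytes. The saving only shows up in the file size when it crosses a page boundary.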

Why this is the remaining clean win

An entropy analysis across ~15 candidate encodings showed:

- The percentages blob (feat: use codegen instead of build.rs #2) already sits at its order-0 entropy floor. Delta, raw, byte-plane, columnar, and first-value-split encodings all fail to beat the current delta-varint.
- The pair-index byte-plane (44,567) is already below the order-0 symbol entropy floor (48,944) because deflate exploits cross-region repetition, so an order-0 arithmetic coder would be worse.
- MTF, delta+zigzag, and columnar transpose all hurt.
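An order-0 entropy floor like the one quoted above can be estimated by counting symbol frequencies and summing `-count * log2(count / total)` bits. This is a generic sketch. The PR does not say exactly how its figure was computed, and treating each u16 pair index as one symbol is an assumption. The estimate ignores repetition between regions, so an LZ77-based compressor such as deflate can land below it. That is the effect the PR reports.

```rust
use std::collections::HashMap;

/// Order-0 (memoryless) entropy of a symbol stream, in bytes:
/// the sum over symbols of count * -log2(count / total), divided by 8.
/// Ignores any cost of transmitting the frequency table itself.
fn order0_entropy_bytes(symbols: &[u16]) -> f64 {
    let mut counts: HashMap<u16, u64> = HashMap::new();
    for &s in symbols {
        *counts.entry(s).or_insert(0) += 1;
    }
    let total = symbols.len() as f64;
    let bits: f64 = counts
        .values()
        .map(|&c| {
            let c = c as f64;
            // Each occurrence costs -log2(p) bits under an ideal order-0 coder.
            -c * (c / total).log2()
        })
        .sum();
    bits / 8.0
}
```

A stream of four identical indices scores 0 bytes. The stream `[0, 1, 0, 1]` scores 0.5 bytes, which is one bit per symbol.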

Beating the current state would require an order-1 or higher model (BWT, a range coder, or brotli). That means shipping real decoder code, and a few KB of savings won't pay for it.

Verification: all 392 tests + 14 JS-fuzz proptests pass; clippy and fmt clean; every other generated blob is byte-identical (reproducible codegen).

🤖 Generated with Claude Code

codspeed-hq · 2026-05-29T10:43:07Z

Merging this PR will not alter performance

✅ 6 untouched benchmarks

Comparing byte-plane-region-pair-indices (2f93cce) with main (266518d)¹

¹ No successful run was found on main (9d315b9) during the generation of this report, so 266518d was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

codecov · 2026-05-29T10:44:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (266518d) to head (2f93cce).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #690   +/-   ##
=======================================
  Coverage   99.04%   99.04%
=======================================
  Files          47       47
  Lines        2411     2413    +2
=======================================
+ Hits         2388     2390    +2
  Misses         23       23
```


The region (browser, version) pair-index blob is the single largest bundled blob. It stored one u16 pair index per datum as postcard varints concatenated per region, then deflated.

This change splits the indices into two byte planes instead (every low byte first, then every high byte) before deflating. There are only ~557 distinct pairs, so the high byte is almost always 0. That plane collapses to near nothing. Isolating it from the high-entropy low byte also lets deflate model each stream far better than it could model the interleaved varint stream.

The decode path now does plain `lo | hi << 8` arithmetic (no postcard deserialization for this blob), and PAIR_RANGES holds element offsets (cumulative datum counts) rather than byte offsets.

- blob: 47,602 -> 44,567 bytes (-3,035)
- Linux musl example binary: 782,048 -> 778,528 bytes (-3,520)
- macOS unchanged (16 KB page quantization)

Lossless: all tests pass, including the proptests that fuzz region coverage against the JS browserslist. Every other generated blob is byte-identical.
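Here is a lossless round-trip sketch of the per-region layout, with deflate left out. It assumes the region lists are concatenated in order. It also assumes the range table is a cumulative element-count array, so `offsets[i]..offsets[i + 1]` addresses region `i` in both planes. `encode_regions` and `decode_region` are hypothetical names, and the crate's actual PAIR_RANGES shape may differ.

```rust
/// Hypothetical end-to-end sketch (deflate omitted): concatenate per-region
/// index lists into one byte-plane buffer and record cumulative element counts.
fn encode_regions(regions: &[Vec<u16>]) -> (Vec<u8>, Vec<usize>) {
    let all: Vec<u16> = regions.iter().flatten().copied().collect();
    // offsets[i]..offsets[i + 1] is region i's element range.
    let mut offsets = Vec::with_capacity(regions.len() + 1);
    let mut total = 0;
    offsets.push(total);
    for r in regions {
        total += r.len();
        offsets.push(total);
    }
    // `as u8` truncates to the low byte; the high plane follows.
    let mut planes: Vec<u8> = all.iter().map(|&v| v as u8).collect();
    planes.extend(all.iter().map(|&v| (v >> 8) as u8));
    (planes, offsets)
}

/// Decode only region `i`. Because offsets count elements (not bytes), the
/// same range indexes both the low plane and the high plane.
fn decode_region(planes: &[u8], offsets: &[usize], i: usize) -> Vec<u16> {
    let (lo, hi) = planes.split_at(planes.len() / 2);
    let range = offsets[i]..offsets[i + 1];
    lo[range.clone()]
        .iter()
        .zip(&hi[range])
        .map(|(&l, &h)| u16::from(l) | (u16::from(h) << 8))
        .collect()
}
```

Counting elements rather than bytes is what lets one range index both planes. Element `k` sits at byte `k` in the low plane and at byte `len / 2 + k` in the high plane.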

## 🤖 New release

* `oxc-browserslist`: 3.0.3 -> 3.0.4 (✓ API compatible changes)

<details><summary>Changelog</summary>

<blockquote>

## [3.0.4](oxc-browserslist-v3.0.3...oxc-browserslist-v3.0.4) - 2026-05-29

### Other

- DRY up feature/region codegen with shared table + lookup helpers ([#694](#694))
- Consolidate bundled-data loading behind compression helpers ([#693](#693))
- Reduce binary size: run-length encode feature support versions ([#692](#692))
- Reduce binary size: Zopfli codegen compression + percentage byte-plane ([#691](#691))
- Reduce binary size: byte-plane (stream-split) compression for region pair indices ([#690](#690))
- Update README binary size to 621K ([#689](#689))
- Reduce binary size of bundled caniuse/electron data ([#688](#688))
- Switch codegen data source from caniuse-db to caniuse-lite ([#687](#687))
- Update browserslist ([#685](#685))
- Update rust crates ([#682](#682))
- Update browserslist ([#679](#679))
- Update browserslist ([#678](#678))
- Update browserslist ([#677](#677))
- Update browserslist ([#673](#673))
- Update browserslist ([#671](#671))

</blockquote>
</details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: oxc-guard[bot] <276638029+oxc-guard[bot]@users.noreply.github.com>

Boshen force-pushed the byte-plane-region-pair-indices branch from 104f48e to 2f93cce on May 29, 2026 10:47

Boshen merged commit 82e51cd into main May 29, 2026
16 checks passed

Boshen deleted the byte-plane-region-pair-indices branch May 29, 2026 11:01

oxc-guard Bot mentioned this pull request May 29, 2026

chore: release v3.0.4 #672

Merged

Boshen merged 1 commit into main from byte-plane-region-pair-indices