Reduce binary size: byte-plane (stream-split) compression for region pair indices#690
Merged
Codecov Report: ✅ All modified and coverable lines are covered by tests.

| Coverage diff | main | #690 | +/- |
|---|---|---|---|
| Coverage | 99.04% | 99.04% | |
| Files | 47 | 47 | |
| Lines | 2411 | 2413 | +2 |
| Hits | 2388 | 2390 | +2 |
| Misses | 23 | 23 | |
The region (browser, version) pair-index blob is the single largest bundled blob. It stored one `u16` pair index per datum as postcard varints concatenated per region, then deflated. Split the indices into two byte planes instead (every low byte first, then every high byte) before deflating. There are only ~557 distinct pairs, so the high byte is almost always 0; that plane collapses to near nothing, and isolating it from the high-entropy low byte lets deflate model each stream far better than the interleaved varint stream did.

The decode path now does plain `lo | hi << 8` arithmetic (no postcard deserialization for this blob), and `PAIR_RANGES` holds element offsets (cumulative datum counts) rather than byte offsets.

- blob: 47,602 -> 44,567 bytes (-3,035)
- Linux musl example binary: 782,048 -> 778,528 bytes (-3,520)
- macOS unchanged (16 KB page quantization)

Lossless: all tests pass, including the proptests that fuzz region coverage against the JS browserslist. Every other generated blob is byte-identical.
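A minimal sketch of the byte-plane layout and the element-offset lookup described above. The function names (`split_planes`, `join_planes`, `region_pairs`) are hypothetical and are not the crate's actual codegen or reader. The deflate/inflate step is left out so the sketch needs only the standard library, and `ranges` stands in for `PAIR_RANGES` under the assumption that it is a list of cumulative datum counts starting at 0.

```rust
/// Hypothetical encoder: write every low byte, then every high byte.
/// (The real codegen deflates this buffer afterwards; omitted here.)
pub fn split_planes(indices: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(indices.len() * 2);
    // Plane 0: low bytes, the high-entropy part.
    out.extend(indices.iter().map(|&i| (i & 0xff) as u8));
    // Plane 1: high bytes, mostly zero when few indices exceed 255.
    out.extend(indices.iter().map(|&i| (i >> 8) as u8));
    out
}

/// Hypothetical decoder: split the (already inflated) blob at len / 2
/// and recombine each element as lo | hi << 8.
pub fn join_planes(blob: &[u8]) -> Vec<u16> {
    assert!(blob.len() % 2 == 0, "blob must hold two equal-length planes");
    let (lo, hi) = blob.split_at(blob.len() / 2);
    lo.iter()
        .zip(hi)
        .map(|(&l, &h)| u16::from(l) | (u16::from(h) << 8))
        .collect()
}

/// Hypothetical lookup: `ranges` holds element offsets (cumulative datum
/// counts, assumed to start at 0), so region `r` owns
/// `pairs[ranges[r]..ranges[r + 1]]` in the decoded `u16` array.
pub fn region_pairs<'a>(pairs: &'a [u16], ranges: &[usize], r: usize) -> &'a [u16] {
    &pairs[ranges[r]..ranges[r + 1]]
}
```

For example, indices `[3, 17, 300, 42, 556]` become the low plane `[3, 17, 44, 42, 44]` followed by the high plane `[0, 0, 1, 0, 2]`; with `ranges = [0, 3, 5]`, region 1 decodes to `[42, 556]`. Because the offsets count elements rather than bytes, the lookup can slice the decoded `u16` array directly.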
The branch was force-pushed from 104f48e to 2f93cce.
Boshen pushed a commit that referenced this pull request on May 29, 2026:
## 🤖 New release

* `oxc-browserslist`: 3.0.3 -> 3.0.4 (✓ API compatible changes)

Changelog:

## [3.0.4](oxc-browserslist-v3.0.3...oxc-browserslist-v3.0.4) - 2026-05-29

### Other

- DRY up feature/region codegen with shared table + lookup helpers ([#694](#694))
- Consolidate bundled-data loading behind compression helpers ([#693](#693))
- Reduce binary size: run-length encode feature support versions ([#692](#692))
- Reduce binary size: Zopfli codegen compression + percentage byte-plane ([#691](#691))
- Reduce binary size: byte-plane (stream-split) compression for region pair indices ([#690](#690))
- Update README binary size to 621K ([#689](#689))
- Reduce binary size of bundled caniuse/electron data ([#688](#688))
- Switch codegen data source from caniuse-db to caniuse-lite ([#687](#687))
- Update browserslist ([#685](#685))
- Update rust crates ([#682](#682))
- Update browserslist ([#679](#679))
- Update browserslist ([#678](#678))
- Update browserslist ([#677](#677))
- Update browserslist ([#673](#673))
- Update browserslist ([#671](#671))

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: oxc-guard[bot] <276638029+oxc-guard[bot]@users.noreply.github.com>
Summary
Applies byte-plane (stream-split) compression to the region `(browser, version)` pair-index blob, the single largest bundled data blob.

Each region's pair indices are `u16` values, but there are only ~557 distinct pairs, so the high byte is almost always `0`. Instead of postcard-varint-encoding interleaved `u16`s per region, the codegen now writes all the low bytes, then all the high bytes, then deflates. The high-byte plane collapses to near nothing, and isolating it from the high-entropy low byte lets deflate model each stream far better than the interleaved varint stream did.

The reader splits the decompressed blob at `len / 2` and recombines `lo | hi << 8`; there is no postcard deserialization for this blob anymore. `PAIR_RANGES` switches from byte offsets to element offsets (cumulative datum counts).

Results (lossless)
The binary shrinks slightly more than the blob because the `u16` postcard decode path is dropped. macOS file size is unchanged (16 KB page quantization swallows the sub-page win), but `.rodata` is 3 KB smaller, which helps consumers that link this crate into a larger binary.

Why this is the remaining clean win
An entropy analysis across ~15 candidate encodings showed:
Beating the current state would require an order-1+ model (BWT, a range coder, or brotli), which means real decoder code that a few KB of savings won't pay for.
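As a rough illustration of why the split helps, the sketch below compares the order-0 (memoryless) Shannon entropy of one model over interleaved bytes against separate models per byte plane. This is not the entropy analysis from this PR, and the function names are hypothetical. It simplifies in two ways: the interleaved baseline is fixed-width little-endian `u16`s rather than the postcard varints the old blob used, and order-0 entropy is only a proxy for deflate's Huffman stage (it ignores LZ77 matches entirely). Under this model the split can never lose: the byte distribution of the interleaved stream is the average of the two plane distributions, and the entropy of a mixture is at least the average of the component entropies.

```rust
/// Order-0 Shannon entropy of a byte stream, in total bits:
/// the sum over symbols of -count * log2(count / n).
pub fn order0_bits(bytes: &[u8]) -> f64 {
    let mut counts = [0usize; 256];
    for &b in bytes {
        counts[b as usize] += 1;
    }
    let n = bytes.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let c = c as f64;
            -c * (c / n).log2()
        })
        .sum()
}

/// Hypothetical baseline: one shared model over [lo0, hi0, lo1, hi1, ...].
pub fn interleaved_bits(indices: &[u16]) -> f64 {
    let bytes: Vec<u8> = indices.iter().flat_map(|i| i.to_le_bytes()).collect();
    order0_bits(&bytes)
}

/// Hypothetical byte-plane layout: a separate model for each plane.
pub fn planar_bits(indices: &[u16]) -> f64 {
    let lo: Vec<u8> = indices.iter().map(|&i| (i & 0xff) as u8).collect();
    let hi: Vec<u8> = indices.iter().map(|&i| (i >> 8) as u8).collect();
    order0_bits(&lo) + order0_bits(&hi)
}
```

A plane that is entirely `0` costs zero bits under this model, which matches the description of the high-byte plane collapsing to near nothing.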
Verification: all 392 tests + 14 JS-fuzz proptests pass; clippy and fmt clean; every other generated blob is byte-identical (reproducible codegen).
🤖 Generated with Claude Code