Reduce binary size: byte-plane (stream-split) compression for region pair indices #690

Merged: Boshen merged 1 commit into main from byte-plane-region-pair-indices on May 29, 2026
@Boshen Boshen commented May 29, 2026


Summary

Applies byte-plane (stream-split) compression to the region (browser, version) pair-index blob — the single largest bundled data blob.

Each region's pair indices are u16 values, but there are only ~557 distinct pairs, so the high byte is almost always 0. Instead of postcard-varint-encoding interleaved u16s per region, the codegen now writes all the low bytes, then all the high bytes, then deflates. The high-byte plane collapses to near nothing, and isolating it from the high-entropy low byte lets deflate model each stream far better than the interleaved varint stream did.
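The write side described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual codegen: the function name `split_planes` is hypothetical, and the deflate step that follows the split is omitted so the sketch stays dependency-free.

```rust
/// Lay out u16 pair indices as two byte planes: every low byte first,
/// then every high byte. `split_planes` is a hypothetical name chosen
/// for illustration; the real codegen deflates the result afterwards.
fn split_planes(indices: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(indices.len() * 2);
    // Plane 1: the high-entropy low bytes, in datum order.
    out.extend(indices.iter().map(|&v| (v & 0xFF) as u8));
    // Plane 2: the high bytes, which are mostly 0 when there are only
    // a few hundred distinct values.
    out.extend(indices.iter().map(|&v| (v >> 8) as u8));
    out
}
```

For example, the indices `[0x0001, 0x0102, 0x022C]` would be laid out as `[0x01, 0x02, 0x2C, 0x00, 0x01, 0x02]`, so the long runs of near-identical high bytes end up adjacent.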

The reader splits the decompressed blob at len / 2 and recombines lo | hi << 8 — no postcard deserialization for this blob anymore. PAIR_RANGES switches from byte offsets to element offsets (cumulative datum counts).
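A rough sketch of the read side, under stated assumptions: the names `join_planes` and `region_slice` are hypothetical, the input is taken to be the already-inflated blob, and the offsets table is assumed to hold one cumulative count per region plus a final total (the actual layout of PAIR_RANGES may differ).

```rust
/// Recombine a decompressed byte-plane blob into u16 indices.
/// Hypothetical helper; assumes the first half holds the low bytes
/// and the second half the high bytes.
fn join_planes(blob: &[u8]) -> Vec<u16> {
    let (lo, hi) = blob.split_at(blob.len() / 2);
    lo.iter()
        .zip(hi)
        .map(|(&l, &h)| u16::from(l) | (u16::from(h) << 8))
        .collect()
}

/// Look up one region's indices using element offsets.
/// Assumption: `ranges[i]..ranges[i + 1]` bounds region `i`, counted
/// in elements (datums), not bytes.
fn region_slice<'a>(indices: &'a [u16], ranges: &[u32], region: usize) -> &'a [u16] {
    let start = ranges[region] as usize;
    let end = ranges[region + 1] as usize;
    &indices[start..end]
}
```

Element offsets fit this layout naturally: a region's data is no longer one contiguous byte range in the blob, since its low and high bytes live in separate planes, but it is still a contiguous range of elements once the planes are rejoined.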

Results (lossless)

| | Before (bytes) | After (bytes) | Δ (bytes) |
|---|---|---|---|
| pair-index blob | 47,602 | 44,567 | −3,035 |
| Linux musl example binary | 782,048 | 778,528 | −3,520 |

The binary shrinks slightly more than the blob because the u16 postcard decode path is dropped. macOS file size is unchanged (16 KB page quantization swallows the sub-page win), but the .rodata is 3 KB smaller — which helps consumers that link this crate into a larger binary.

Why this is the remaining clean win

An entropy analysis across ~15 candidate encodings showed:

  • The percentages blob (feat: use codegen instead of build.rs #2) already sits at its order-0 entropy floor — delta/raw/byte-plane/columnar/first-value-split all fail to beat the current delta-varint.
  • The pair-index byte-plane (44,567) is already below the order-0 symbol entropy floor (48,944) because deflate exploits cross-region repetition, so an order-0 arithmetic coder would be worse.
  • MTF, delta+zigzag, and columnar transpose all hurt.
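For readers unfamiliar with the term, an order-0 entropy floor like the ones quoted in the list above can be computed roughly as below. This is a sketch of the standard Shannon formula, not the analysis script used for this PR; the function name `order0_entropy_bytes` is hypothetical.

```rust
use std::collections::HashMap;

/// Order-0 (context-free) Shannon entropy of a symbol stream, in bytes.
/// This is the smallest size any coder can reach if it models each
/// symbol independently of its neighbours. Hypothetical helper.
fn order0_entropy_bytes(symbols: &[u16]) -> f64 {
    // Count how often each distinct symbol occurs.
    let mut counts: HashMap<u16, usize> = HashMap::new();
    for &s in symbols {
        *counts.entry(s).or_insert(0) += 1;
    }
    let n = symbols.len() as f64;
    // Sum of -count * log2(count / n) gives the total bits.
    let bits: f64 = counts
        .values()
        .map(|&c| {
            let c = c as f64;
            -c * (c / n).log2()
        })
        .sum();
    bits / 8.0
}
```

A compressor such as deflate can land below this figure because it also matches repeated sequences, which an order-0 model cannot see.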

Beating the current state would require an order-1+ model (BWT / range coder / brotli), which means shipping real decoder code that a few KB of savings won't pay for.

Verification: all 392 tests + 14 JS-fuzz proptests pass; clippy and fmt clean; every other generated blob is byte-identical (reproducible codegen).

🤖 Generated with Claude Code

codspeed-hq Bot commented May 29, 2026

Merging this PR will not alter performance

✅ 6 untouched benchmarks


Comparing byte-plane-region-pair-indices (2f93cce) with main (266518d) [1]

Footnotes

  1. No successful run was found on main (9d315b9) during the generation of this report, so 266518d was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (266518d) to head (2f93cce).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #690   +/-   ##
=======================================
  Coverage   99.04%   99.04%           
=======================================
  Files          47       47           
  Lines        2411     2413    +2     
=======================================
+ Hits         2388     2390    +2     
  Misses         23       23           

The region (browser, version) pair-index blob is the single largest
bundled blob. It stored one u16 pair index per datum as postcard varints
concatenated per region, then deflated.

Split the indices into two byte planes instead — every low byte first,
then every high byte — before deflating. There are only ~557 distinct
pairs, so the high byte is almost always 0; that plane collapses to near
nothing, and isolating it from the high-entropy low byte lets deflate
model each stream far better than the interleaved varint stream did.

The decode path now does plain `lo | hi << 8` arithmetic (no postcard
deserialization for this blob), and PAIR_RANGES holds element offsets
(cumulative datum counts) rather than byte offsets.

- blob: 47,602 -> 44,567 bytes (-3,035)
- Linux musl example binary: 782,048 -> 778,528 bytes (-3,520)
- macOS unchanged (16 KB page quantization)

Lossless: all tests pass, including the proptests that fuzz region
coverage against the JS browserslist. Every other generated blob is
byte-identical.
@Boshen Boshen force-pushed the byte-plane-region-pair-indices branch from 104f48e to 2f93cce Compare May 29, 2026 10:47
@Boshen Boshen merged commit 82e51cd into main May 29, 2026
16 checks passed
@Boshen Boshen deleted the byte-plane-region-pair-indices branch May 29, 2026 11:01
@oxc-guard oxc-guard Bot mentioned this pull request May 29, 2026
Boshen pushed a commit that referenced this pull request May 29, 2026
## 🤖 New release

* `oxc-browserslist`: 3.0.3 -> 3.0.4 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [3.0.4](oxc-browserslist-v3.0.3...oxc-browserslist-v3.0.4) - 2026-05-29

### Other

- DRY up feature/region codegen with shared table + lookup helpers ([#694](#694))
- Consolidate bundled-data loading behind compression helpers ([#693](#693))
- Reduce binary size: run-length encode feature support versions ([#692](#692))
- Reduce binary size: Zopfli codegen compression + percentage byte-plane ([#691](#691))
- Reduce binary size: byte-plane (stream-split) compression for region pair indices ([#690](#690))
- Update README binary size to 621K ([#689](#689))
- Reduce binary size of bundled caniuse/electron data ([#688](#688))
- Switch codegen data source from caniuse-db to caniuse-lite ([#687](#687))
- Update browserslist ([#685](#685))
- Update rust crates ([#682](#682))
- Update browserslist ([#679](#679))
- Update browserslist ([#678](#678))
- Update browserslist ([#677](#677))
- Update browserslist ([#673](#673))
- Update browserslist ([#671](#671))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: oxc-guard[bot] <276638029+oxc-guard[bot]@users.noreply.github.com>