Reduce binary size of bundled caniuse/electron data #688

Merged: Boshen merged 2 commits into main from reduce-binary-size on May 29, 2026
Boshen commented May 29, 2026
Member

Reduces the footprint that this crate's bundled Can I Use / Electron / Node data adds to a binary, by re-representing the embedded tables so the existing deflate compresses them better and the generated rodata carries no string relocations. No new dependencies, no profile changes, and no loss of precision — query results are byte-identical to before (verified across the full region / feature / electron / cover query space, plus the existing test suite and JS-reference proptests).

Size

Measured with the inspect example:

| Target | Before (bytes) | After (bytes) | Reduction |
| --- | --- | --- | --- |
| x86_64-unknown-linux-musl | 861,760 | 782,048 | −79,712 (−9.2%) |
| macOS (aarch64-apple-darwin) | 701,424 | 635,392 | −66,032 (−9.4%) |

The Linux figure is the more representative one: on ELF every &str in a static table costs a 24-byte .rela.dyn relocation on top of its 16-byte fat pointer, so removing those (see the last two techniques) pays off most there.
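
To make the per-entry arithmetic concrete, here is a minimal sketch of the cost described above and of the `offset << 8 | len` packing listed under Techniques. The names `Elf64Rela`, `POOL`, `NAMES`, `pack`, and `unpack` and the pool contents are hypothetical illustrations, not the crate's generated code. The sketch also assumes offsets fit in 24 bits and lengths in 8 bits.

```rust
use std::mem::size_of;

/// Layout of one ELF64 `Elf64_Rela` record: three 8-byte fields, 24 bytes total.
#[repr(C)]
pub struct Elf64Rela {
    pub r_offset: u64,
    pub r_info: u64,
    pub r_addend: i64,
}

/// Bytes one `&str` entry costs in a relocated static table:
/// the fat pointer (pointer + length) plus one relocation record.
pub const fn str_entry_cost() -> usize {
    size_of::<&str>() + size_of::<Elf64Rela>()
}

/// Hypothetical pool: every string stored back to back in one literal.
pub const POOL: &str = "chromefirefoxsafari";

/// Packs a byte offset and a length as `offset << 8 | len`.
/// Assumption: `offset` fits in 24 bits; `len` is at most 255 by type.
pub const fn pack(offset: u32, len: u8) -> u32 {
    (offset << 8) | len as u32
}

/// Table of packed references. The entries are plain integers, so the
/// table contains no pointers and needs no per-entry relocations.
pub static NAMES: [u32; 3] = [pack(0, 6), pack(6, 7), pack(13, 6)];

/// Resolves a packed reference back to a slice of the pool.
pub fn unpack(packed: u32) -> &'static str {
    let offset = (packed >> 8) as usize;
    let len = (packed & 0xFF) as usize;
    &POOL[offset..offset + len]
}
```

On a 64-bit ELF target `str_entry_cost()` evaluates to 40 bytes per entry, against 4 bytes for each packed `u32` in `NAMES`; `unpack(NAMES[1])` yields `"firefox"`.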

Techniques

All are lossless and operate on the generated data representation (xtask codegen) together with the matching runtime decoders:

  • Delta-encode region usage percentages. They're sorted descending within each region, so storing prev − curr yields small deltas with long runs of zeros that deflate packs tightly.
  • Intern region (browser, version) pairs. Only ~557 distinct pairs exist, yet they were stored ~47k times across regions. They now live in one global table ordered by popularity, and each region stores a single u16 pair-index per entry instead of a browser byte and a version index — dropping the redundant per-region browser array.
  • Intern feature-support version strings. The y / a support lists referenced the same version strings over and over; they're now u16 indices into a sorted version table.
  • Compress the lookup keys. The region-code and feature-name keys were inline &[&str] (a 16-byte fat pointer plus a relocation per entry); they're now deflate blobs decoded once.
  • String-pack the global-usage and Electron tables. Each &str is replaced by a u32 that bitpacks offset << 8 | len into a single concatenated string pool, eliminating the per-entry &str relocations with no decode code.
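
The delta-encoding item in the list above can be sketched as a round trip. This is an illustration only: it assumes the percentages are held as fixed-point integers (for example hundredths of a percent), which the description does not state, and `delta_encode` / `delta_decode` are hypothetical names rather than the crate's functions.

```rust
/// Encodes a non-increasing sequence as its first value followed by
/// the gaps `prev - curr`. Equal neighbours become zeros, which a
/// general-purpose compressor such as deflate handles well.
pub fn delta_encode(sorted_desc: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(sorted_desc.len());
    let mut prev: Option<u32> = None;
    for &curr in sorted_desc {
        match prev {
            // The first element is stored verbatim.
            None => out.push(curr),
            // Later elements are stored as the drop from the previous one.
            Some(p) => out.push(
                p.checked_sub(curr)
                    .expect("input must be sorted in non-increasing order"),
            ),
        }
        prev = Some(curr);
    }
    out
}

/// Inverse of `delta_encode`: rebuilds the original values by
/// subtracting each gap from the running value.
pub fn delta_decode(deltas: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut prev: Option<u32> = None;
    for &d in deltas {
        let curr = match prev {
            None => d,
            Some(p) => p - d,
        };
        out.push(curr);
        prev = Some(curr);
    }
    out
}
```

For the input `[1250, 1250, 1250, 300, 300, 7, 0, 0]` the encoded form is `[1250, 0, 0, 950, 0, 293, 7, 0]`, and decoding it returns the original sequence, so the transform is lossless.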

Decoding stays plain std::sync::OnceLock, with no helper types.

🤖 Generated with Claude Code

Re-represent the embedded data tables so the existing deflate compresses
them better and the generated rodata carries no string relocations. All
changes are lossless: query results are byte-identical to before (verified
across every region/feature/electron/cover query), tests and proptests pass.

- Delta-encode region usage percentages (non-increasing per region, so the
  deltas are small with long zero runs)
- Intern region (browser, version) pairs into one popularity-ordered table
  and store a u16 pair-index per datum, dropping the redundant per-region
  browser-id array (only ~557 unique pairs were stored ~47k times)
- Intern feature-support version strings via a sorted table + u16 indices
- Move region/feature lookup keys out of inline &[&str] into deflate blobs
  decoded once (no fat pointers / relocations)
- String-pack the global-usage and electron tables: a concatenated &str pool
  referenced by a u32 bitpacking offset<<8|len, eliminating the per-entry
  &str relocations
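
As an illustration of the pair-interning item above, the following sketch builds one popularity-ordered table and rewrites each region as `u16` indices into it. The `Pair` layout (browser id as `u8`, version index as `u16`), the function name `intern_pairs`, and the tie-breaking rule are assumptions made for the example, not details taken from the codegen.

```rust
use std::collections::HashMap;

/// Hypothetical pair layout: (browser id, version index).
pub type Pair = (u8, u16);

/// Interns every distinct pair into one global table ordered by how
/// often it occurs, and rewrites each region as indices into that table.
pub fn intern_pairs(regions: &[Vec<Pair>]) -> (Vec<Pair>, Vec<Vec<u16>>) {
    // Count how many times each distinct pair appears across all regions.
    let mut counts: HashMap<Pair, usize> = HashMap::new();
    for &pair in regions.iter().flatten() {
        *counts.entry(pair).or_insert(0) += 1;
    }

    // Most popular first, so frequent pairs get the smallest indices.
    // Ties are broken on the pair itself to keep the output deterministic.
    let mut table: Vec<Pair> = counts.keys().copied().collect();
    table.sort_by(|a, b| counts[b].cmp(&counts[a]).then(a.cmp(b)));

    // Reverse lookup from pair to its position in the table.
    let index: HashMap<Pair, u16> = table
        .iter()
        .enumerate()
        .map(|(i, &pair)| {
            let i = u16::try_from(i).expect("too many distinct pairs for a u16 index");
            (pair, i)
        })
        .collect();

    // Each region now stores one u16 per entry instead of the full pair.
    let encoded = regions
        .iter()
        .map(|region| region.iter().map(|pair| index[pair]).collect())
        .collect();

    (table, encoded)
}
```

Decoding is a plain table lookup: `table[idx as usize]` recovers the original pair, so no information is lost as long as the number of distinct pairs fits in a `u16`.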

Linux (x86_64-unknown-linux-musl): 861,760 -> 784,128 bytes (-9.0%).

codspeed-hq Bot commented May 29, 2026

Merging this PR will not alter performance

✅ 6 untouched benchmarks


Comparing reduce-binary-size (b394b04) with main (50ded7c)

codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.04%. Comparing base (50ded7c) to head (b394b04).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #688      +/-   ##
==========================================
+ Coverage   99.03%   99.04%   +0.01%     
==========================================
  Files          47       47              
  Lines        2374     2411      +37     
==========================================
+ Hits         2351     2388      +37     
  Misses         23       23              

Decode each string lookup table (the region and feature version tables and the
generated region/feature key tables) into a single OnceLock<Vec<String>>
instead of the previous two-OnceLock "decompressed bytes + borrowing Vec<&str>"
pair; .as_str() still hands the call sites the &'static str they need. Inline
include_bytes! so each compressed blob needs only one static.

Plain std::sync::OnceLock throughout, no helper types. No behavior change
(query results stay byte-identical; verified across the full region/feature/
electron/cover query space).
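
A minimal sketch of the single-`OnceLock<Vec<String>>` pattern described in this commit message, showing why `.as_str()` can still return `&'static str`. The names `BLOB`, `versions`, and `version` are hypothetical, and a newline-separated byte string stands in for the deflate blob that the real code embeds with `include_bytes!`, so the example needs no decompressor or external file.

```rust
use std::sync::OnceLock;

/// Stand-in for an embedded compressed blob (assumption: the real data
/// is deflate-compressed and decoded differently).
static BLOB: &[u8] = b"1.0\n2.0\n3.0";

/// Decodes the table on first use and caches it for the rest of the
/// program. Because `TABLE` is a `static`, `get_or_init` hands back a
/// `&'static Vec<String>`.
pub fn versions() -> &'static [String] {
    static TABLE: OnceLock<Vec<String>> = OnceLock::new();
    TABLE.get_or_init(|| {
        let text = std::str::from_utf8(BLOB).expect("blob must be valid UTF-8");
        text.lines().map(str::to_owned).collect()
    })
}

/// Looks up one entry. The borrow is tied to the static table, so the
/// returned `&str` is `'static` even though the storage is a `String`.
pub fn version(index: u16) -> &'static str {
    versions()[index as usize].as_str()
}
```

Here `version(1)` returns `"2.0"`, and repeated calls reuse the table decoded on the first call.
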
Boshen force-pushed the reduce-binary-size branch from fb0e9cc to b394b04 on May 29, 2026 06:20
Boshen merged commit 266518d into main on May 29, 2026 (16 checks passed)
Boshen deleted the reduce-binary-size branch on May 29, 2026 10:04
oxc-guard Bot mentioned this pull request on May 29, 2026
Boshen pushed a commit that referenced this pull request on May 29, 2026:

## 🤖 New release

* `oxc-browserslist`: 3.0.3 -> 3.0.4 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [3.0.4](oxc-browserslist-v3.0.3...oxc-browserslist-v3.0.4) - 2026-05-29

### Other

- DRY up feature/region codegen with shared table + lookup helpers
([#694](#694))
- Consolidate bundled-data loading behind compression helpers
([#693](#693))
- Reduce binary size: run-length encode feature support versions
([#692](#692))
- Reduce binary size: Zopfli codegen compression + percentage byte-plane
([#691](#691))
- Reduce binary size: byte-plane (stream-split) compression for region
pair indices
([#690](#690))
- Update README binary size to 621K
([#689](#689))
- Reduce binary size of bundled caniuse/electron data
([#688](#688))
- Switch codegen data source from caniuse-db to caniuse-lite
([#687](#687))
- Update browserslist
([#685](#685))
- Update rust crates
([#682](#682))
- Update browserslist
([#679](#679))
- Update browserslist
([#678](#678))
- Update browserslist
([#677](#677))
- Update browserslist
([#673](#673))
- Update browserslist
([#671](#671))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: oxc-guard[bot] <276638029+oxc-guard[bot]@users.noreply.github.com>