Member (author of this PR): In oxc, compile time was 7.5s plus 1.5s for build.rs before, and 8.2s after.
Boshen added a commit that referenced this pull request on May 29, 2026:
…pair indices (#690)

## Summary

Applies **byte-plane (stream-split) compression** to the region `(browser, version)` pair-index blob, the single largest bundled data blob.

Each region's pair indices are `u16` values, but there are only ~557 distinct pairs, so the high byte is almost always `0`. Instead of postcard-varint-encoding interleaved `u16`s per region, the codegen now writes **all the low bytes, then all the high bytes**, and then deflates the result. The high-byte plane collapses to almost nothing, and separating it from the high-entropy low bytes lets deflate model each stream far better than it could model the interleaved varint stream.

The reader splits the decompressed blob at `len / 2` and recombines each value as `lo | hi << 8`, so this blob no longer needs postcard deserialization. `PAIR_RANGES` switches from byte offsets to element offsets (cumulative datum counts).

## Results (lossless)

| Size (bytes) | Before | After | Δ |
|---|--:|--:|--:|
| pair-index blob | 47,602 | 44,567 | **−3,035** |
| Linux musl example binary | 782,048 | 778,528 | **−3,520** |

The binary shrinks slightly more than the blob because the `u16` postcard decode path is dropped. The macOS file size is unchanged (16 KB page quantization swallows the sub-page win), but `.rodata` is 3 KB smaller, which helps consumers that link this crate into a larger binary.

## Why this is the remaining clean win

An entropy analysis across ~15 candidate encodings showed:

- The **percentages** blob (#2) already sits at its order-0 entropy floor: delta, raw, byte-plane, columnar, and first-value-split encodings all fail to beat the current delta-varint encoding.
- The byte-plane pair-index blob (44,567 bytes) is already *below* the order-0 symbol entropy floor (48,944 bytes) because deflate exploits cross-region repetition, so an order-0 arithmetic coder would do worse.
- MTF, delta+zigzag, and columnar transpose all *hurt*.

Beating the current state would require an order-1 or higher model (BWT, a range coder, or brotli), which means real decoder code that a few KB of savings won't pay for.
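
A minimal Rust sketch of the byte-plane layout and the reader-side recombination described in the Summary, under stated assumptions. The names `encode_planes`, `decode_region`, and `order0_entropy_bytes` are hypothetical and are not the crate's actual codegen or reader. The deflate step is omitted so the example stays dependency-free, and regions are assumed to be stored back to back, with `ranges[i]..ranges[i + 1]` giving region `i` in element (not byte) offsets, mirroring the described `PAIR_RANGES` change. The entropy helper illustrates what an "order-0 floor" means: the smallest size a coder that treats every symbol independently could reach.

```rust
use std::collections::HashMap;

/// Hypothetical encoder: concatenates every region's u16 pair indices and
/// lays out all low bytes first, then all high bytes (before deflate).
/// Returns the blob and cumulative element offsets, one more than regions.
fn encode_planes(regions: &[Vec<u16>]) -> (Vec<u8>, Vec<usize>) {
    let total: usize = regions.iter().map(Vec::len).sum();
    let mut lo = Vec::with_capacity(total * 2);
    let mut hi = Vec::with_capacity(total);
    // ranges[i]..ranges[i + 1] is region i, counted in elements, not bytes.
    let mut ranges = Vec::with_capacity(regions.len() + 1);
    ranges.push(0);
    for region in regions {
        for &v in region {
            lo.push((v & 0xFF) as u8);
            hi.push((v >> 8) as u8);
        }
        ranges.push(lo.len());
    }
    // Low plane followed by high plane: the split point is exactly len / 2.
    lo.extend_from_slice(&hi);
    (lo, ranges)
}

/// Hypothetical reader: split the (already decompressed) blob at len / 2
/// and rebuild each value of one region as lo | hi << 8.
fn decode_region(blob: &[u8], ranges: &[usize], region: usize) -> Vec<u16> {
    let (lo, hi) = blob.split_at(blob.len() / 2);
    let (start, end) = (ranges[region], ranges[region + 1]);
    lo[start..end]
        .iter()
        .zip(&hi[start..end])
        .map(|(&l, &h)| u16::from(l) | (u16::from(h) << 8))
        .collect()
}

/// Order-0 (memoryless) entropy floor in bytes:
/// sum over distinct symbols of -count * log2(count / n), divided by 8.
fn order0_entropy_bytes(symbols: &[u16]) -> f64 {
    let n = symbols.len() as f64;
    let mut counts: HashMap<u16, usize> = HashMap::new();
    for &s in symbols {
        *counts.entry(s).or_insert(0) += 1;
    }
    counts
        .values()
        .map(|&c| {
            let c = c as f64;
            -c * (c / n).log2()
        })
        .sum::<f64>()
        / 8.0
}

fn main() {
    // Toy data: small indices dominate, so the high-byte plane is mostly zero.
    let regions: Vec<Vec<u16>> = vec![vec![3, 7, 3, 300], vec![7, 7, 1], vec![]];
    let (blob, ranges) = encode_planes(&regions);
    for (i, region) in regions.iter().enumerate() {
        assert_eq!(&decode_region(&blob, &ranges, i), region);
    }
    println!("high plane: {:?}", &blob[blob.len() / 2..]);
    println!("order-0 floor: {:.2} bytes", order0_entropy_bytes(&regions.concat()));
}
```

Because the low plane holds exactly one byte per element, an element offset indexes both planes directly, which is why cumulative datum counts are enough to locate a region without any byte-offset bookkeeping.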

Verification: all 392 tests and 14 JS-fuzz proptests pass; clippy and fmt are clean; every other generated blob is byte-identical (reproducible codegen).

🤖 Generated with [Claude Code](https://claude.com/claude-code)