Unicode data: reduce size of to_lower/to_upper tables #152954

Open
Kmeakin wants to merge 3 commits into rust-lang:main from Kmeakin:km/unicode-data/case-mapping

Conversation

@Kmeakin
Contributor

@Kmeakin Kmeakin commented Feb 22, 2026

Reduces the combined size of the `to_lower` and `to_upper` tables from 25,364 bytes to 3,110 bytes (about 88% smaller). The approach is explained in detail in the doc comments.
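The PR's actual encoding is documented in its doc comments, which are not reproduced here. The sketch below is only a generic illustration of why case-mapping tables compress well, and it is not the PR's scheme. Many upper/lowercase pairs differ by a constant offset, either across a contiguous block (A-Z, Cyrillic А-Я) or in alternating Upper/lower pairs (Latin Extended-A). A run of explicit `(char, char)` pairs can therefore be replaced by a single `(start, len, stride, delta)` record that is found by binary search. `Run`, `RUNS` and `to_lower_simple` are hypothetical names. The table covers only four ranges and handles only one-to-one (simple) mappings.

```rust
/// Hypothetical run of codepoints sharing one case offset. Only every
/// `stride`-th codepoint from `start` is mapped (stride 2 models alternating
/// Upper/lower pairs such as U+0100 / U+0101).
struct Run {
    start: u32,
    len: u32,
    stride: u32,
    delta: i32,
}

/// Illustrative subset only, sorted by `start`; a real table would be
/// generated from the Unicode Character Database.
static RUNS: &[Run] = &[
    Run { start: 0x0041, len: 26, stride: 1, delta: 0x20 },   // A..Z -> a..z
    Run { start: 0x0100, len: 0x30, stride: 2, delta: 1 },    // U+0100..U+012E -> +1
    Run { start: 0x0400, len: 16, stride: 1, delta: 0x50 },   // U+0400..U+040F -> U+0450..
    Run { start: 0x0410, len: 32, stride: 1, delta: 0x20 },   // U+0410..U+042F -> U+0430..
];

/// Simple (one-to-one) lowercase mapping using only the runs above.
/// Codepoints not covered by any run map to themselves.
fn to_lower_simple(c: char) -> char {
    let cp = c as u32;
    // Number of runs starting at or before `cp`; the candidate is the last of them.
    let idx = RUNS.partition_point(|r| r.start <= cp);
    let Some(run) = idx.checked_sub(1).map(|i| &RUNS[i]) else {
        return c;
    };
    let offset = cp - run.start;
    if offset < run.len && offset % run.stride == 0 {
        // Apply the signed offset; fall back to `c` if the result is not a valid char.
        cp.checked_add_signed(run.delta)
            .and_then(char::from_u32)
            .unwrap_or(c)
    } else {
        c
    }
}
```

In this toy subset, 98 explicit mappings collapse into 4 runs. A complete table also has to handle irregular codepoints and multi-character mappings (for example U+0130, which lowercases to two chars), and this sketch ignores those cases.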

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 22, 2026
@rustbot
Collaborator

rustbot commented Feb 22, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: bootstrap
  • bootstrap expanded to 6 candidates
  • Random selection from Mark-Simulacrum, clubby789, jieyouxu

Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Will need to review the actual implementation as well, but would like more context on the algorithm selection before doing so.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 23, 2026
@rustbot
Collaborator

rustbot commented Feb 23, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@Kobzol
Member

Kobzol commented Feb 23, 2026

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 23, 2026

rust-bors bot pushed a commit that referenced this pull request Feb 23, 2026
Unicode data: reduce size of to_lower/to_upper tables
@rust-bors
Contributor

rust-bors bot commented Feb 23, 2026

☀️ Try build successful (CI)
Build commit: dcce5a6 (dcce5a661ada2ae2006af256bf8e3b0d882272d7, parent: eeb94be79adc9df7a09ad0b2421f16e60e6d932c)

@rust-timer
Collaborator

Finished benchmarking commit (dcce5a6): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (secondary -4.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.9% | [-5.3%, -4.5%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Cycles

Results (secondary -4.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.9% | [-4.9%, -4.9%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary -0.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.3%, -0.2%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.2% | [-0.3%, -0.2%] | 4 |

Bootstrap: 481.498s -> 480.969s (-0.11%)
Artifact size: 395.88 MiB -> 397.86 MiB (0.50%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 23, 2026
@Kmeakin Kmeakin force-pushed the km/unicode-data/case-mapping branch 2 times, most recently from 51870c1 to abe23a5 on March 4, 2026 00:26

@Kmeakin Kmeakin force-pushed the km/unicode-data/case-mapping branch 2 times, most recently from 23fdbc4 to 3f4a072 on March 5, 2026 01:13
@Kmeakin
Contributor Author

Kmeakin commented Mar 5, 2026

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 5, 2026
@Kmeakin Kmeakin requested a review from Mark-Simulacrum March 6, 2026 13:23
@Kmeakin Kmeakin force-pushed the km/unicode-data/case-mapping branch from 3f4a072 to eb10a54 on March 6, 2026 21:33
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Probably can squash the commits at least a little? r=me with that and the nit fixed.

Instead of generating a standalone executable to test `unicode_data`,
generate normal tests in `coretests`. This ensures tests are always
generated, and will be run as part of the normal testsuite.

Also change the generated tests to loop over lookup tables, rather than
generating a separate `assert_eq!()` statement for every codepoint. The
old approach produced a massive file (over 20,000 lines) which took
minutes to compile!
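A minimal sketch of the table-driven style described above. It assumes a hypothetical generated table `LOWERCASE_CASES` of `(char, expected lowercase string)` rows. The real generated tests in `coretests` may use a different shape and cover every mapped codepoint. One loop replaces thousands of per-codepoint `assert_eq!` statements, so each codepoint adds one short row to the generated file instead of a whole statement.

```rust
/// Hypothetical generated table: each row pairs a codepoint with its full
/// lowercase mapping. These few rows are hand-picked for illustration.
static LOWERCASE_CASES: &[(char, &str)] = &[
    ('A', "a"),
    ('Z', "z"),
    ('\u{0391}', "\u{03B1}"),  // GREEK CAPITAL LETTER ALPHA -> small alpha
    ('\u{0130}', "i\u{0307}"), // LATIN CAPITAL LETTER I WITH DOT ABOVE -> two chars
    ('a', "a"),                // already lowercase: maps to itself
];

/// One loop checks every row, instead of one generated `assert_eq!` per codepoint.
#[test]
fn lowercase_table() {
    for &(c, expected) in LOWERCASE_CASES {
        let actual: String = c.to_lowercase().collect();
        // Include the codepoint in the message so a failure is easy to locate.
        assert_eq!(actual, expected, "to_lowercase({c:?})");
    }
}
```

Storing the expected output as a `&str` rather than a single `char` lets the same loop cover multi-character mappings such as U+0130.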
Kmeakin added 2 commits March 8, 2026 22:38
Add a doc-comment to the top of `case-mapping.rs`.
@Kmeakin Kmeakin force-pushed the km/unicode-data/case-mapping branch from eb10a54 to 902199b on March 8, 2026 22:40
@rustbot
Collaborator

rustbot commented Mar 8, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@Kmeakin
Contributor Author

Kmeakin commented Mar 8, 2026

r=@Mark-Simulacrum

@ognevny
Contributor

ognevny commented Mar 9, 2026

r=@Mark-Simulacrum

that should be @bors r=Mark-Simulacrum :)

@Kmeakin
Contributor Author

Kmeakin commented Mar 9, 2026

@bors r=Mark-Simulacrum

@rust-bors
Contributor

rust-bors bot commented Mar 9, 2026

@Kmeakin: 🔑 Insufficient privileges: not in review users

@Mark-Simulacrum
Member

@bors r+

@rust-bors
Contributor

rust-bors bot commented Mar 10, 2026

📌 Commit 902199b has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 10, 2026
@rust-bors
Contributor

rust-bors bot commented Mar 10, 2026

⌛ Testing commit 902199b with merge 3bc6ea5...

Workflow: https://github.com/rust-lang/rust/actions/runs/22889887298

rust-bors bot pushed a commit that referenced this pull request Mar 10, 2026
…Simulacrum

Unicode data: reduce size of to_lower/to_upper tables

Reduces the combined size of to_lower and to_upper from 25,364 bytes to 3,110 bytes. Explained in detail in the doc comments

Labels

S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-libs Relevant to the library team, which will review and decide on the PR/issue.

6 participants