Align rustc_metadata Table to word boundary#76620
Align rustc_metadata Table to word boundary#76620ishitatsuyuki wants to merge 1 commit intorust-lang:masterfrom
Conversation
|
(rust_highfive has picked a reviewer for you, use r? to override) |
|
I'm not familiar with metadata encoding, so r? @Mark-Simulacrum (or someone who can verify the following assumption)
I believe recent Intel processors have a hardware counter for unaligned loads, although you'll need to pass an additional flag to |
|
I guess we should at least gather perf stats, though that's on a Ryzen 3600. (Instructions should basically not change, but wall-times will still be interesting). I think probably @eddyb or maybe @petrochenkov know the metadata system best right now. @bors try @rust-timer queue |
|
Awaiting bors try build completion |
|
⌛ Trying commit e647c6a with merge bf8d56e992aa4e07bacb0236d07c67f1a921bbdb... |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued bf8d56e992aa4e07bacb0236d07c67f1a921bbdb with parent 7adeb2c, future comparison URL. |
|
Finished benchmarking try commit (bf8d56e992aa4e07bacb0236d07c67f1a921bbdb): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
Looking at wall times, perf looks mixed, and cpu clock just shows regressions, so I feel like this is probably not really an improvement (or at least not always one), and given that it's pretty finicky code wise I'm inclined to close. |
|
I agree that the change is hard to justify without meaningful performance improvement, so closing this. |
This can save a few cycles when converting the bytes to u32. That said, probably very marginal benefits. The motivation is more like, borrowing ideas from FlatBuffers/Cap'n Proto.
I wonder if inserting arbitrary paddings (without communicating them to the decoder) between lazy items are OK? From my understanding it's fine, since lazy items are only referenced by relative/absolute position and does not have to be in a packed layout.
Also note for perf: this will not reduce the instruction count, which is the default metric, since the decoder part uses the same (machine) code.