Arena: use specialization to avoid copying data#78569
Conversation
|
r? @varkor (rust_highfive has picked a reviewer for you, use r? to override) |
|
cc @est31 |
|
Apologies for taking so long; this slipped under my radar. I'm not familiar with this code, so I'm assigning to @Mark-Simulacrum based on |
599683e to
43b83b3
Compare
|
@Mark-Simulacrum I've done what I could. I've addressed your comments and reverted some changes based on those comments. For example, changes to |
|
@bors try @rust-timer queue This should just be a win (or roughly neutral), but let's check. r=me otherwise. |
|
Awaiting bors try build completion |
|
⌛ Trying commit 43b83b334130ef066d8518ca2a125779399ccd73 with merge aa21b48c4d75183274998a446bae14e98053637a... |
|
☀️ Try build successful - checks-actions |
|
Queued aa21b48c4d75183274998a446bae14e98053637a with parent c9c57fa, future comparison URL. |
|
Finished benchmarking try commit (aa21b48c4d75183274998a446bae14e98053637a): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
Interesting. I wonder if I see the results correctly - an increase in instruction counts, but in the same time, looking at the detailed results, a decrease in times? For example, this report belongs to a 0.5% instruction count regression, but shows a -0.3% total time diff: https://perf.rust-lang.org/detailed-query.html?commit=aa21b48c4d75183274998a446bae14e98053637a&base_commit=c9c57fadc47c8ad986808fc0a47479f6d2043453&benchmark=match-stress-enum-check&run_name=incr-unchanged Hmm. |
|
I would say that this is perhaps a slight regression in instruction counts, but overall a win on cycles -- https://perf.rust-lang.org/compare.html?start=c9c57fadc47c8ad986808fc0a47479f6d2043453&end=aa21b48c4d75183274998a446bae14e98053637a&stat=cycles%3Au on non-incremental benchmarks. I am inclined to land it based on these performance results; it does not seem like an obvious win yet, but it does seem like something we should be doing to try and avoid potential problems for the cases optimized here. |
|
Ok, r=me with commits squashed |
|
@bors r+ |
|
📌 Commit e93a463 has been approved by |
|
Are the cycle numbers always this noisy? There is a 5% on incr-unchanged on the encoding-check benchmark. Apparently most of the slowdown comes from Should this worry us? |
|
Note that the 50% there is a wall clock measurement of 6ms, which is almost certainly noise. Cycle measurements frequently have noise in the 5% range on clean benchmarks; for incremental unchanged benchmarks they can be even noisier since there's less overall cycles on those. I am not particularly worried. |
|
@Mark-Simulacrum thanks for the info! |
|
☀️ Test successful - checks-actions |
Tested on commit rust-lang/rust@432d116. Direct link to PR: <rust-lang/rust#78569> 💔 rls on linux: test-pass → test-fail (cc @Xanewok).
| mem::forget(self); | ||
| slice::from_raw_parts_mut(start_ptr, len) | ||
| } | ||
| } |
There was a problem hiding this comment.
Looks like there are quite a bit of code duplication here, shouldn't it be refactored?
There was a problem hiding this comment.
Well, it's not intentionally duplicated, but I couldn't find a way to make it significantly nicer. Unfortunately, we can't use alloc_from_slice because it requires the items to be Copy.
Cleanup for rust-lang#78569
It was added in rust-lang#78569. It's complicated and doesn't actually help performance.
It was added in rust-lang#78569. It's complicated and doesn't actually help performance. Also, add a comment explaining why the two `alloc_from_iter` functions are so different.
…llot Remove the `TypedArena::alloc_from_iter` specialization. It was added in rust-lang#78569. It's complicated and doesn't actually help performance. r? `@cjgillot`
In several cases, a
VecorSmallVecis passed toArena::alloc_from_iterdirectly. This PR makes sure those cases don't copy their data unnecessarily, by specializing thealloc_from_iterimplementation.