Conversation
    debug!("codegen_block({:?}={:?})", bb, data);
    let llbb = bx.llbb();
Shall we move that closer to where it is used?
We can't. I put this there because I'm worried it will be overwritten by any of the statements. That being said, it seems way back we also never passed the block in (and instead used llbb() in add_yk_block_label, which just gets the current block the builder is pointing to). Since we never had any problems with this, it might mean that the statements don't overwrite this. But it still makes me nervous that we can't be sure.
Good point. Shall we add a comment?
    //     bx.add_yk_block_label(lbl_name);
    //}
    if bx.tcx().sess.opts.cg.tracer.sir_labels() &&
        !bx.tcx().def_path_str(self.instance.def_id()).contains("drop_in_place") &&
I think these should be exact equality checks so as to not skip user functions that may contain "drop_in_place".
IIRC, the two offenders are core::drop_in_place and std::drop_in_place, but I might be wrong.
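To make the suggestion concrete, here is a rough sketch of what an exact-equality check could look like. The helper name `should_skip_labels` and the `SKIPPED_PATHS` list are made up for illustration, and the two paths are only the ones recalled above, so they would need checking against what `def_path_str` actually returns.

```rust
/// Hypothetical skip list: these two paths are the reviewer's recollection
/// and have not been verified against real `def_path_str` output.
const SKIPPED_PATHS: [&str; 2] = ["core::drop_in_place", "std::drop_in_place"];

/// Returns true only when `def_path` is exactly one of the skipped paths.
/// Unlike a substring test, this does not match user functions whose
/// names merely contain "drop_in_place".
fn should_skip_labels(def_path: &str) -> bool {
    SKIPPED_PATHS.iter().any(|p| *p == def_path)
}
```

With this, a user function such as `my_crate::drop_in_place_helper` would still get labels, whereas the current `contains("drop_in_place")` check would skip it.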
        !bx.tcx().crate_name(LOCAL_CRATE).as_str().starts_with("rustc") {
        use ykpack::BLOCK_LABEL_PREFIX;
        let lbl_name = CString::new(format!(
            "NEW_{}:{}:{}",
Oops, I should have killed the NEW_ prefix. Let's kill it now.
Didn't we have a constant defined someplace for this in ykpack?
Ah yes. We use it! Sorry.
So just kill NEW_
Just a few comments!

Addressed your comments. Although there are other

LGTM. You can squash if you like, but we will probably rebase the target branch later anyway. Up to you.

Squashed.

Actually, if you want to wait a minute I can add another commit that adds the function return labels too.

ok

Is this ready for re-review?

I've just added the labels for function call returns. I've also deduplicated some code and moved it into

Yes, this is ready for re-review.
        bx.cx().tcx().symbol_name(fx.instance).name.as_str() != "main" &&
        !bx.tcx().crate_name(LOCAL_CRATE).as_str().starts_with("rustc") {
        let llbb = bx.llbb();
        use ykpack::BLOCK_LABEL_PREFIX;
Do we need another prefix for this kind of label?
I guess we could but it's not strictly necessary.
If a block contains a call to a function with a return value, won't we insert two labels of the same name at different places?
I may be wrong, I don't 100% remember the details.
My gut feeling says no, but I'm not sure. Let's rename it just in case.
Ah damn. Because I moved the label generation into BuilderCx I now need to add another argument to the add_yk_block_label function.
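For reference, a minimal sketch of how passing the prefix as an extra argument keeps the two kinds of label apart. Everything here is assumed for illustration: the prefix values, the `RETURN_LABEL_PREFIX` name, the `make_label` helper, and the meaning of the three `{}` fields (prefix, symbol name, block index). The real constant lives in ykpack and its value is not shown in this thread.

```rust
use std::ffi::CString;

// Hypothetical prefix values, chosen only for this sketch.
const BLOCK_LABEL_PREFIX: &str = "__YK_BLK";
const RETURN_LABEL_PREFIX: &str = "__YK_RET";

/// Builds a label name in the `{}:{}:{}` shape used in the diff, with the
/// prefix passed in so block labels and return labels cannot collide.
fn make_label(prefix: &str, symbol: &str, bb_index: usize) -> CString {
    CString::new(format!("{}:{}:{}", prefix, symbol, bb_index))
        .expect("label must not contain an interior NUL byte")
}
```

Calling `make_label` with the same symbol and block index but different prefixes yields two distinct names, so a block that also contains a call would no longer end up with two identically named labels.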
You meant the

Sorry, yes, I meant the

Moved the label generation code to after

bors r+

Let's merge this manually, as bors will test this when my branch is merged.
* Fix `const-display.rs` XPATH queries
* Add `issue_76501.rs` test file
* Rename issue_76501.rs to issue-76501.rs
Optimise align_offset for stride=1 further

The `stride == 1` case can be computed more efficiently through `-p (mod
a)`. That then translates to a nice and short sequence of LLVM
instructions:

    %address = ptrtoint i8* %p to i64
    %negptr = sub i64 0, %address
    %offset = and i64 %negptr, %a_minus_one

And produces pretty much ideal code-gen when this function is used in
isolation.

Typical use of this function will, however, involve use of
the result to offset a pointer, i.e.

    %aligned = getelementptr inbounds i8, i8* %p, i64 %offset

This still looks very good, but LLVM does not really translate that to
what would be considered ideal machine code (on any target). For example,
this is the codegen we obtain for an unknown alignment:

    ; x86_64
    dec     rsi
    mov     rax, rdi
    neg     rax
    and     rax, rsi
    add     rax, rdi

In particular, negating a pointer is not something that’s going to be
optimised for in the design of CISC architectures like x86_64. They
are much better at offsetting pointers. And so we’d love to utilize this
ability and produce code that's more like this:

    ; x86_64
    lea     rax, [rsi + rdi - 1]
    neg     rsi
    and     rax, rsi

To achieve this we need to give LLVM an opportunity to apply the
various peep-hole optimisations that it does during DAG selection. In
particular, the `and` instruction appears to be a major inhibitor here.
We cannot, sadly, get rid of this load-bearing operation, but we can
reorder operations such that LLVM has more to work with around this
instruction.

One such ordering is proposed in #75579 and results in LLVM IR that
looks broadly like this:

    ; using add enables `lea` and similar CISCisms
    %offset_ptr = add i64 %address, %a_minus_one
    %mask = sub i64 0, %a
    %masked = and i64 %offset_ptr, %mask
    ; can be folded with `gepi` that may follow
    %offset = sub i64 %masked, %address

…and generates the intended x86_64 machine code.
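
As a sanity check on the reordering, the two LLVM IR sequences above can be mirrored in plain Rust and compared. This is only an illustrative sketch, not the actual `align_offset` implementation: the function names are made up, and it assumes `align` is a power of two and that addresses are plain `usize` values.

```rust
/// Mirrors the first IR sequence: offset = (-addr) & (align - 1),
/// i.e. `-p (mod a)`. Assumes `align` is a power of two.
fn offset_neg_then_mask(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    addr.wrapping_neg() & (align - 1)
}

/// Mirrors the reordered IR sequence:
/// offset = ((addr + (align - 1)) & -align) - addr.
/// Assumes `align` is a power of two.
fn offset_add_mask_sub(addr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    let mask = align.wrapping_neg(); // same bits as !(align - 1)
    (addr.wrapping_add(align - 1) & mask).wrapping_sub(addr)
}
```

For example, with `addr = 13` and `align = 8` both functions give 3. The wrapping operations mean the two forms also agree for addresses near `usize::MAX`.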

One might also wonder how the increased amount of code would impact a
RISC target. Turns out not much:

    ; aarch64 previous                ; aarch64 new
    sub     x8, x1, #1                add     x8, x1, x0
    neg     x9, x0                    sub     x8, x8, #1
    and     x8, x9, x8                neg     x9, x1
    add     x0, x0, x8                and     x0, x8, x9

(and similarly for ppc, sparc, mips, riscv, etc)

The only target that seems to do worse is… wasm32.

Onto actual measurements – the best way to evaluate snippets like these
is to use llvm-mca. As the AArch64 assembly suggests, there isn’t any
performance difference to be found there: both snippets execute in the
same number of cycles on the CPUs I tried. On x86_64, we get a
throughput improvement of >50%!

Fixes #75579
Don't print thread ids and names in `tracing` logs

Before:
```
2:rustc INFO rustc_interface::passes Pre-codegen
2:rustcTy interner             total           ty lt ct all
2:rustc    Adt               :   1078 81.3%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Array             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Slice             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
2:rustc    RawPtr            :      2  0.2%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Ref               :      4  0.3%,  0.1%   0.1%  0.0%  0.0%
2:rustc    FnDef             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    FnPtr             :     76  5.7%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Placeholder       :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Generator         :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    GeneratorWitness  :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Dynamic           :      3  0.2%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Closure           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Tuple             :     13  1.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Bound             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Param             :    146 11.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Infer             :      2  0.2%,  0.1%   0.0%  0.0%  0.0%
2:rustc    Projection        :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Opaque            :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc    Foreign           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
2:rustc                  total   1326         0.2%   0.1%  0.0%  0.0%
2:rustcInternalSubsts interner: #437
2:rustcRegion interner: #355
2:rustcStability interner: #1
2:rustcConst Stability interner: #0
2:rustcAllocation interner: #0
2:rustcLayout interner: #0
```

After:
```
INFO rustc_interface::passes Post-codegen
Ty interner             total           ty lt ct all
    Adt               :   1078 81.3%,  0.0%   0.0%  0.0%  0.0%
    Array             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
    Slice             :      1  0.1%,  0.0%   0.0%  0.0%  0.0%
    RawPtr            :      2  0.2%,  0.0%   0.0%  0.0%  0.0%
    Ref               :      4  0.3%,  0.1%   0.1%  0.0%  0.0%
    FnDef             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    FnPtr             :     76  5.7%,  0.0%   0.0%  0.0%  0.0%
    Placeholder       :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Generator         :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    GeneratorWitness  :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Dynamic           :      3  0.2%,  0.0%   0.0%  0.0%  0.0%
    Closure           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Tuple             :     13  1.0%,  0.0%   0.0%  0.0%  0.0%
    Bound             :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Param             :    146 11.0%,  0.0%   0.0%  0.0%  0.0%
    Infer             :      2  0.2%,  0.1%   0.0%  0.0%  0.0%
    Projection        :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Opaque            :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
    Foreign           :      0  0.0%,  0.0%   0.0%  0.0%  0.0%
                  total   1326         0.2%   0.1%  0.0%  0.0%
InternalSubsts interner: #437
Region interner: #355
Stability interner: #1
Const Stability interner: #0
Allocation interner: #0
Layout interner: #0
```

Closes rust-lang/rust#78931

r? `@oli-obk`
```
Benchmark #1: ./raytracer_cg_clif_pre
  Time (mean ± σ):      9.553 s ±  0.129 s    [User: 9.543 s, System: 0.008 s]
  Range (min … max):    9.438 s …  9.837 s    10 runs

Benchmark #2: ./raytracer_cg_clif_post
  Time (mean ± σ):      9.463 s ±  0.055 s    [User: 9.452 s, System: 0.008 s]
  Range (min … max):    9.387 s …  9.518 s    10 runs

Summary
  './raytracer_cg_clif_post' ran
    1.01 ± 0.01 times faster than './raytracer_cg_clif_pre'
```
Don't run `resolve_vars_if_possible` in `normalize_erasing_regions`

Neither `@eddyb` nor I could figure out what this was for. I changed it to `assert_eq!(normalized_value, infcx.resolve_vars_if_possible(&normalized_value));` and it passed the UI test suite.

<details><summary>

Outdated, I figured out the issue - `needs_infer()` needs to come _after_ erasing the lifetimes

</summary>

Strangely, if I change it to `assert!(!normalized_value.needs_infer())` it panics almost immediately:

```
query stack during panic:
#0 [normalize_generic_arg_after_erasing_regions] normalizing `<str::IsWhitespace as str::pattern::Pattern>::Searcher`
#1 [needs_drop_raw] computing whether `str::iter::Split<str::IsWhitespace>` needs drop
#2 [mir_built] building MIR for `str::<impl str>::split_whitespace`
#3 [unsafety_check_result] unsafety-checking `str::<impl str>::split_whitespace`
#4 [mir_const] processing MIR for `str::<impl str>::split_whitespace`
#5 [mir_promoted] processing `str::<impl str>::split_whitespace`
#6 [mir_borrowck] borrow-checking `str::<impl str>::split_whitespace`
#7 [analysis] running analysis passes on this crate
end of query stack
```

I'm not entirely sure what's going on - maybe the two disagree?

</details>

For context, this came up while reviewing rust-lang/rust#77467 (cc `@lcnr`).

Possibly this needs a crater run?

r? `@nikomatsakis`
cc `@matthewjasper`
There we go. Let's get this merged into your fork and then we can raise a proper PR against softdevteam from there.