Minimize the cost of write_fmt without arguments#75358
Minimize the cost of write_fmt without arguments#75358tesuji wants to merge 1 commit intorust-lang:masterfrom
Conversation
|
Can I have a perf run? cc @Mark-Simulacrum |
|
@bors try @rust-timer queue |
|
Awaiting bors try build completion |
|
⌛ Trying commit 23fdb55 with merge 01463991259f5ad8ff520b94d9c9c2e72cdf1a98... |
|
☀️ Try build successful - checks-actions, checks-azure |
|
Queued 01463991259f5ad8ff520b94d9c9c2e72cdf1a98 with parent 13290e8, future comparison URL. |
|
|
|
Finished benchmarking try commit (01463991259f5ad8ff520b94d9c9c2e72cdf1a98): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
|
I don't know how to interpret this perf result. It seems that most are regressed. |
|
I cannot reproduce your assembly with the try build's artifacts; I suspect that the difference you're seeing is likely an effect of local compilation options for std allowing more aggressive inlining or so than what we can do with dist-produced artifacts. use std::fmt::{self, Write as FmtWrite};
#[no_mangle]
pub fn bar(buf: &mut String) -> fmt::Result {
write!(buf, " ")
}gets me: before this PR: ; rustc +13290e83a6e20f3b408d177a9d64d8cf98fe4615 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
51e0: 48 83 ec 38 sub rsp,0x38
51e4: 48 89 3c 24 mov QWORD PTR [rsp],rdi
51e8: 48 8d 05 69 a8 03 00 lea rax,[rip+0x3a869] # 3fa58 <__do_global_dtors_aux_fini_array_entry+0x8>
51ef: 48 89 44 24 08 mov QWORD PTR [rsp+0x8],rax
51f4: 48 c7 44 24 10 01 00 mov QWORD PTR [rsp+0x10],0x1
51fb: 00 00
51fd: 48 c7 44 24 18 00 00 mov QWORD PTR [rsp+0x18],0x0
5204: 00 00
5206: 48 8d 05 fb dd 02 00 lea rax,[rip+0x2ddfb] # 33008 <_fini+0x454>
520d: 48 89 44 24 28 mov QWORD PTR [rsp+0x28],rax
5212: 48 c7 44 24 30 00 00 mov QWORD PTR [rsp+0x30],0x0
5219: 00 00
521b: 48 8d 35 9e a8 03 00 lea rsi,[rip+0x3a89e] # 3fac0 <anon.4574bf75806fea7c4a1d662dd29266f6.0.llvm.133549432485651853>
5222: 48 89 e7 mov rdi,rsp
5225: 48 8d 54 24 08 lea rdx,[rsp+0x8]
522a: ff 15 b0 c8 03 00 call QWORD PTR [rip+0x3c8b0] # 41ae0 <_GLOBAL_OFFSET_TABLE_+0x70>
5230: 48 83 c4 38 add rsp,0x38
5234: c3 ret
with this PR: ; rustc +01463991259f5ad8ff520b94d9c9c2e72cdf1a98 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
51e0: 41 56 push r14
51e2: 53 push rbx
51e3: 50 push rax
51e4: 49 89 fe mov r14,rdi
51e7: 48 8b 77 10 mov rsi,QWORD PTR [rdi+0x10]
51eb: ba 01 00 00 00 mov edx,0x1
51f0: e8 3b ff ff ff call 5130 <alloc::raw_vec::RawVec<T,A>::reserve>
51f5: 49 8b 5e 10 mov rbx,QWORD PTR [r14+0x10]
51f9: 49 8b 3e mov rdi,QWORD PTR [r14]
51fc: 48 01 df add rdi,rbx
51ff: 48 8d 15 fa dd 02 00 lea rdx,[rip+0x2ddfa] # 33000 <_fini+0x42c>
5206: be 01 00 00 00 mov esi,0x1
520b: b9 01 00 00 00 mov ecx,0x1
5210: e8 1b 00 00 00 call 5230 <core::slice::<impl [T]>::copy_from_slice>
5215: 48 83 c3 01 add rbx,0x1
5219: 49 89 5e 10 mov QWORD PTR [r14+0x10],rbx
521d: 31 c0 xor eax,eax
521f: 48 83 c4 08 add rsp,0x8
5223: 5b pop rbx
5224: 41 5e pop r14
5226: c3 ret |
|
@Mark-Simulacrum that seems to be even more aggressively inlined, i.e, |
|
Hm, I tried "try" build and master artifact by using rustup-toolchain-install-master: rustc -C debuginfo=1 -o ./foo-stage1.s --emit asm -Cllvm-args=--x86-asm-syntax=intel --crate-type rlib --color=always -C opt-level=3 -Z symbol-mangling-version=v0 ./foo.rsThe source files are the same as in godbolt link. No no_mangle. |
| write(&mut self, args) | ||
| match args.as_str() { | ||
| Some(s) => self.write_str(s), | ||
| None => write(&mut self, args), |
There was a problem hiding this comment.
This special casing should probably be done inside write to cover more cases.
There was a problem hiding this comment.
Actually, it may already be done there.
There was a problem hiding this comment.
It seems that moving this logic to write doesn't help inline calls.
|
Hi, I created a zulip topic to discuss about this change: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358 |
|
Probably try again when inline const implemented. |
|
This change could be very beneficial for embedded/no_std projects. Not necessarily for performance, but for code size. The discussion above was only about performance, where it seems to make little difference. However, code size, especially for no_std projects, is also very important. #78122 (comment) contains a simple benchmark program that could be used to check the difference this PR makes. |
|
Another data point: There are quite a few macros in the wild that use a special-case to avoid calling This change would make the special case there identical to the generic case, simplifying things. |
Thanks to
Arguments::as_str, we should knowwhen we don't have any arguments at all.
Before: https://rust.godbolt.org/z/zKGEMP
After:
Closes #75301
Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358