Closed
Labels: arch-x64, area-CodeGen-coreclr, enhancement, optimization, tenet-performance
Description
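From the binarytrees performance benchmark: the initial call to bottomUpTree from Bench (other calls to this method have similar issues). On Windows the struct comes back through a hidden return buffer that is passed straight to itemCheck; on Linux it comes back in a register pair, is spilled to a temp, and is then copied to a second temp before the itemCheck call.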
;;; Windows (return via hidden byref)
488D4C2428 lea rcx, bword ptr [rsp+28H] ;; address of return byref
448BC3 mov r8d, ebx
33D2 xor edx, edx
E863FBFFFF call TreeNode:bottomUpTree(int,int):struct
488D4C2428 lea rcx, bword ptr [rsp+28H]
E869FBFFFF call TreeNode:itemCheck():int:this
;;; Linux (return in register pair)
418BF7 mov esi, r15d
33FF xor edi, edi
E85BFBFFFF call TreeNode:bottomUpTree(int,int):struct
48894598 mov gword ptr [rbp-68H], rax ;; spill return to temp
488955A0 mov qword ptr [rbp-60H], rdx
488D7D98 lea rdi, bword ptr [rbp-68H] ;; copy temp to another temp
488B07 mov rax, gword ptr [rdi]
488945B0 mov gword ptr [rbp-50H], rax
8B7F08 mov edi, dword ptr [rdi+8]
897DB8 mov dword ptr [rbp-48H], edi
488D7DB0 lea rdi, bword ptr [rbp-50H] ;; pass 2nd temp to itemCheck
E849FBFFFF call TreeNode:itemCheck():int:this
bottomUpTree has similar issues at its recursive call sites, and also does some redundant zeroing of temp structs that were zeroed in the prolog:
;; prolog: zero from rbp-28H to rbp-88H
488DBD78FFFFFF lea rdi, [rbp-88H]
B918000000 mov ecx, 24
33C0 xor rax, rax
F3AB rep stosd
;; later: re-zero part of the range
488D7DB8 lea rdi, bword ptr [rbp-48H]
G_M53682_IG03:
660F57C0 xorpd xmm0, xmm0
F30F7F07 movdqu qword ptr [rdi], xmm0
;; later: re-zero another part, overwrite it (partially with a zero),
;; then immediately read & return the values just written as a pair
488D45C8 lea rax, bword ptr [rbp-38H]
G_M53682_IG09:
660F57C0 xorpd xmm0, xmm0
F30F7F00 movdqu qword ptr [rax], xmm0
G_M53682_IG10:
895DD0 mov dword ptr [rbp-30H], ebx
33C0 xor rax, rax
488945C8 mov gword ptr [rbp-38H], rax
488B45C8 mov rax, gword ptr [rbp-38H]
488B55D0 mov rdx, qword ptr [rbp-30H]
G_M53682_IG11:
488D65D8 lea rsp, [rbp-28H]
5B pop rbx
415C pop r12
415D pop r13
415E pop r14
415F pop r15
5D pop rbp
C3 ret
Note this latter bit of code could simply be something like:
movsx rdx, ebx
lea rsp, ...
...
ret
category:cq
theme:structs
skill-level:expert
cost:large
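To make the redundancy in the two Linux listings above easier to see, here is a rough Python sketch that models the stack temps as byte arrays and replays the emitted stores and loads. It is only a model, not JIT code: the function names, the garbage fill pattern, and the little-endian packing are assumptions made for illustration, and the struct layout (an 8-byte reference at offset 0, a 4-byte int at offset 8) is inferred from the disassembly rather than from the benchmark source.

```python
import struct

MASK32 = 0xFFFFFFFF


def call_site_copies(rax, rdx):
    """Replay the Linux call site: spill the returned pair, then copy it again.

    Offset 0 of `frame` stands for [rbp-68H]; the second temp starts at
    offset 0x18, which stands for [rbp-50H].
    """
    frame = bytearray(0x28)
    struct.pack_into("<Q", frame, 0x00, rax)          # mov [rbp-68H], rax
    struct.pack_into("<Q", frame, 0x08, rdx)          # mov [rbp-60H], rdx
    (ref,) = struct.unpack_from("<Q", frame, 0x00)    # mov rax, [rdi]
    struct.pack_into("<Q", frame, 0x18, ref)          # mov [rbp-50H], rax
    (item,) = struct.unpack_from("<I", frame, 0x08)   # mov edi, [rdi+8]
    struct.pack_into("<I", frame, 0x20, item)         # mov [rbp-48H], edi
    # The 12 meaningful bytes of each temp, as a callee would read them.
    return bytes(frame[0x00:0x0C]), bytes(frame[0x18:0x24])


def epilogue_as_emitted(ebx):
    """Replay the zero / overwrite / reload sequence before the return."""
    temp = bytearray(b"\xcc" * 16)                    # offset 0 = [rbp-38H]
    temp[0:16] = bytes(16)                            # xorpd + movdqu [rax], xmm0
    struct.pack_into("<I", temp, 8, ebx & MASK32)     # mov [rbp-30H], ebx
    struct.pack_into("<Q", temp, 0, 0)                # xor rax, rax; mov [rbp-38H], rax
    (rax,) = struct.unpack_from("<Q", temp, 0)        # mov rax, [rbp-38H]
    (rdx,) = struct.unpack_from("<Q", temp, 8)        # mov rdx, [rbp-30H]
    return rax, rdx


def epilogue_direct(ebx):
    """Build the same register pair with no memory traffic at all."""
    return 0, ebx & MASK32


if __name__ == "__main__":
    first, second = call_site_copies(0x1122334455667788, 0x2A)
    print(first == second)
    print(epilogue_as_emitted(7) == epilogue_direct(7))
```

In this model both temps at the call site hold identical bytes, so passing the address of the first temp to itemCheck would be enough. The epilogue always yields a zero in rax and the int in the low half of rdx. The emitted sequence happens to leave the upper half of rdx zero, but the call-site listing reads only the dword at offset 8, so a sign-extending move as suggested above would serve a caller like that equally well.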