Value-number byref loads#9169
Conversation
|
@briansull @dotnet/jit-contrib PTAL. I expect it would be easiest to review the constituent commits individually. I've confirmed this gets the desired loop hoisting from #9036, and the desired CSEs from #7903, and recognizes the motivating CSE from #7914 as legal, though CSE's heuristics decide not to fire in that case (though many similar methods improve as expected). Diffs generally consist of improved redundant load elimination, invariant code motion, and bounds-check elimination. Desktop SPMI diffs report a codesize win of 2% across 35,000 affected methods. Lab runs appear to show a 7% improvement in the Matinv4 benchmark, in which a number of redundant loads and bounds checks have been removed. |
|
@dotnet-bot test Windows_NT perf |
8ea192d to
5fea06a
Compare
| inline MemoryKindSet memoryKindSet(MemoryKind id, MemoryKinds... ids) | ||
| { | ||
| return (1U << id) | memoryKindSet(ids...); | ||
| } |
There was a problem hiding this comment.
I'm not familiar with this syntax:
typename... MemoryKinds... and ids...
What is this called?
And is it supported across our compilers?
src/jit/block.h
Outdated
| // Bitmask for an empty MemoryKindSet | ||
| inline MemoryKindSet memoryKindSet() | ||
| { | ||
| return 0; |
There was a problem hiding this comment.
How about just using two consts: emptyMemoryKindSet' and fullMemoryKindSet`
There was a problem hiding this comment.
Updated, thanks for the suggestion; having emptyMemoryKindSet instead of memoryKindSet() at those uses reads much cleaner I think.
src/jit/codegenlegacy.cpp
Outdated
| if ((tree->gtFlags & GTF_IND_VOLATILE) != 0) | ||
| { | ||
| // For any Volatile indirection, we must handle it as a | ||
| // definition of the global heap |
There was a problem hiding this comment.
Fixed (and likewise for the rest of these)
| GcHeap, // Includes actual GC heap, and also static fields | ||
| MemoryKindCount, // Number of MemoryKinds | ||
| }; | ||
| #ifdef DEBUG |
There was a problem hiding this comment.
You might want to mention that we really only support two kinds of memory and that if a third kind is every added then you need to examine the use of the variable byrefStatesMatchGcHeapStates.
There was a problem hiding this comment.
I think I avoided baking that assumption into the places that use byrefStatesMatchGcHeapStates. I used a few patterns:
1 - places where output[GcHeap] and output[ByrefExposed] reference the same data structure:
for (MemoryKind memoryKind : allMemoryKinds()) {
if (memoryKind == GcHeap && statesMatch) {
// already updated
assert(memoryKind > ByrefExposed);
assert(input[memoryKind] == input[ByrefExposed]); // and/or assert(output[memoryKind] == f(input[memoryKind]))
continue;
}
output[memoryKind] = f(input[memoryKind]);
}2a - places where output[GcHeap] and output[ByrefExposed] need to be copies of each other:
for (MemoryKind memoryKind : allMemoryKinds()) {
if (memoryKind == GcHeap && statesMatch) {
// copy result from ByrefExposed
assert(memoryKind > ByrefExposed);
output[memoryKind] = output[ByrefExposed];
continue;
}
output[memoryKind] = f(input[memoryKind]);
}2b - other places where output[GcHeap] and output[ByrefExposed] need to be copies of each other:
for (MemoryKind memoryKind : allMemoryKinds()) {
if (memoryKind == GcHeap && statesMatch) {
// already updated
assert(memoryKind > ByrefExposed);
assert(input[memoryKind] == input[ByrefExposed]); // and/or assert(output[memoryKind] == f(input[memoryKind]))
continue;
}
output[memoryKind] = f(input[memoryKind]);
if (memoryKind == GcHeap && statesMatch) {
// copy over to ByrefExposed
output[ByrefExposed] = output[memoryKind];
}
}(2a was more convenient for some of the particular updates, 2b for others).
In all cases, if a third kind were added the loop would just process it independently of GcHeap and ByrefExposed, which I think is a good default position.
Does that address your concern, or am I misunderstanding the feedback?
src/jit/compiler.cpp
Outdated
| { | ||
| m_memorySsaMap[memoryKind] = nullptr; | ||
| } | ||
| m_refAnyClass = nullptr; |
There was a problem hiding this comment.
I would be willing to reorder m_refAnyClass to come before the loop
src/jit/optimizer.cpp
Outdated
|
|
||
| bool heapHavoc = false; // True ==> there's a call or a memory store that has arbitrary heap effects. | ||
| // MemoryKinds for which an in-loop call or store has arbitrary effects. | ||
| MemoryKindSet memoryHavoc = memoryKindSet(); |
There was a problem hiding this comment.
This would read better as
memoryHavoc = emptyMemoryKindSet();
|
|
||
| // Bitmask containing all the MemoryKinds | ||
| const MemoryKindSet fullMemoryKindSet = (1 << MemoryKindCount) - 1; | ||
|
|
src/jit/liveness.cpp
Outdated
|
|
||
| case GT_CLS_VAR: | ||
| // For Volatile indirection, first mutate the global heap | ||
| // For Volatile indirection, first mutate the GC heap |
src/jit/liveness.cpp
Outdated
|
|
||
| case GT_IND: | ||
| // For Volatile indirection, first mutate the global heap | ||
| // For Volatile indirection, first mutate GcHeap |
| } | ||
|
|
||
| // In minopts, all stores conservatively affect both GcHeap and ByrefExposed memory | ||
| byrefStatesMatchGcHeapStates = true; |
There was a problem hiding this comment.
No, perhaps the comment was misleading. I don't expect byrefStatesMatchGcHeapStates to be checked in minOpts, but leaving it unset didn't seem right. I've updated the comment to
// In minopts, we don't explicitly build SSA or value-number; GcHeap and
// ByrefExposed implicitly (conservatively) change state at each instr.Is that better?
| if ((fgCurMemoryHavoc & memoryKindSet(memoryKind)) != 0) | ||
| { | ||
| printf("*"); | ||
| } |
There was a problem hiding this comment.
You could reduce what we print in the dumps by adding:
if (byrefStatesMatchGcHeapStates) { break; }
There was a problem hiding this comment.
I actually had it like that at first, but regretted it as soon as I started debugging this stuff. Perhaps if that meant that ByrefExposed (or GcHeap) were just never mentioned in the dumps of byrefStatesMatchGcHeapStates methods, it would be useful, but since some of the liveness code that runs before statesMatch has been computed will end up dumping both, it just turns out confusing; I think better a more understandable dump than a smaller one, if we have to choose (and it seems in this case we have to choose).
| if ((fgCurMemoryUse & memoryKindSet(memoryKind)) != 0) | ||
| { | ||
| printf(" + %s", memoryKindNames[memoryKind]); | ||
| } |
There was a problem hiding this comment.
You could reduce what we print in the dumps by adding:
if (byrefStatesMatchGcHeapStates) { break; }
src/jit/liveness.cpp
Outdated
| m_heapLiveIn = false; | ||
| m_heapLiveOut = false; | ||
| m_memoryLiveIn = memoryKindSet(); | ||
| m_memoryLiveOut = memoryKindSet(); |
| if ((block->bbMemoryLiveIn & memoryKindSet(memoryKind)) != 0) | ||
| { | ||
| printf(" + %s", memoryKindNames[memoryKind]); | ||
| } |
There was a problem hiding this comment.
if (byrefStatesMatchGcHeapStates) { break; }
| if ((block->bbMemoryLiveOut & memoryKindSet(memoryKind)) != 0) | ||
| { | ||
| printf(" + %s", memoryKindNames[memoryKind]); | ||
| } |
There was a problem hiding this comment.
if (byrefStatesMatchGcHeapStates) { break; }
src/jit/optimizer.cpp
Outdated
| } | ||
|
|
||
| if (heapHavoc) | ||
| if (memoryHavoc != 0) |
There was a problem hiding this comment.
Can memoryHavoc actually be different for GcHeap vs ByrefExposed?
If so it would be useful to have a test case in this area
There was a problem hiding this comment.
Since we assume all object field stores, static field stores, and array element stores alias all byref loads, we treat them as ByrefExposed-havoc (but not GcHeap-havoc). So every loop that has these but not GcHeap-havoc will have one kind of havoc but not the other.
src/jit/valuenum.cpp
Outdated
| // | ||
| // Return Value: | ||
| // The value number corresopnding to the heap after the assignment. | ||
| // The value number corresopnding to memory after the assignment. |
| assert(compCurBB->bbHeapDef); | ||
|
|
||
| fgCurHeapVN = vnStore->VNForMapStore(TYP_REF, fgCurHeapVN, elemTypeEqVN, newValAtArrType); | ||
| return vnStore->VNForMapStore(TYP_REF, fgCurMemoryVN[GcHeap], elemTypeEqVN, newValAtArrType); |
There was a problem hiding this comment.
Should we try to keep the old assert?
assert(compCurBB->bbHeapDef);
There was a problem hiding this comment.
This code used to assign fgCurHeapVN but doesn't anymore. Now it just returns the new value number, which then gets passed to recordGcHeapStore to do the update to fgCurMemoryVN[GcHeap], and that function does have the updated assert.
src/jit/valuenum.cpp
Outdated
| assert(fgCurHeapVN != ValueNumStore::NoVN); | ||
| } | ||
| else | ||
| // Now do the same for memory. |
There was a problem hiding this comment.
Now do the same for both kinds of memory.
| } | ||
| else | ||
| // Now do the same for memory. | ||
| for (MemoryKind memoryKind : allMemoryKinds()) |
There was a problem hiding this comment.
Do we have to twice the work here, or should we check the bool byrefStatesMatchGcHeapStates
5fea06a to
ba5c144
Compare
src/jit/valuenum.cpp
Outdated
| #ifdef DEBUG | ||
| // Sometimes we query the heap ssa map, and need a dummy location for the ignored result. | ||
| unsigned heapSsaNum; | ||
| // Sometimes we query the memory ssa map, and need a dummy location for the ignored result. |
There was a problem hiding this comment.
Sometimes we query the memory ssa map in an assert
src/jit/valuenum.cpp
Outdated
| } | ||
|
|
||
| // We just mutate the heap if isVolatile is true, and then do the read as normal. | ||
| // We just mutate GcHeap if isVolatile is true, and then do the read as normal. |
src/jit/valuenum.cpp
Outdated
| } | ||
| else if (lvaVarAddrExposed(lcl->gtLclNum)) | ||
| { | ||
| // We could use MapStore here and MapSelect on reads of untracked locals |
There was a problem hiding this comment.
Should this say of address exposed locals rather than untracked locals
src/jit/valuenum.cpp
Outdated
| // This side-effects ByrefExposed. Just use a new opaque VN. | ||
| // As with GT_LCL_VAR, we could probably use MapStore here and MapSelect at corresponding | ||
| // loads, but to do so would have to identify the subset of locals which aren't tracked | ||
| // but whose fields can be disambiguated. |
There was a problem hiding this comment.
should this say aren't address exposed rather than aren't tracked
src/jit/valuenum.cpp
Outdated
| else if (lvaVarAddrExposed(lclNum)) | ||
| { | ||
| // Need to record the effect on ByrefExposed. | ||
| // We could use MapStore here and MapSelect on reads of untracked locals |
src/jit/valuenum.cpp
Outdated
| } | ||
|
|
||
| // We model statics as indices into the heap variable. | ||
| // We model statics as indices into the GC heap variable. |
src/jit/valuenum.h
Outdated
| #ifdef DEBUG | ||
| // This helps test some performance pathologies related to "evaluation" of VNF_MapSelect terms, | ||
| // especially relating to the heap. We count the number of applications of such terms we consider, | ||
| // especially relating to the GcHeap. We count the number of applications of such terms we consider, |
| GcHeap, // Includes actual GC heap, and also static fields | ||
| MemoryKindCount, // Number of MemoryKinds | ||
| }; | ||
| #ifdef DEBUG |
fd32e72 to
908dc2f
Compare
Consolodate the code for updating the current heap VN and HeapSsaMap entries into new helper method `recordHeapStore`, to make it easier to evolve that code. Change `fgValueNumberArrIndexAssign` to return the new heap VN to allow callers to pass it down into `recordHeapStore` accordingly. This is just a refactoring; no change to compiler behavior.
Re-cast the notion of "heap" (in liveness, SSA, and value-numbering) as one of potentially many `MemoryKind`s, called `GcHeap`. Update names, comments, data structures, and signatures as appropriate to parameterize relevant data/methods over `MemoryKind`. This change is a no-diff refactoring, and currently `GcHeap` is the only `MemoryKind`. Generally, codepaths which will generically need to process all `MemoryKinds`s (initializing, dumping, dataflow propagation) now iterate over all `MemoryKinds`, and codepaths which are sensitive to the semantics of the specific `MemoryKind` (def/use identification in liveness and value numbering) are changed to specifically operate on `MemoryKind::GcHeap`. One notable exception is that `lvMemoryPerSsaData` and `CountForMemoryDef` are *not* parameterized over `MemoryKind`; there's a single "space" of SSA defnums for memory defs (though the same tree can incur different defs for different memory kinds [in which case their defnums will differ]), to facilitate subsequently sharing SSA nodes across memory kinds when appropriate.
Add a new `MemoryKind` to represent "any memory that byrefs may reference", called `ByrefExposed`. All definition points of `GcHeap` become also definition points of `ByrefExposed`, and additionally, so do definitions of address-exposed locals. Because it is a common case (currently happening in 90% of methods in System.Private.CoreLib) that a method has no definitions of address-exposed locals, have liveness detect this condition and set the new `byrefStatesMatchGcHeapStates` flag accordingly; when the states match, the same `MemoryPhi`s, defnums, and `PerSsaData` are then shared between the two memory kinds, to avoid excess allocations/processing. This change defines `ByrefExposed` memory and its def/use points, and builds SSA for it, but no optimizations make use of it, so this is a no-diff change.
Teach value numbering to keep track of the current value number for `ByrefExposed` memory, updating it appropriately at relevant stores. Introduce a new VN operator `ByrefExposedLoad`, used for loads of address-exposed locals and loads by indirs that don't have more specific value numbers, which is a function of the current `ByrefExposed` memory VN and the pointer VN. This allows loop hoisting and CSE to recognize redundant loads via byrefs when there are no intervening stores. Fixes #7903.
908dc2f to
a796400
Compare
Value-number byref loads Commit migrated from dotnet/coreclr@1e731c4
This change teaches value numbering to track states of "byref exposed" memory,
like it currently tracks states of heap memory; the default value number for a load
can then become a function of the memory state and pointer VN. This enables
CSE and loop hoisting to kick in when there are no intervening stores.
It also clears the way for us to chain "byref exposed" memory VNs similar to how
heap VNs are chained (via
VNF_MapStore), to enable value propagation throughbyref params and/or address-taken locals when there are no intervening conflicting
stores.
There are a few constituent changes:
"memory" tracking that allows for multiple memory kinds, of which
"heap" is one.
The first three changes are no-diff.