Add IR support for modeling multiple SSA definitions as one store#76139
Conversation
Store them in the SSA descriptor instead of a map. Memory (x64): +0.18% PIN (x64): -0.04% The memory regression is not small...
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsNo code diffs, small (< The benefit is we get to model multi-reg assignments and get rid of the The complexity cost is high, but not "terribly" high. I am open to the opinion we shouldn't pursue this altogether, though. Closes #41242.
|
2d036ac to
27dd411
Compare
|
@dotnet/jit-contrib, @AndyAyersMS, @jakobbotsch |
For now, just the "CanBeReplacedWithItsField" case. This enables some nice simplifications, even as the general case gets more complex. Two quirks have been added to attain zero diffs.
Another 0.025%.
27dd411 to
7cd3df8
Compare
|
@AndyAyersMS please review this community PR. cc @dotnet/jit-contrib. |
I am not sure I understand this -- soon enough we will want to make sure our SSA graphs are complete. |
AndyAyersMS
left a comment
There was a problem hiding this comment.
Issues in the review notes can be handled as follow-ups.
Are we in position to relax some of this?
runtime/src/coreclr/jit/ssabuilder.cpp
Lines 1647 to 1661 in 6c949f5
| GenTreeOp* m_asg; | ||
| GenTreeOp* m_asg = nullptr; | ||
| // The SSA number associated with the previous definition for partial (GTF_USEASG) defs. | ||
| unsigned m_useDefSsaNum = SsaConfig::RESERVED_SSA_NUM; |
There was a problem hiding this comment.
Seems like this could also be handled as part of the "def" ssa num, via another variant of composite/outline encoding?
There was a problem hiding this comment.
Probably, but it'd be quite complex I think, and slower for the common case in terms of instruction counts at least.
Well, it's exactly that -- currently, it is not a problem for us to not know all uses of SSA variables. With #77055, that would of course change, which is somewhat unfortunate, because morph doesn't let us know whether dependent promotion became so because of a non-decomposed use (most often a local field), or because of a non-decomposed def, so the choice would become to either enhance morph to differentiate between the two, or do the accounting work necessary to support uses. |
To elaborate, this would involve renaming composite uses, and supporting cloning composite references (one can notice the current code doesn't handle this because it doesn't need to - SSA definitions cannot be cloned). |
Yes, removing that dependent promotion check is the primary motivation for the change. As-is, we could remove the check after this change is merged, with some light amount of related bug fixes. With the support for use accounting, it would require more expansive changes / thought. |
Probably it would be enough in the short run to know that for this local/def there were no uses that got left out of SSA, and the checker could be likewise "enhanced" to not flag errors for such locals. I expect most of the benefit to accrue from simple cases where there is just a block-local lifetime. Down the road I am interested in trying to rename some the SSA webs; I wonder if this would be a relatively cheap way to make LSRA's life easier. |
|
Let's take this as is and sort through the rest over time. |
|
Thanks again -- this is a very nice enhancement. |
sandreenko
left a comment
There was a problem hiding this comment.
it is great to see such improvement!
| static const int BITS_PER_SIMPLE_NUM = 8; | ||
| static const int MAX_SIMPLE_NUM = (1 << (BITS_PER_SIMPLE_NUM - 1)) - 1; | ||
| static const int SIMPLE_NUM_MASK = MAX_SIMPLE_NUM; | ||
| static const int SIMPLE_NUM_COUNT = sizeof(int) / BITS_PER_SIMPLE_NUM; |
There was a problem hiding this comment.
do I miss something or is it always 0? What does it mean?
There was a problem hiding this comment.
Oops, that is definitely a mistake, it should be (sizeof(int) * 8) / BITS_PER_SIMPLE_NUM, i. e. the maximum number of fields that can be encoded compactly.
Will be fixed in #77238.
| // [byte 1]: [compact encoding bit][ssa num 1] (7 bits) | ||
| // [byte 0]: [ssa num 0] (8 bits) | ||
| // We expect this encoding to cover the 99%+ case of composite names: locals with more | ||
| // than 127 defs, maximum for this encoding, are rare, and the current limit on the count |
There was a problem hiding this comment.
is 127 a limit for number of fields for the local or for number of all locals in the method? I expect that if we have 200 locals we would have some SsaNum > 200 and the compact encoding won't work or am I wrong?
There was a problem hiding this comment.
is 127 a limit for number of fields for the local or for number of all locals in the method?
It is the number of SSA definitions, per a field local. So, to not fit into this limit would require code like below:
s.FieldOne = 1;
...
s.FieldOne = 2;
...
// 125 more definitions
...
s.FieldOne = 128;(Or complex flow, because of PHIs)
| // This can be in one of four states: | ||
| // 1. Single SSA name: > RESERVED_SSA_NUM (0). | ||
| // 2. RESERVED_SSA_NUM (0) | ||
| // 3. "Inline composite name": packed SSA numbers of field locals (each could be RESERVED): |
There was a problem hiding this comment.
It would look a bit awkward because both 3 and 4 are < 0, so I left it out.
| // 2. RESERVED_SSA_NUM (0) | ||
| // 3. "Inline composite name": packed SSA numbers of field locals (each could be RESERVED): | ||
| // [byte 3]: [top bit][ssa num 3] (7 bits) | ||
| // [byte 2]: [ssa num 2] (8 bits) |
There was a problem hiding this comment.
do we have 2 spare bits in byte2 and byte0?
There was a problem hiding this comment.
Yep. Since 127 definitions is already a lot, I decided being more precise was not worth it.
Up until this change, it has been the case that an SSA ref had to have 1-to-1 relationship with the node it was attached to (with the exception of
CanBeReplacedWithItsFieldstructs). This was a problem for modeling assignments to promoted structs that are not decomposable, i. e. made by calls (multi-reg or not). The solution has been to exclude the promoted fields from SSA and VN, which is a CQ problem.This change solves this by allowing for encoding the SSA number that would represent multiple field definitions at once. The code comment describing the format is quite nice, I will copy it here:
The rest of changes are fairly mechanical: adding support for "composite" definitions where necessary: SSA, VN, copy propagation.
While the encoding does support creating not just composite defs, but composite uses, I do not expect us to take advantage of this, due to the complexity tradeoff not being there. Fortunately, unlike with defs, "missing" a use is not a correctness problem.
No code diffs, small (<
0.1%) TP regression, with a bit more substantial memory regression (~0.2%).What's up with the TP improvements on x86?
Apparently,
DefinesLocalwas particularly expensive there:The benefit is we get to model multi-reg assignments and get rid of the
CanBeReplacedWithItsFieldspecial case. Expected improvements are in the-9Krange for the benchmarks collection on ARM64, which seems worth the TP cost.The complexity cost is high, but not "terribly" high. I am open to the opinion we shouldn't pursue this altogether, though.
Closes #41242.