Stack frame size optimisations by MykolaBalakin · Pull Request #44854 · dotnet/roslyn

MykolaBalakin · 2020-06-04T16:05:50Z

This PR contains a few optimisations guided by the EndToEndTests.DeeplyNestedGeneric test.
Below are the results of the test runs (maximum nesting depth that passes, before and after the change):

OS	Framework	Configuration	Max depth before	Max depth after
macOS	netcoreapp3.1	Debug	180	200
macOS	netcoreapp3.1	Release	440	530
macOS	net472	Debug	870	960
macOS	net472	Release	1230	1390
Windows x64	netcoreapp3.1	Debug	500	560
Windows x64	netcoreapp3.1	Release	1200	1440
Windows x86	net472	Debug	480	520
Windows x86	net472	Release	1480	1560

MykolaBalakin

Should I increase the limits defined or add macOS Release configuration limit in the EndToEndTests.DeeplyNestedGeneric test?

jaredpar · 2020-06-04T16:43:32Z

> Should I increase the limits defined or add macOS Release configuration limit in the EndToEndTests.DeeplyNestedGeneric test?

Absolutely. Want to lock in this win.

jaredpar · 2020-06-04T16:44:34Z

src/Compilers/CSharp/Portable/Binder/Binder_Symbols.cs

                return TypeWithAnnotations.Create(new PointerTypeSymbol(elementType));
            }
+
+            NamespaceOrTypeOrAliasSymbolWithAnnotations createErrorType()


Nice. This is a simple and effective technique I'm guessing we could replicate elsewhere.

Curious what tools, if any, you used to measure the stack frame sizes here? Or did you just experiment and run the tests?

Just lldb, sos and clru. Haven't found anything better yet.

All the other case statements of the switch above may lead to recursive code execution, so similar changes there could make other cases worse (even while improving the current one).
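A minimal sketch of the local-function technique under stated assumptions: `Bind` and `CreateErrorResult` are hypothetical stand-ins, not Roslyn's `BindQualifiedName` or `createErrorType`. Only the rare, non-recursive error branch is moved out, in line with the comment above about recursive branches; its temporaries then live in the local function's frame instead of in every recursive `Bind` frame. Whether the frame actually shrinks depends on the JIT, and the `NoInlining` attribute is an added assumption meant to keep the JIT from folding the local function back in (attributes on local functions require C# 9 or later).

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical recursive binder: each nesting level adds one Bind frame, so the
// size of that frame bounds how deep an input can be before the stack runs out.
static string Bind(string kind, int depth)
{
    switch (kind)
    {
        case "Nested" when depth > 0:
            // Recursive path: keep the locals used here to a minimum.
            return "Nested(" + Bind(kind, depth - 1) + ")";
        case "Nested":
            return "Leaf";
        default:
            // Rare, non-recursive path: its temporaries are allocated in
            // CreateErrorResult's frame rather than in every Bind frame.
            return CreateErrorResult(kind, depth);
    }

    // 'static' so nothing is captured into Bind's frame; NoInlining (an assumption)
    // stops the JIT from folding these locals back into Bind.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static string CreateErrorResult(string kind, int depth)
    {
        var message = $"Unexpected kind '{kind}'";
        var details = string.Join(", ", kind, depth.ToString());
        return "Error(" + message + "; " + details + ")";
    }
}

Console.WriteLine(Bind("Nested", 2)); // prints "Nested(Nested(Leaf))"
```

The design choice mirrors the thread: the error branch does not recurse, so moving it out cannot push extra frames onto the deep path, while the recursive branches stay in place.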

> Just lldb, sos and clru. Haven't found anything better yet.

That's basically the same approach we'd been taking (windbg + sos).

gafter

Nice work. Looks good, but please do update the expected depth for the test.

CyrusNajmabadi · 2020-06-04T18:26:02Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

+                            constraintsList,
                            openBrace,
-                            members,
+                            membersList,


@jaredpar it's a super pity this sort of change needs to be manual. i suppose it's unsafe for the compiler to figure this out due to potential side-effects?

should we get rid of those implicit conversions between the builders and the built items, so that we have to think about this and can potentially get improvements in other areas of the parser?

CyrusNajmabadi · 2020-06-04T18:26:44Z

src/Compilers/CSharp/Portable/Parser/LanguageParser.cs

                var semicolon = TryEatToken(SyntaxKind.SemicolonToken);
+                var modifiersList = (SyntaxList<SyntaxToken>)modifiers.ToList();
+                var membersList = (SyntaxList<MemberDeclarationSyntax>)members;
+                var constraintsList = (SyntaxList<TypeParameterConstraintClauseSyntax>)constraints;


can we explicitly doc this? this is something that could easily be undone later by someone well-meaning. e.g. we might get a stack improvement elsewhere in the future, then someone goes "i'll just inline these" and undoes this one, but nothing catches it.

I'd say that is a potential improvement for the JIT compiler: it could reuse stack frame space for different variables by analysing their lifetimes.

How is this enabling us to save stack space here? This is essentially converting a sequence of implicit conversions, which would cause some push/call sequences, into ldloc calls. The final stack size should be identical, particularly because these aren't the final arguments to the method they are passed as parameters to.

Is this a case where the JIT is over-allocating stack space for the method and this just helps its analysis? The method is fairly large and does have a lot of locals. Possibly it's past the size where they do their most in-depth optimizations.

That or I'm missing something really basic here :)

Each case branch contains the same set of cast operator invocations, and the JIT stores each branch's cast results at different addresses in the stack frame. Performing all the casts once, before the switch, eliminates these "duplicate" slots.

    ; case SyntaxKind.ClassKeyword:
    ; ...
    mov rdi, qword ptr [rbp - 0xa8]
    call 0x1127621a0 (SyntaxListBuilder.ToListNode())
    ; ...
    mov qword ptr [rbp - 0xc0], rax
    ; case SyntaxKind.StructKeyword:
    ; ...
    mov rdi, qword ptr [rbp - 0xa8]
    call 0x1127621a0 (SyntaxListBuilder.ToListNode())
    ; ...
    mov qword ptr [rbp - 0xb8], rax
    ; case SyntaxKind.InterfaceKeyword:
    ; ...
    mov rdi, qword ptr [rbp - 0xa8]
    call 0x1127621a0 (SyntaxListBuilder.ToListNode())
    ; ...
    mov qword ptr [rbp - 0xb0], rax
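In the disassembly above, the same `ToListNode()` result is stored to three different slots (`rbp - 0xc0`, `rbp - 0xb8`, `rbp - 0xb0`), one per case. A minimal sketch of the hoisting pattern, using `StringBuilder.ToString()` as a stand-in for the implicit `SyntaxListBuilder` to `SyntaxList<T>` conversions; `DeclareBefore` and `DeclareAfter` are hypothetical names. Both versions return the same results. Whether the "before" shape really gets one slot per case depends on the JIT, and per this thread it does not happen for every switch. Hoisting is only safe when the conversion has no side effects that matter on paths that previously skipped it, which echoes the side-effect concern raised earlier.

```csharp
using System;
using System.Text;

// Before: every case repeats the same conversion, which may give each case
// its own stack slot for the converted value.
static string DeclareBefore(string keyword, StringBuilder members)
{
    switch (keyword)
    {
        case "class":
            return "class { " + members.ToString() + " }";
        case "struct":
            return "struct { " + members.ToString() + " }";
        case "interface":
            return "interface { " + members.ToString() + " }";
        default:
            throw new ArgumentException($"Unexpected keyword '{keyword}'", nameof(keyword));
    }
}

// After: convert once, before the switch, so all cases share one local.
static string DeclareAfter(string keyword, StringBuilder members)
{
    // NOTE: hoisted out of the cases on purpose to reduce stack frame size;
    // do not inline back into the individual cases.
    var membersText = members.ToString();

    switch (keyword)
    {
        case "class":
            return "class { " + membersText + " }";
        case "struct":
            return "struct { " + membersText + " }";
        case "interface":
            return "interface { " + membersText + " }";
        default:
            throw new ArgumentException($"Unexpected keyword '{keyword}'", nameof(keyword));
    }
}

var builder = new StringBuilder("int X;");
Console.WriteLine(DeclareAfter("struct", builder)); // prints "struct { int X; }"
```

The explanatory comment on the hoisted local is one way to address the request to "explicitly doc this", so the variable is not inlined back later.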

very interesting!

Very interesting indeed.

This doesn't seem to happen with every call in a switch/case, but I was able to reproduce it in some smaller samples. That makes it hard to judge at code review time, but we now know to look for this when we're investigating stack frame bugs or making changes in this area. So we can definitely improve our process going forward.

ghost

Auto-approval

MykolaBalakin · 2020-06-08T13:48:13Z

src/Compilers/CSharp/Test/Emit/Emit/EndToEndTests.cs

-                (ExecutionArchitecture.x86, ExecutionConfiguration.Release) => 1290,
-                (ExecutionArchitecture.x64, ExecutionConfiguration.Debug) => 170,
-                (ExecutionArchitecture.x64, ExecutionConfiguration.Release) => 730,
+                (ExecutionArchitecture.x86, ExecutionConfiguration.Debug) => 520,


Those are the maximum working values, measured on a VS 2019 VM image in Azure. I'm wondering whether it's worth reserving some margin so that a different runtime or some other random variation doesn't make the test fail.

Also, I'm having some trouble measuring the Linux limits: the test still passes with a limit over 3000, and it takes about 40 minutes to run.
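A minimal sketch of the margin idea, assuming the limits are selected by a switch over the (architecture, configuration) pair as in the diff above. Strings replace the test's `ExecutionArchitecture`/`ExecutionConfiguration` enums, the depth values are copied from the "After" column of the Windows rows in the description table purely for illustration, and the 10% margin is a hypothetical choice, not something the PR settled on.

```csharp
using System;

// Measured maximum depths, copied from the "After" column of the Windows rows in
// the description table (x86 rows ran on net472, x64 rows on netcoreapp3.1).
// Illustrative only: these are not the baselines committed to EndToEndTests.cs.
static int MeasuredMaxDepth(string architecture, string configuration) =>
    (architecture, configuration) switch
    {
        ("x86", "Debug") => 520,
        ("x86", "Release") => 1560,
        ("x64", "Debug") => 560,
        ("x64", "Release") => 1440,
        _ => throw new ArgumentException($"No baseline for {architecture}/{configuration}"),
    };

// Hypothetical safety margin: test at a depth somewhat below the measured maximum
// so that a different runtime or machine does not make the test flaky.
// Integer arithmetic avoids floating-point rounding surprises.
static int LimitWithMargin(int measuredMax, int marginPercent) =>
    measuredMax * (100 - marginPercent) / 100;

Console.WriteLine(LimitWithMargin(MeasuredMaxDepth("x64", "Release"), 10)); // prints 1296
```

The trade-off is that a wider margin makes the test less sensitive to small stack regressions, which is the opposite of what "lock in this win" asks for.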

MykolaBalakin · 2020-06-09T19:06:50Z

src/Compilers/CSharp/Test/Emit/Emit/EndToEndTests.cs

-                (ExecutionArchitecture.x86, ExecutionConfiguration.Release) => 1640,
-                (ExecutionArchitecture.x64, ExecutionConfiguration.Debug) => 290,
-                (ExecutionArchitecture.x64, ExecutionConfiguration.Release) => 810,
+                (ExecutionArchitecture.x86, ExecutionConfiguration.Debug) => 460,


What is the reference environment for running this test?

Running xunit.console.exe or xunit.console.x86.exe on my home PC.

Well, that is how I determined new baselines. Perhaps we could do it in a more "canonical" environment, e.g. by lowering all the baselines, uncommenting the bits that find the new baseline, then running it on a CI machine.

The numbers I set came from a test run on an Azure VM (VS 2019 Latest image) using ./eng/cibuild.ps1. I thought that would be a "canonical" enough way to measure the baselines. 😄
Also, interestingly, running the commented-out loop yields a different number than adjusting the limit by hand and running the test with the loop commented out. I may check why later.

Yeah, and the question about the Linux environment is still open. Do we want to adjust the number for it? The real issue there is the test duration, which reaches 40 minutes.

IMO, there is no need to test the actual limit on Linux. Keeping at around the desktop Release/x86 level or a bit higher is fine.

BTW, the numbers mentioned in the PR description were measured by running dotnet test.

I also noticed that the thresholds in the test differ based on simply whether you use the loop or not. It is a bit of a pain.

The most likely reason your thresholds differed from mine is that some features have gone into master since this PR was opened, most notably records.

Interestingly enough, they differ in opposite directions: on macOS the working threshold was lower than the one measured using the loop, and on Windows it's the other way around.

jaredpar · 2020-06-09T23:27:08Z

Thanks for this contribution! This is really exciting for us to see.

MykolaBalakin added 5 commits June 3, 2020 14:21

Introduce local function to reduce stack usage

047af13

Moving the code to a local function reduces stack frame size of BindQualifiedName by 80 bytes

Pass readonly structure by ref to reduce caller stack frame size

ba0920f

This change reduces the BindNameSpaceOrTypeSymbol method's stack frame size by 32 bytes
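A minimal sketch of the idea behind this commit, assuming a large value type is passed to a callee. `SumByValue` and `SumByIn` are hypothetical, and a named `ValueTuple` stands in for the readonly struct the commit refers to. Passing with `in` hands the callee a reference to the caller's existing value instead of a copy that the caller would typically have to materialize in its own frame; the exact savings depend on the ABI and the JIT.

```csharp
using System;

// A 32-byte value type. ValueTuple stands in for the PR's readonly struct (assumption).
// By value: the caller typically has to set up a full copy of the argument,
// which takes space in the caller's stack frame.
static long SumByValue((long A, long B, long C, long D) v) => v.A + v.B + v.C + v.D;

// By 'in': only a reference to the caller's existing value is passed, so no copy is needed.
// Reading fields is fine; calling members of a non-readonly struct through an 'in'
// parameter can force a defensive copy, which is why a readonly struct pairs well with 'in'.
static long SumByIn(in (long A, long B, long C, long D) v) => v.A + v.B + v.C + v.D;

(long A, long B, long C, long D) value = (1, 2, 3, 4);
Console.WriteLine(SumByValue(value)); // prints 10
Console.WriteLine(SumByIn(in value)); // prints 10
```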

Introduce local function to reduce stack usage

6f26a99

Reduces the stack frame size by 64 bytes

Move a few variables out of case statements to reduce stack frame size

1ccf547

Reduces the stack frame size by 64 bytes

Introduce ReportDiagnosticsIfObsolete overload to extract the cast from the caller

146ac32

Reduces BindQualifiedName method's stack frame size by 32 bytes

MykolaBalakin requested a review from a team as a code owner June 4, 2020 16:05

MykolaBalakin commented Jun 4, 2020

jaredpar reviewed Jun 4, 2020

jaredpar added the Area-Compilers and Community labels Jun 4, 2020 (Community: the pull request was submitted by a contributor who is not a Microsoft employee)

gafter reviewed Jun 4, 2020

CyrusNajmabadi reviewed Jun 4, 2020

CyrusNajmabadi approved these changes Jun 4, 2020

Increase recursion test limits

1e0e919

jaredpar approved these changes Jun 8, 2020

cston approved these changes Jun 8, 2020

MykolaBalakin added 2 commits June 8, 2020 20:45

Merge branch 'master' into balakin/stack-usage-optimization

ab0f335

Fix merge commit mistake

7d79ac7

jaredpar added the auto-merge label Jun 8, 2020

ghost approved these changes Jun 8, 2020

agocke mentioned this pull request Jun 9, 2020

Fix stack increase required for parsing records #44863

Closed

RikkiGibson added 2 commits June 9, 2020 11:25

Merge branch 'master' of github.com:dotnet/roslyn into balakin/stack-usage-optimization

85cdc5c

Adjust Desktop baselines

06fee5d

MykolaBalakin commented Jun 9, 2020

RikkiGibson merged commit 71b5c58 into dotnet:master Jun 9, 2020

ghost added this to the Next milestone Jun 9, 2020

RikkiGibson added a commit to RikkiGibson/roslyn that referenced this pull request Jun 9, 2020

Apply dotnet#44854 to preview3 branch

7e4665d

RikkiGibson mentioned this pull request Jun 9, 2020

Apply #44854 to preview3 branch #45011

Merged

RikkiGibson added a commit that referenced this pull request Jun 10, 2020

Merge pull request #45011 from RikkiGibson/fix-stack-p3

7e3dd03

Apply #44854 to preview3 branch

dibarbet modified the milestones: Next, 16.7.P4 Jun 30, 2020

7 participants
