Add compiler-generated code dataflow analysis by sbomer · Pull Request #2842 · dotnet/linker

sbomer · 2022-06-16T00:33:35Z

~~Still a work in progress, but feel free to take an early look!~~

edit: After some discussion I changed the approach. See old notes below for the earlier approach.

This treats fields of display classes and state machines as hoisted locals. We track all assignments to hoisted locals within a method group (the set of compiler-generated code reachable from a given user method). The analysis is technically flow-insensitive, because it assumes that any assignment to a hoisted local can reach any read of the same local. This will produce extra warnings in some cases, but it closes the holes in the original approach:

- State will "flow" out of nested functions. That is, writes (to hoisted locals) within nested functions will reach reads in the enclosing user method.
- Lambdas are analyzed at the point of delegate conversion, but with all possible states of captured variables. So effectively, writes after the lambda declaration will reach reads within the lambda.
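To make the two cases above concrete, here is a minimal C# sketch. The class and method names are hypothetical and not taken from the PR. Both methods rely on the C# compiler hoisting `captured` into a compiler-generated closure, so a write on one side of the nested-function boundary is observed on the other.

```csharp
using System;

static class HoistedLocalDemo
{
    // A write after the lambda is created still reaches the read inside the lambda,
    // because the method body and the lambda share one hoisted local.
    public static Type WriteAfterLambdaDeclaration()
    {
        Type captured = typeof(int);
        Func<Type> read = () => captured;
        captured = typeof(string);
        return read(); // observes the later write
    }

    // A write inside a local function reaches a read in the enclosing method.
    public static Type WriteInsideLocalFunction()
    {
        Type captured = typeof(int);
        void Overwrite() => captured = typeof(string);
        Overwrite();
        return captured;
    }
}
```

In both methods the returned value is `typeof(string)`, which is why an analysis that only considered the value at the point of declaration would miss it.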

Previously, hoisted locals were treated as unannotated, so they would produce dataflow warnings if they reached an annotated location. Now that we analyze hoisted locals, cases where the value satisfies requirements at the point of consumption won't warn. This means that accessing these fields (representing hoisted locals) via reflection is problematic, since it could mutate the values of these fields and invalidate the correctness analysis. For this reason we now warn on reflection access to compiler-generated fields.
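As an illustration of why reflection access is a problem, the sketch below mutates a captured local through its compiler-generated field. The names are hypothetical, and it assumes the closure instance is reachable through the delegate's `Target` and stores the captured `Type` in an instance field, which is a compiler implementation detail rather than a guarantee.

```csharp
using System;
using System.Reflection;

static class ClosureReflectionDemo
{
    public static Type MutateCapturedLocalViaReflection()
    {
        Type captured = typeof(int);
        Func<Type> read = () => captured;

        // Assumption: the delegate target is the compiler-generated closure instance.
        object closure = read.Target ?? throw new InvalidOperationException("expected a closure instance");

        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in closure.GetType().GetFields(flags))
        {
            // Overwrite every field of type Type; here that is the hoisted local.
            if (field.FieldType == typeof(Type))
                field.SetValue(closure, typeof(string));
        }

        // The lambda now reads a value that no assignment in the source ever wrote.
        return read();
    }
}
```

No statement in the method assigns `typeof(string)` to `captured`, so an analysis that tracks only source-level assignments cannot account for the value the lambda ends up reading.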

To prevent noise, we only warn for reflection access to compiler-generated fields that represent types which may be annotated - so Type, string, etc. - but not int. This is technically a hole because ints also participate in dataflow analysis but per discussion with @vitek-karas I think we agree that this is an ok tradeoff.
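A rough sketch of this kind of type-based filter is shown below. The helper is hypothetical and deliberately covers only the two types named above; the actual set of types the linker treats as potentially annotated is broader and is not reproduced here.

```csharp
using System;

static class CompilerGeneratedFieldHeuristic
{
    // Hypothetical predicate: true only for the field types named in the description.
    // int and other types fall through to false, which is the accepted hole.
    public static bool MayCarryAnnotation(Type fieldType) =>
        fieldType == typeof(Type) || fieldType == typeof(string);
}
```

With a filter like this, a hoisted `int` loop counter would not trigger a reflection-access warning, while a hoisted `Type` or `string` would.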

Old notes:

This flows state through hoisted locals in state machine methods, and into nested functions (lambdas/local functions). Some caveats to be aware of:

Lambdas are analyzed with the state of captured variables at the point of delegate conversion, not at the point of invocation (which would require tracking delegates as dataflow values). This is an analysis hole in some cases. It follows the model for nullable analysis, which has an analysis hole in this case for lambdas but not for local functions: https://sharplab.io/#v2:EYLgtghglgdgNAFxFANnAJiA1AHwAIBMAjALABQAxDAK4ooTAoCmABEzA8+XgMwuEsAwiwDe5FhP588AFhYBZABQBKUeMkbBAfhYBjFgF4WMJgHchKgNzqNEvEQIsUhlisMA+NWVs/+RAJyKugB0ACoA9gDKCABOsADmKsrW3r4Avim+LDa2siwAMuG6ECh4AKwIUOEwbgae9oEhEdFxMInKyTkaXZL6RjR0mb4oVj0ShcWlFVU1nakSaeSLZEA=
This doesn't track state flowing out of nested functions. Again, similar to nullable: https://sharplab.io/#v2:EYLgtghglgdgNAFxFANnAJiA1AHwAIBMAjALABQAxDAK4ooTAoCmABEzA8+XgMwuEsAwiwDe5FhP588AFhYBZABQBKUeMkbBAfhYBjFgF4WMJgHchKgNzqNEvEQIsUhlisMA+NWVs+9LmnTW3r4AvkG+LDa2siwAMgD2uhAoeACsCFDxMG4GnvpGAShBURooViWSCUkp6ZnZyuG+9gCciroAdAAq8QDKCABOsADmKg1RIeQTZEA=.

@vitek-karas @agocke @MichalStrehovsky I'm curious to hear your opinion on whether we should try to plug those holes, or if this approach is good enough.


src/linker/Linker/CompilerGeneratedState.cs

src/ILLink.Shared/DataFlow/DefaultValueDictionary.cs

src/ILLink.Shared/DataFlow/MaybeLattice.cs

src/linker/Linker.Dataflow/MethodBodyScanner.cs


- Add comments - Inline defaultValue - Treat all fields on compiler-generated types as hoisted locals

Also add more tests to cover these cases, and hoisted method parameters in state machine methods.

- Use suppressions in suppression test - Remove unused method

sbomer · 2022-06-28T17:56:49Z

@vitek-karas I think this is ready - PTAL! @jtschuster I would also appreciate your review since you have experience with the locals tracking logic.

jtschuster

LGTM, Thank you!

src/linker/Linker.Dataflow/InterproceduralState.cs

src/linker/Linker.Dataflow/MethodBodyScanner.cs

src/linker/Linker.Steps/MarkStep.cs

test/Mono.Linker.Tests.Cases/DataFlow/CompilerGeneratedCodeDataflow.cs

src/linker/Linker.Dataflow/MethodBodyScanner.cs

- Fix formatting - Remove invalid assert - Remove unused code - Inline unneeded local variables - Add another test showing imprecise warnings

vitek-karas · 2022-06-29T08:53:41Z

Regarding the potential holes with reflection access to closure fields:

I'm mostly worried about noise - we've seen cases where All is applied to a type; if this type uses state machines/closures anywhere, all of the fields on those subtypes will be marked with "Reflection access" and can generate warnings. This is often counterintuitive and hard to diagnose (where the warning came from and why), and is also tricky to fix - suppressions are problematic (especially since they would typically apply to entire types) and a real fix is basically impossible.
With that said - reducing noise from this should be a relatively high priority. Personally I would not mind if we leave some holes behind in favor of reducing noise - ideally with a plan how to "plug" them.
So the int hole - I'm perfectly OK with that.
We need to test this, we might need to ignore string as well (even though it can carry annotations) - again to reduce noise.
The reason I'm OKish leaving holes in this - It is VERY unlikely the reflection will be actually exercised at runtime and even less so to modify the field's value. And honestly, if somebody is doing private reflection on compiler generated code with modifications... I'm fine if that's the one case where we break them.
Going forward if this becomes a sore point we can do a lot better by:
- Add the ability to track access to values (fields, parameters) during data flow analysis - basically being able to determine which values were used to make marking/warning decisions. This is relatively easy, just needs a good C# model to make it easy to write and reliable.
- Once we have that we would use that to determine which of the hoisted fields are actually interesting for data flow - not just a heuristic based on their type. And then we would warn on reflection access only to those fields. Such warnings would greatly reduce the noise ratio (but still relatively high - as noted above I doubt we will see code which actually wants to reflect over the compiler generated stuff).

src/linker/Linker.Dataflow/HoistedLocalKey.cs

src/linker/Linker.Dataflow/InterproceduralState.cs

vitek-karas · 2022-06-29T09:02:25Z

src/linker/Linker.Dataflow/MethodBodyScanner.cs

-			HashSet<MethodDefinition> scannedMethods = new HashSet<MethodDefinition> ();
+			// Note that the default value of a hoisted local will be MultiValueLattice.Top, not UnknownValue.Instance.
+			// This ensures that there are no warnings for the "unassigned state" of a parameter.
+			// Definite assignment should ensure that there is no way for this to be an analysis hole.


This might hold for the analyzer, but doesn't hold for linker (there are other compilers producing IL which may not implement definite assignment).
I'm fine with the current behavior for now, but we should probably file an issue on this at least.

That said it's probably not a big problem:
The runtime behavior in such a case (unless we get something wrong) is that the field will have its default value (so typically null) - which typically means the analysis will ignore/skip it anyway.
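The role of the default value can be illustrated with a toy model of the hoisted-local state. This is a sketch, not the linker's implementation: the type and member names are hypothetical, strings stand in for dataflow values, Top is modelled as the empty set, and merging values is modelled as set union.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for the analysis state: each hoisted local maps to the set of
// values that may reach it from any assignment in the method group.
sealed class HoistedLocalState
{
    private readonly Dictionary<string, HashSet<string>> _values =
        new Dictionary<string, HashSet<string>>();

    // Flow-insensitive write: union the new value into everything already recorded.
    // Returns true if the state changed, so a caller can rescan until nothing changes.
    public bool Assign(string local, string value)
    {
        if (!_values.TryGetValue(local, out var set))
        {
            set = new HashSet<string>();
            _values[local] = set;
        }
        return set.Add(value);
    }

    // A local with no recorded write yields the empty set (the modelled Top),
    // so there is nothing to check and therefore nothing to warn about.
    public IReadOnlyCollection<string> Read(string local)
    {
        if (_values.TryGetValue(local, out var set))
            return set;
        return Array.Empty<string>();
    }
}
```

In this model, reading a never-assigned local produces no values at all, which matches the "no warnings for the unassigned state" behavior described in the comment. If the default were an unknown value instead, every such read would have to warn.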

Filed #2871

src/linker/Linker.Dataflow/MethodBodyScanner.cs

src/linker/Linker.Steps/MarkStep.cs

- Add comment on HoistedLocalKey - Track method bodies in interprocedural state - Make interprocedural state lattice a static property - Add type hierarchy warnings for compiler-generated fields - Optimize warning logic to avoid extra annotation checks

- Track state machine methods in interprocedural state, not in scan.

src/linker/Linker.Steps/MarkStep.cs

vitek-karas · 2022-06-29T19:46:36Z

src/linker/Linker.Steps/MarkStep.cs

+			if (member is MethodDefinition method && ShouldWarnForReflectionAccessToCompilerGeneratedCode (method, warnsOnRUC, warnsOnReflection)) {
+				var id = reportOnMember ? DiagnosticId.DynamicallyAccessedMembersOnTypeReferencesCompilerGeneratedMember : DiagnosticId.DynamicallyAccessedMembersOnTypeReferencesCompilerGeneratedMemberOnBase;
+				Context.LogWarning (origin, id, type.GetDisplayName (), method.GetDisplayName ());
+			}


This needs another comment which basically says that skipWarningsForOverride=true can only happen on MoveNext-like methods in compiler-generated code, and that it will never happen for local functions and the like. Meaning that it's never going to happen that we decided to not warn above and would also skip the warning here.
Reason: Technically the logic here is wrong. warnsOnReflection is true regardless of whether we produced a warning above or not (due to override auto-suppression), and that will suppress producing the compiler-generated warning here. But it will never actually matter. So I'm fine keeping the code as-is, but I think this needs a good explanation.

I added a comment and also tried to make this more clear by renaming the variables to isReflectionAccessCoveredByRUC/DAM. There's still a naming issue because the skipWarningsForOverride check isn't part of ShouldWarnWhenAccessedForReflection (so ShouldWarn says true, but then we still might not warn) - but I don't want to clean all of that up here.

Looks good - thanks!

src/linker/Linker.Steps/MarkStep.cs

vitek-karas

Can't wait for all of this to light up :-)

- Add more detailed comments about virtual override logic for type hierarchy marking and interactions with the compiler-generated code warnings - Clean up related code to make it slightly clearer

This treats fields of display classes and state machines as hoisted locals. We track all assignments to hoisted locals within a method group (the set of compiler-generated code reachable from a given user method). The analysis is technically flow-insensitive, because it assumes that any assignment to a hoisted local can reach any read of the same local. This will produce extra warnings in some cases, but it prevents holes:

- State will "flow" out of nested functions. That is, writes (to hoisted locals) within nested functions will reach reads in the enclosing user method
- Lambdas are analyzed at the point of delegate conversion, but with all possible states of captured variables. So effectively, writes after the lambda declaration will reach reads within the lambda.

Previously, hoisted locals were treated as unannotated, so they would produce dataflow warnings if they reached an annotated location. Now that we analyze hoisted locals, cases where the value satisfies requirements at the point of consumption won't warn. This means that accessing these fields (representing hoisted locals) via reflection is problematic, since it could mutate the values of these fields and invalidate the correctness analysis. For this reason we now warn on reflection access to compiler-generated fields.

To prevent noise, we only warn for reflection access to compiler-generated fields that represent types which may be annotated - so Type, string, etc. - but not int. This is technically a hole because ints also participate in dataflow analysis but we are choosing this tradeoff to avoid excess warnings for integers.

This also includes some cleanup of the type hierarchy logic and extra comments to make it more clear how this interacts with warnings for reflection access to compiler-generated code.

Commit migrated from dotnet/linker@bc46e44

sbomer added 8 commits June 15, 2022 15:44

Add hoisted local store

e5e27e6

Scan until convergence, add more tests

eb3549e

Lint

9b5c79c

Remove unused LocalKey

7f75b92

Fix some failures

be1d1ef

Fix nullref and tests for analyzer

a40119d

Merge remote-tracking branch 'origin/main' into compilerGeneratedData…

4108841

…flow

Cleanup

8f2812e

sbomer marked this pull request as ready for review June 20, 2022 17:36

sbomer requested a review from marek-safar as a code owner June 20, 2022 17:36

sbomer changed the title ~~[WIP] Add compiler-generated code dataflow analysis~~ Add compiler-generated code dataflow analysis Jun 20, 2022

sbomer added 2 commits June 20, 2022 10:46

Show analysis hole in write tests

cbf9059

Fix bad merge and analyzer testcase

af3689b

sbomer requested review from MichalStrehovsky, agocke, jtschuster and vitek-karas June 20, 2022 17:57

vitek-karas reviewed Jun 22, 2022


Merge remote-tracking branch 'origin/main' into compilerGeneratedData…

b9f0287

…flow

This was referenced Jun 23, 2022

Support tracking values written to parameters dotnet/runtime#117171

Open

Correctly perform analysis for compiler-generated methods #2001

Closed

sbomer added 5 commits June 27, 2022 09:57

PR feedback

e81c36d

- Add comments - Inline defaultValue - Treat all fields on compiler-generated types as hoisted locals

Track Meet of values for hoisted locals

bf65e96

Also add more tests to cover these cases, and hoisted method parameters in state machine methods.

Fix test for analyzer

0c546df

Cleanup

a3360df

- Use suppressions in suppression test - Remove unused method

Warn on reflection access to generated fields

18c981f

sbomer requested a review from vitek-karas June 28, 2022 17:55

jtschuster approved these changes Jun 28, 2022


jtschuster reviewed Jun 28, 2022


src/linker/Linker.Dataflow/MethodBodyScanner.cs

PR feedback

44015a3

- Fix formatting - Remove invalid assert - Remove unused code - Inline unneeded local variables - Add another test showing imprecise warnings

vitek-karas reviewed Jun 29, 2022


PR feedback

9e4dfe3

- Add comment on HoistedLocalKey - Track method bodies in interprocedural state - Make interprocedural state lattice a static property - Add type hierarchy warnings for compiler-generated fields - Optimize warning logic to avoid extra annotation checks

sbomer mentioned this pull request Jun 29, 2022

Compiler-generated code analysis relies on definite assignment #2871

Open

PR feedback

28e5c0f

- Track state machine methods in interprocedural state, not in scan.

sbomer requested a review from vitek-karas June 29, 2022 18:24

vitek-karas reviewed Jun 29, 2022


src/linker/Linker.Steps/MarkStep.cs

vitek-karas reviewed Jun 29, 2022


src/linker/Linker.Steps/MarkStep.cs

vitek-karas approved these changes Jun 29, 2022


PR feedback

0830321

- Add more detailed comments about virtual override logic for type hierarchy marking and interactions with the compiler-generated code warnings - Clean up related code to make it slightly clearer

sbomer merged commit bc46e44 into dotnet:main Jun 29, 2022

This was referenced Jul 13, 2022

Analyze nested functions in Roslyn analyzer #2892

Merged

Warning in code executed by lambda #1981

Closed
