Skip to content

Add compiler-generated code dataflow analysis#2842

Merged
sbomer merged 20 commits intodotnet:mainfrom
sbomer:compilerGeneratedDataflow
Jun 29, 2022
Merged

Add compiler-generated code dataflow analysis#2842
sbomer merged 20 commits intodotnet:mainfrom
sbomer:compilerGeneratedDataflow

Conversation

@sbomer
Copy link
Copy Markdown
Member

@sbomer sbomer commented Jun 16, 2022

Still a work in progress, but feel free to take an early look!

edit: After some discussion I changed the approach. See old notes below for the earlier approach.

This treats fields of display classes and state machines as hoisted locals. We track all assignments to hoisted locals within a method group (the set of compiler-generated code reachable from a given user method). The analysis is technically flow-insensitive, because it assumes that any assignment to a hoisted local can reach any read of the same local. This will produce extra warnings in some cases, but it closes the holes in the original approach:

  • State will "flow" out of nested functions. That is, writes (to hoisted locals) within nested functions will reach reads in the enclosing user method
  • Lambdas are analyzed at the point of delegate conversion, but with all possible states of captured variables. So effectively, writes after the lambda declaration will reach reads within the lambda.

Previously, hoisted locals were treated as unannotated, so they would produce dataflow warnings if they reached an annotated location. Now that we analyze hoisted locals, cases where the value satisfies requirements at the point of consumption won't warn. This means that accessing these fields (representing hoisted locals) via reflection is problematic, since it could mutate the values of these fields and invalidate the correctness analysis. For this reason we now warn on reflection access to compiler-generated fields.

To prevent noise, we only warn for reflection access to compiler-generated fields that represent types which may be annotated - so Type, string, etc. - but not int. This is technically a hole because ints also participate in dataflow analysis but per discussion with @vitek-karas I think we agree that this is an ok tradeoff.


Old notes:

This flows state through hoisted locals in state machine methods, and into nested functions (lambdas/local functions). Some caveats to be aware of:

@vitek-karas @agocke @MichalStrehovsky I'm curious to hear your opinion on whether we should try to plug those holes, or if this approach is good enough.

@sbomer sbomer marked this pull request as ready for review June 20, 2022 17:36
@sbomer sbomer requested a review from marek-safar as a code owner June 20, 2022 17:36
@sbomer sbomer changed the title [WIP] Add compiler-generated code dataflow analysis Add compiler-generated code dataflow analysis Jun 20, 2022
sbomer added 5 commits June 27, 2022 09:57
- Add comments
- Inline defaultValue
- Treat all fields on compiler-generated types as hoisted locals
Also add more tests to cover these cases, and hoisted method parameters
in state machine methods.
- Use suppressions in suppression test
- Remove unused method
@sbomer sbomer requested a review from vitek-karas June 28, 2022 17:55
@sbomer
Copy link
Copy Markdown
Member Author

sbomer commented Jun 28, 2022

@vitek-karas I think this is ready - PTAL! @jtschuster I would also appreciate your review since you have experience with the locals tracking logic.

Copy link
Copy Markdown
Member

@jtschuster jtschuster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thank you!

- Fix formatting
- Remove invalid assert
- Remove unused code
- Inline unneeded local variables
- Add another test showing imprecise warnings
@vitek-karas
Copy link
Copy Markdown
Member

Regarding the potential holes with reflection access to closure fields:

  • I'm mostly worried about noise - we've seen cases where All is applied to a type, if this type uses state machines/closures anywhere all of the fields on those subtypes will be marked with "Reflection access" and can generate warnings. This is often counter intuitive and hard to diagnose where the warning came from (and why), and also is tricky to fix - suppressions are problematic (especially since they would typically apply to entire types) and real fix is basically impossible.
  • With that said - reducing noise from this should be a relatively high priority. Personally I would not mind if we leave some holes behind in favor of reducing noise - ideally with a plan how to "plug" them.
  • So the int hole - I'm perfectly OK with that.
  • We need to test this, we might need to ignore string as well (even though it can carry annotations) - again to reduce noise.
  • The reason I'm OKish leaving holes in this - It is VERY unlikely the reflection will be actually exercised at runtime and even less so to modify the field's value. And honestly, if somebody is doing private reflection on compiler generated code with modifications... I'm fine if that's the one case where we break them.
  • Going forward if this becomes a sore point we can do a lot better by:
    • Add the ability to track access to values (fields, parameters) during data flow analysis - basically being able to determine which values where used to make marking/warning decisions. This is relatively easy, just needs a good C# model to make it easy to write and reliable.
    • Once we have that we would use that to determine which of the hoisted fields are actually interesting for data flow - not just a heuristic based on their type. And then we would warn on reflection access only to those fields. Such warnings would greatly reduce the noise ratio (but still relatively high - as noted above I doubt we will see code which actually wants to reflect over the compiler generated stuff).

HashSet<MethodDefinition> scannedMethods = new HashSet<MethodDefinition> ();
// Note that the default value of a hoisted local will be MultiValueLattice.Top, not UnknownValue.Instance.
// This ensures that there are no warnings for the "unassigned state" of a parameter.
// Definite assignment should ensure that there is no way for this to be an analysis hole.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might hold for the analyzer, but doesn't hold for linker (there are other compilers producing IL which may not implement definite assignment).
I'm fine with the current behavior for now, but we should probably file an issue on this at least.

That said it's probably not a big problem:
Runtime behavior of such case (unless we get something wrong) is that the field will have its default value (so typically null) - which typically means the analysis will ignore/skip it anyway.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #2871

- Add comment on HoistedLocalKey
- Track method bodies in interprocedural state
- Make interprocedural state lattice a static proprety
- Add type hierarchy warnings for compiler-generated fields
- Optimize warning logic to avoid extra annotation checks
- Track state machine methods in interprocedural state,
  not in scan.
@sbomer sbomer requested a review from vitek-karas June 29, 2022 18:24
Comment on lines +1654 to +1657
if (member is MethodDefinition method && ShouldWarnForReflectionAccessToCompilerGeneratedCode (method, warnsOnRUC, warnsOnReflection)) {
var id = reportOnMember ? DiagnosticId.DynamicallyAccessedMembersOnTypeReferencesCompilerGeneratedMember : DiagnosticId.DynamicallyAccessedMembersOnTypeReferencesCompilerGeneratedMemberOnBase;
Context.LogWarning (origin, id, type.GetDisplayName (), method.GetDisplayName ());
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs another comment which basically says that the skipWarningsForOverride=true can only happen on MoveNext like methods in compiler generated code, that it will never happen for local functions and alike. Meaning that it's never going to happen that we decided to not warn above and would also skip warning here.
Reason: Technically the logic here is wrong, the warnsOnReflection is true regardless if we produced warning above or not (due to override auto-suppression) and that will suppress producing the compiler-generated warning here. But it will never actually matter. So I'm fine keeping the code as-is, but I think this needs good explanation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment and also tried to make this more clear by renaming the variables to isReflectionAccessCoveredByRUC/DAM. There's still a naming issue because the skipWarningsForOverride check isn't part of ShouldWarnWhenAccessedForReflection (so ShouldWarn says true, but then we still might not warn) - but I don't want to clean all of that up here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - thanks!

Copy link
Copy Markdown
Member

@vitek-karas vitek-karas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't wait for all of this light up :-)

- Add more detaled comments about virtual override logic
  for type hierarchy marking and interactions with the
  compiler-generated code warnings
- Clean up related code to make it slightly clearer
@sbomer sbomer merged commit bc46e44 into dotnet:main Jun 29, 2022
agocke pushed a commit to dotnet/runtime that referenced this pull request Nov 16, 2022
This treats fields of display classes and state machines as hoisted
locals. We track all assignments to hoisted locals within a method
group (the set of compiler-generated code reachable from a given user
method). The analysis is technically flow-insensitive, because it
assumes that any assignment to a hoisted local can reach any read of
the same local. This will produce extra warnings in some cases, but it
prevents holes:

- State will "flow" out of nested functions. That is, writes (to
  hoisted locals) within nested functions will reach reads in the
  enclosing user method
- Lambdas are analyzed at the point of delegate conversion, but with
  all possible states of captured variables. So effectively, writes
  after the lambda declaration will reach reads within the lambda.

Previously, hoisted locals were treated as unannotated, so they would
produce dataflow warnings if they reached an annotated location. Now
that we analyze hoisted locals, cases where the value satisfies
requirements at the point of consumption won't warn. This means that
accessing these fields (representing hoisted locals) via reflection is
problematic, since it could mutate the values of these fields and
invalidate the correctness analysis. For this reason we now warn on
reflection access to compiler-generated fields.

To prevent noise, we only warn for reflection access to
compiler-generated fields that represent types which may be
annotated - so Type, string, etc. - but not int. This is technically a
hole because ints also participate in dataflow analysis but we are
choosing this tradeoff to avoid excess warnings for integers.

This also includes some cleanup of the type hierarchy logic and
extra comments to make it more clear how this interacts with
warnings for reflection access to compiler-generated code.

Commit migrated from dotnet/linker@bc46e44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants