Switch pattern matching to use TemporaryArray#51035
Switch pattern matching to use TemporaryArray#51035CyrusNajmabadi merged 23 commits intodotnet:mainfrom
Conversation
| { | ||
| private readonly bool _includeMatchedSpans; | ||
| private readonly string _candidate; | ||
| private readonly ArrayBuilder<TextSpan> _candidateHumps; |
There was a problem hiding this comment.
this was annoying. can't point to a struct by ref as a field. So this had to be a parameter passed to all methods instead.
There was a problem hiding this comment.
I'm fine with the current approach for this PR. The available alternative is placing the non-copyable value in heap storage instead of stack storage, e.g. StrongBox<TemporaryArray<TextSpan>> or Tuple<>/class, depending on the number of values that need to be stored together. The two main downsides of this approach is it forces an additional allocation (perhaps not bad if the value can be used many times) and it forces all callers to change signatures.
sharwell
left a comment
There was a problem hiding this comment.
Can you do a pass to switch from ref to in for the new parameters in cases where they are not modified?
Yup. Brainfart on my part. Thanks! |
|
@sharwell ready for another view. |
| /// character spans, one for AA and one for BB. | ||
| /// </summary> | ||
| public readonly ArrayBuilder<TextSpan> PatternHumps; | ||
| public TemporaryArray<TextSpan> PatternHumps; |
There was a problem hiding this comment.
❗ This is a hole in the non-copyable analysis. Can't have a non-copyable field of a copyable structure. Need to either mark TextChunk as non-copyable, or remove this field.
There was a problem hiding this comment.
note: is this actually bad here? i get the potential hole in teh analysis, but in this case the usage is all correct. this data is effectively readonly (i would actually make it readonly if i could, but the analyzer gets annoyed in other ways). So as long as the text chunk is kept around, it's safe to use. THen, we end up disposing the TextChunk when no longer needed, which frees the PatternHumps.
There was a problem hiding this comment.
note: is this actually bad here?
Yes. Non-copyable analysis is intended to only allow false positives (might be copied but wasn't) as opposed to approximation features like NRT that also allow false negatives. [NonCopyable] provides the reader with an absolute correctness guarantee that can be relied on without any additional knowledge of code outside the current context.
There was a problem hiding this comment.
that's fair. do you have a sense of how the design shoudl work here? For pattern matching we have a search pattern (that we want to break into a small, finite amount of pre-processed info that we likely can keep on the stack). We then compare against candidates that we similarly break up.
Ideally this coudl almost all be on the stack. The only exceptions being when breaking up that info literally does require more than teh fixed amount of the TempArray and has to spill to the heap.
One interesting thing though is that there are a few arrays-of-arrays going on. FOr example a pattern-segment has an array of text-chunks. And teach text-chunk has an array of camel-humps.
DO you think there's a way to rejigger this to work with your stack-affinitized approach currently? If not, do you see the work coming in the next C# (for example, ref-fields) as helping out here?
Thanks!
There was a problem hiding this comment.
Ideally this coudl almost all be on the stack.
The storage location is generally irrelevant. We wouldn't, for example, want to use ref struct for any of this. [NonCopyable] is the intended abstraction (data can exist anywhere, as long as it's only in one location at any given time).
I see at two paths forward that would offer good support for our goals in this implementation.
- I would like to expand
[NonCopyable]to support the concept of "allows copies to read-only locations". This would work well for types that are initially mutable, but later transition to an immutable state. - We could break up the mutable non-copyable types to have an associated type that represents a read-only final state. The latter would not hold resources and would be copyable.
|
@sharwell is the current approach viable? |
|
@sharwell what are next steps here? |
|
@sharwell small reminder on this. thanks! |
sharwell
left a comment
There was a problem hiding this comment.
Approved. Two comments are suggestions for changes either prior to merge or as an immediate follow-up:
ref structback tostruct- See if
SimplePatternMatcher._fullPatternSegmentcan return toreadonly
src/Workspaces/Core/Portable/PatternMatching/AllLowerCamelCaseMatcher.cs
Outdated
Show resolved
Hide resolved
src/Workspaces/Core/Portable/PatternMatching/SimplePatternMatcher.cs
Outdated
Show resolved
Hide resolved
src/Workspaces/Core/Portable/PatternMatching/AllLowerCamelCaseMatcher.cs
Show resolved
Hide resolved
| { | ||
| private readonly bool _includeMatchedSpans; | ||
| private readonly string _candidate; | ||
| private readonly ArrayBuilder<TextSpan> _candidateHumps; |
There was a problem hiding this comment.
I'm fine with the current approach for this PR. The available alternative is placing the non-copyable value in heap storage instead of stack storage, e.g. StrongBox<TemporaryArray<TextSpan>> or Tuple<>/class, depending on the number of values that need to be stored together. The two main downsides of this approach is it forces an additional allocation (perhaps not bad if the value can be used many times) and it forces all callers to change signatures.
This code is a prime candidate for temporary arrays as most often teh number of chunks/matches we have will be <=4, allowing us to process things without any allocs or need to go to pools. This also simplifies our impl a bunch as we don't have to do complex work around avoiding even accessing the Pool for our array builders.