Skip to content

Avoid OfTypeIterators when enumerating project xml children#8847

Merged
rainersigwald merged 1 commit intodotnet:vs17.7from
dfederm:perf-xml-child-enumeration
Jul 12, 2023
Merged

Avoid OfTypeIterators when enumerating project xml children#8847
rainersigwald merged 1 commit intodotnet:vs17.7from
dfederm:perf-xml-child-enumeration

Conversation

@dfederm
Copy link
Copy Markdown
Contributor

@dfederm dfederm commented Jun 6, 2023

Summary

Performance improvement with noticeable impact when using static graph in large repos.

Customer Impact

~10% reduction in static graph construction time in a specific large repo.

Regression?

No.

Testing

Automated tests, manual runs on reporting repo showing noticeable improvement.

Risk

Low.

Details

Avoid OfTypeIterators when enumerating project xml children

In a recent profile of a graph construction, it was observed that a large amount of boxing was happening for ProjectElementSiblingEnumerable. This change simplifies how xml children are enumerated by adding an internal ChildrenEnumerable property which directly exposes the ProjectElementSiblingEnumerable which should avoid boxing, at least in some code paths (the public API makes it hard to avoid everywhere...).

Additionally, a very common usage of enumerating children was to do Children.OfType<T> and wrap it in a ReadOnlyCollection<T>, so I introduced a GetChildrenOfType (and GetChildrenReversedOfType) method which exposes an ICollection<T> which does the same thing but without the boxing of ProjectElementSiblingEnumerable and without the OfType class. It's just 1 collection allocation.

image

image

image

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm wrong here, but I guess all of the changes that cast GetChildrenOfType to ICollection do nothing here, because you cast you collection to ICollection and so the consumer of this code iterating through ICollection interface won't call a special GetEnumerator method that returns your struct, but rather will call a method on the interface that still would box the iterator.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah, I think you're right. This would only save on the OfTypeIterators. Maybe still substantial enough to be worth it, but it's less compelling now. Might need to actually compare and gather data.

Copy link
Copy Markdown
Member

@davkean davkean Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For public consumers of these types, you can fix this with a bit of a hack which we use around the tree in situations where we can't change the public API.

Basically if you make the underlying ICollection a known public type, external consumers (like CPS, AnyCode, etc) can cast to and sniff for said type and avoid the boxing by calling the direct enumerator.

Copy link
Copy Markdown
Contributor Author

@dfederm dfederm Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I'm not sure we can make it a known public type very easily though since the existing impl is lazy. It's effectively a linked list "view" on the underlying Xml object model, and while the enumerator may box, it's not knowable whether allocating the full collection would be beneficial or not.

Consider a naive consumer who just enumerates. With these changes, it would allocate the collection (just a wrapper) and box the enumerator. If we returned a well-known collection (let's say List), we'd have to enumerate the full collection and populate the whole collection, which takes more memory due to the underlying storage (array).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the only benefits you're getting is here. But I'm curious if you can avoid the collection allocation here as well by having a property 'ChildrenEnumerator' that will return your struct enumerator?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed ChildrenInternal -> ChildrenEnumerable for clarity

@dfederm dfederm changed the title Avoid boxing when enumerating project xml children Avoid OfTypeIterators when enumerating project xml children Jun 6, 2023
Copy link
Copy Markdown
Member

@davkean davkean Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, due to the hidden copy that C# performs. :)

Copy link
Copy Markdown
Member

@davkean davkean Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if MSBuild has the ability to ban APIs but I would consider banning usage of Children inside of MSBuild itself to avoid accidentally opt'ing into the boxing again. Can't obsolete because that would affect external consumers.

Copy link
Copy Markdown
Contributor Author

@dfederm dfederm Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, interesting idea. It seems to work except I'm unsure how to handle this for UTs since it's a internal member and not all UTs have visibility. I could InternalsVisibleTo, but not sure if that's desirable here or not (MSBuild team feel free to express an opinion)

Note to self in case it's decided to do this:

P:Microsoft.Build.Construction.ProjectElementContainer.Children;Use ChildrenEnumerable instead to avoid boxing

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip the ban enforcement in UTs. I'll push a commit with that shortly.

@dfederm
Copy link
Copy Markdown
Contributor Author

dfederm commented Jun 13, 2023

From the thread which initiated this:

Initial:

Static graph loaded in 156.466 seconds: 913 nodes, 12842 edges

And after applying the changes:

I did a quick run with David PR + origin/main and the graph startup improved by 10-15s. GC times is still high.

So still work to do, but this incrementally helps.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can skip the ban enforcement in UTs. I'll push a commit with that shortly.

In a recent profile of a graph construction, it was observed that a
large amount of boxing was happening for
ProjectElementSiblingEnumerable. This change simplifies how xml children
are enumerated by adding an internal ChildrenEnumerable property which
directly exposes the ProjectElementSiblingEnumerable which should avoid
boxing, at least in some code paths (the public API makes it hard to
avoid everywhere...).

Additionally, a very common usage of enumerating children was to do
Children.OfType<T> and wrap it in a ReadOnlyCollection<T>, so I
introduced a GetChildrenOfType (and GetChildrenReversedOfType) method
which exposes an ICollection<T> which does the same thing but without
the boxing of ProjectElementSiblingEnumerable and without the OfType
class. It's just 1 collection allocation.
@rainersigwald rainersigwald force-pushed the perf-xml-child-enumeration branch from 0b79dfe to 4f2a020 Compare July 12, 2023 18:50
@rainersigwald rainersigwald changed the base branch from main to vs17.7 July 12, 2023 18:50
@rainersigwald rainersigwald mentioned this pull request Jul 12, 2023
@rainersigwald rainersigwald merged commit 971bf70 into dotnet:vs17.7 Jul 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants