
[Crossgen2] Deduplicate identical IL method bodies in R2R composite image#126047

Closed
kotlarmilos wants to merge 5 commits intodotnet:mainfrom
kotlarmilos:r2r-dedup-il-bodies-measure


kotlarmilos (Member) commented Mar 24, 2026

Description

This PR adds content-based deduplication of IL method bodies when emitting R2R composite component assemblies.

CopiedMethodIL on the node factory uses a ConcurrentDictionary<byte[], CopiedMethodILNode> keyed by effective body bytes. When a method's body matches an already-seen body, the existing node is reused, causing multiple MethodDef RVA entries to point to the same IL body blob.
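A content-keyed `ConcurrentDictionary<byte[], TValue>` only deduplicates by content if it is given a structural comparer, because `byte[]` compares by reference by default. The sketch below illustrates that lookup under stated assumptions: `ByteArrayContentComparer`, `ILBodyNode`, and `ILBodyCache` are hypothetical stand-ins, not the types used in the PR, and the byte arrays in any usage are illustrative.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical structural comparer: two byte[] keys are equal when their contents match.
public sealed class ByteArrayContentComparer : IEqualityComparer<byte[]>
{
    public static readonly ByteArrayContentComparer Instance = new();

    public bool Equals(byte[]? x, byte[]? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.AsSpan().SequenceEqual(y.AsSpan());
    }

    public int GetHashCode(byte[] obj)
    {
        var hash = new HashCode();
        hash.AddBytes(obj); // hashes the full contents, not the array reference
        return hash.ToHashCode();
    }
}

// Hypothetical stand-in for CopiedMethodILNode; it records which method first produced the body.
public sealed class ILBodyNode
{
    public ILBodyNode(string firstOwner) => FirstOwner = firstOwner;
    public string FirstOwner { get; }
}

public sealed class ILBodyCache
{
    // Without the comparer, two equal-content arrays would be treated as different keys.
    private readonly ConcurrentDictionary<byte[], ILBodyNode> _nodes =
        new(ByteArrayContentComparer.Instance);

    // Returns the shared node for these body bytes, creating it the first time they are seen.
    public ILBodyNode GetOrAdd(string methodName, byte[] bodyBytes) =>
        _nodes.GetOrAdd(bodyBytes, _ => new ILBodyNode(methodName));

    public int DistinctBodies => _nodes.Count;
}
```

Both `Equals` and `GetHashCode` walk the entire array, so each lookup costs time proportional to the body size.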

Deduplication operates on post-strip bytes: ReadBodyBytes returns the stripped stub for strippable methods, so all stripped non-generic methods collapse into a single shared node. This builds on top of the --strip-il-bodies option added in #125647.

Impact (MAUI HelloWorld iOS arm64)

Metric | Value
--- | ---
Total IL bodies | 105,152
Deduplicated | 12,874 (12.2%)
Bytes saved | ~155 KB
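The percentage in the table follows directly from the first two rows, and dividing the reported savings by the number of deduplicated bodies gives a rough average size per duplicate (taking 1 KB as 1,000 bytes; with 1,024 bytes the result is about 12.3 bytes):

$$
\frac{12{,}874}{105{,}152} \approx 0.122 = 12.2\%,
\qquad
\frac{155{,}000\ \text{bytes}}{12{,}874} \approx 12\ \text{bytes per deduplicated body}
$$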

…bodies in R2R composite images

Add a --dedup-il-bodies flag to crossgen2 that enables content-based
deduplication of identical IL method bodies when emitting R2R composite
component assemblies.

When enabled, crossgen2 uses a content-keyed dictionary per component
factory. When a method's IL body bytes match an already-seen body, the
existing CopiedMethodILNode is reused, causing multiple MethodDef RVA
entries to point to the same IL body blob.

Enable by default on Apple mobile platforms (ios, tvos, iossimulator,
tvossimulator, maccatalyst) via MSBuild targets, matching the pattern
used for --strip-inlining-info and --strip-debug-info.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 24, 2026 18:04
@kotlarmilos kotlarmilos changed the title [Crossgen2] Add --dedup-il-bodies to deduplicate identical IL method bodies in R2R composite images [Crossgen2] Add --dedup-il-bodies to deduplicate identical IL method bodies in R2R composite image Mar 24, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds an opt-in crossgen2 optimization (--dedup-il-bodies) to content-deduplicate identical IL method bodies when producing R2R composite component assemblies, and wires it through the CLI and Apple mobile MSBuild defaults to reduce output size.

Changes:

  • Add --dedup-il-bodies CLI option and plumb it into NodeFactoryOptimizationFlags.
  • Implement IL-body content deduplication in the ReadyToRun node factory using a content-keyed cache.
  • Enable the flag by default for Apple mobile RIDs via Microsoft.NET.CrossGen.targets.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
src/tasks/Crossgen2Tasks/Microsoft.NET.CrossGen.targets | Enables --dedup-il-bodies by default for iOS/tvOS simulators and maccatalyst publish R2R.
src/coreclr/tools/aot/crossgen2/Properties/Resources.resx | Adds localized description string for the new CLI option.
src/coreclr/tools/aot/crossgen2/Program.cs | Wires the CLI option into NodeFactoryOptimizationFlags.DedupILBodies.
src/coreclr/tools/aot/crossgen2/Crossgen2RootCommand.cs | Declares and registers the new --dedup-il-bodies option.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRunCodegenNodeFactory.cs | Adds the dedup cache and gates dedup behavior behind DedupILBodies.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/Compiler/DependencyAnalysis/ReadyToRun/CopiedMethodILNode.cs | Adds helper to read raw method body bytes for content-based dedup keys.

@kotlarmilos kotlarmilos added this to the 11.0.0 milestone Mar 24, 2026
jkotas (Member) commented Mar 24, 2026

Can this be on by default? I do not think it needs to be an option.

jkoritzinsky (Member) left a comment

Could we instead use/enhance ObjectDataInterner from ILC (and enable it in crossgen2) to do the deduplicating for us during emit?

Remove the DedupILBodies flag from NodeFactoryOptimizationFlags, the
--dedup-il-bodies CLI option, the MSBuild PublishReadyToRunDedupILBodies
properties, and the resource string. Deduplication of identical copied
IL method bodies is now unconditional.

Restructure CopiedMethodIL to use a factory function in the dedup
dictionary to avoid creating unused per-method nodes when bodies
deduplicate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
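The switch to a factory function matters because `ConcurrentDictionary.GetOrAdd(key, value)` requires the value to be constructed before the lookup, whereas `GetOrAdd(key, valueFactory)` only calls the factory when the key is missing. A minimal sketch, with string keys standing in for body bytes and hypothetical `Node`/`NodeCacheSketch` types that count how many nodes get built:

```csharp
using System.Collections.Concurrent;
using System.Threading;

public sealed class Node { }

public sealed class NodeCacheSketch
{
    private readonly ConcurrentDictionary<string, Node> _byContent = new();
    private int _constructed;

    public int Constructed => Volatile.Read(ref _constructed);

    private Node CreateNode()
    {
        Interlocked.Increment(ref _constructed);
        return new Node();
    }

    // Eager: the candidate node is built before the lookup, so a duplicate body
    // still allocates a node that is immediately discarded.
    public Node GetOrAddEager(string bodyKey) => _byContent.GetOrAdd(bodyKey, CreateNode());

    // Lazy: the factory only runs when the key is missing.
    public Node GetOrAddLazy(string bodyKey) => _byContent.GetOrAdd(bodyKey, _ => CreateNode());
}
```

Under concurrent access, `valueFactory` may still run more than once for the same key, with only one result stored and returned, so the lazy form reduces wasted allocations rather than guaranteeing exactly one.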
kotlarmilos (Member Author) commented

/azp run runtime-coreclr crossgen2,runtime-coreclr crossgen2-composite

azure-pipelines commented
Azure Pipelines successfully started running 2 pipeline(s).

kotlarmilos (Member Author) commented

> Could we instead use/enhance ObjectDataInterner from ILC (and enable it in crossgen2) to do the deduplicating for us during emit?

My understanding of the ObjectDataInterner is that it is designed for folding compiled native method bodies: it compares code, relocations, and other info, and it runs in iterations because folding can enable further folds. None of that applies to IL body dedup.

Enabling it in crossgen2 would require more changes. I think it can be moved; I just want to check whether that was the intention.
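To illustrate why native-code folding needs iterations while a byte-keyed lookup does not, here is a hypothetical sketch of iterative folding (not ILC's actual ObjectDataInterner code). Each `Body` is modeled as code bytes plus relocation targets by name; bodies are grouped by their bytes together with the representatives their targets folded into on the previous pass, and passes repeat until the grouping stops changing. Two callers become identical only after their callees fold, which a single pass misses. The IL body dedup in this PR keys only on the body bytes, so one lookup per method is enough.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a compiled body: raw code bytes plus relocation targets (by symbol name).
public sealed record Body(string Name, byte[] Code, string[] RelocTargets);

public static class IterativeFolder
{
    // Returns a map from each body name to the representative it folds into.
    public static Dictionary<string, string> Fold(IReadOnlyList<Body> bodies)
    {
        // Start with every body as its own representative.
        var rep = bodies.ToDictionary(b => b.Name, b => b.Name);
        while (true)
        {
            var next = new Dictionary<string, string>();
            var firstWithKey = new Dictionary<string, string>();
            foreach (var body in bodies)
            {
                // Key = code bytes plus the previous pass's representatives of the relocation targets.
                string key = Convert.ToHexString(body.Code) + "|" +
                             string.Join(",", body.RelocTargets.Select(t => rep[t]));
                if (!firstWithKey.TryGetValue(key, out var first))
                {
                    first = body.Name;
                    firstWithKey[key] = first;
                }
                next[body.Name] = first;
            }
            // Stop once a pass produces no new merges.
            bool changed = bodies.Any(b => next[b.Name] != rep[b.Name]);
            rep = next;
            if (!changed) return rep;
        }
    }
}
```

With callers A and B pointing at identical leaves X and Y, the first pass folds Y into X, and only the second pass sees A and B as identical.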

jkotas (Member) commented Mar 25, 2026

> ObjectDataInterner

I think it would be overkill for what we need here.

Remove the intermediate NodeCache<MethodDesc, CopiedMethodILNode> and use
only the content-keyed ConcurrentDictionary for IL body dedup, as suggested
in code review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 26, 2026 10:36
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

@kotlarmilos kotlarmilos marked this pull request as ready for review March 26, 2026 11:31
MichalStrehovsky (Member) commented

Looks like this will interact with #125647, which might mean we have to do this later.

kotlarmilos (Member Author) commented

Yes, let's first land #125647.

MichalPetryka (Contributor) commented

Should this not be done by Roslyn instead? Are there any drawbacks to deduplicating like this?

jkotas (Member) commented Mar 31, 2026

> Should this not be done by Roslyn instead? Are there any drawbacks to deduplicating like this?

This is de-duplicating across multiple assemblies in a composite R2R image.

@kotlarmilos kotlarmilos marked this pull request as draft April 7, 2026 13:11
Copilot AI review requested due to automatic review settings April 8, 2026 13:13
ReadBodyBytes now accounts for stripping: when a method would be
stripped, it returns s_minimalILBody instead of original IL bytes.
This makes the dedup key reflect the effective (post-strip) body,
so all stripped methods share a single CopiedMethodILNode.

GetData() reuses ReadBodyBytes to eliminate duplicated stripping logic.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
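Keying on the effective (post-strip) body can be sketched as follows. This is a simplified model, not the crossgen2 code: `MethodInfoSketch`, its `Strippable` flag, `EffectiveBody`, and `StripAwareDedup` are hypothetical, and the stub bytes are an illustrative tiny-format `ldnull; throw` body rather than the actual contents of `s_minimalILBody`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical input: the method's original IL body and whether stripping applies to it.
public sealed record MethodInfoSketch(string Name, byte[] OriginalBody, bool Strippable);

public static class EffectiveBody
{
    // Illustrative stub: tiny header for 2 code bytes (0x0A), then ldnull (0x14), throw (0x7A).
    private static readonly byte[] s_minimalILBody = { 0x0A, 0x14, 0x7A };

    // The bytes that will actually be emitted: the shared stub when stripped, else the original body.
    public static byte[] ReadBodyBytes(MethodInfoSketch method) =>
        method.Strippable ? s_minimalILBody : method.OriginalBody;

    // Emission reuses ReadBodyBytes, so the dedup key and the written bytes cannot drift apart.
    public static byte[] GetData(MethodInfoSketch method) => ReadBodyBytes(method);
}

public static class StripAwareDedup
{
    // Groups methods by effective body; the hex string is just a convenient content key here.
    public static Dictionary<string, List<string>> Group(IEnumerable<MethodInfoSketch> methods)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (var m in methods)
        {
            string key = Convert.ToHexString(EffectiveBody.ReadBodyBytes(m));
            if (!groups.TryGetValue(key, out var list)) groups[key] = list = new List<string>();
            list.Add(m.Name);
        }
        return groups;
    }
}
```

All strippable methods land in one group regardless of their original bodies, which is the "single shared node" effect described above.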
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

@kotlarmilos kotlarmilos changed the title [Crossgen2] Add --dedup-il-bodies to deduplicate identical IL method bodies in R2R composite image [Crossgen2] Deduplicate identical IL method bodies in R2R composite image Apr 8, 2026
kotlarmilos (Member Author) commented

/azp run runtime-coreclr crossgen2,runtime-coreclr crossgen2-composite

@kotlarmilos kotlarmilos marked this pull request as ready for review April 8, 2026 13:33
azure-pipelines commented
Azure Pipelines successfully started running 2 pipeline(s).

jkotas (Member) commented Apr 8, 2026

kotlarmilos requested review from jkoritzinsky and jkotas

Either this PR or #126112 is fine with me. I will let you figure out which one we want to go with.

Comment on lines +1150 to +1151
byte[] bodyBytes = CopiedMethodILNode.ReadBodyBytes(method, OptimizationFlags);
return _copiedMethodIL.GetOrAdd(bodyBytes, _ => new CopiedMethodILNode(method));
Reviewer (Member) commented:

This may not work. If, the first time someone asks for CopiedMethodIL, optimizationFlags.CompiledMethodDefs is still null, we'll get a node for the real method body bytes. This node will become part of the dependency graph and will get written to the output regardless of whether anything still references it at the writing phase (if the node is marked, it will get written to the output).

If this is not a concern, we should remove the question mark from optimizationFlags.CompiledMethodDefs?.Contains(method) so that this is a hard crash if anyone wants to use this before we do the analysis.

If it is a concern, there are ways to address this. Probably the easiest would be to delete the CopiedMethodILNode completely and instead emit the bodies as part of CopiedMetadataBlobNode. We don't need a separate symbol for each method body - the builder.EmitReloc(factory.CopiedMethodIL(method), RelocType.IMAGE_REL_BASED_ADDR32NB); can be builder.EmitReloc(this, RelocType.IMAGE_REL_BASED_ADDR32NB, someOffsetFromStart); (where someOffsetFromStart is _sourceModule.PEReader.GetMetadata().Length + cumulativeSizeOfMethodBodiesEmittedSoFar).
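The offset arithmetic in the suggestion above can be sketched as a single layout pass over the metadata and method bodies. As an assumption beyond what the comment states, this version also reuses the offset of an earlier identical body, which is one way deduplication could happen at emit time once the final set of bodies is known. `BlobLayoutSketch` and its inputs are hypothetical, and any alignment or padding the real format requires is ignored.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical layout helper: bodies are appended after the metadata inside one blob,
// and each method records an offset from the start of that blob instead of a separate symbol.
public static class BlobLayoutSketch
{
    public static (byte[] Blob, Dictionary<string, int> Offsets) Build(
        byte[] metadata, IReadOnlyList<(string Method, byte[] Body)> bodies)
    {
        var blob = new List<byte>(metadata);
        var offsets = new Dictionary<string, int>();
        var offsetByContent = new Dictionary<string, int>(); // body content (hex) -> offset of first copy

        foreach (var (method, body) in bodies)
        {
            string key = Convert.ToHexString(body);
            if (!offsetByContent.TryGetValue(key, out int offset))
            {
                // First occurrence: offset = metadata length + size of bodies emitted so far.
                offset = blob.Count;
                blob.AddRange(body);
                offsetByContent[key] = offset;
            }
            // In the suggestion, this value would be the offset passed to EmitReloc(this, ...).
            offsets[method] = offset;
        }
        return (blob.ToArray(), offsets);
    }
}
```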

MichalStrehovsky (Member) commented

> kotlarmilos requested review from jkoritzinsky and jkotas
>
> Either this PR or #126112 is fine with me. I will let you figure out which one we want to go with.

I feel similarly about this. If we have a use case for deduplication beyond the method bodies, it might be worth it to take Jeremy's change. If we don't see where else we would use this, I'd not add more complexity to the ILC deduplication.

kotlarmilos (Member Author) commented

Jeremy's change allows for more flexibility going forward. @jkoritzinsky How do you feel? Do you want to proceed with your PR instead?

kotlarmilos (Member Author) commented

Closing this PR in favour of #126112

@kotlarmilos kotlarmilos closed this Apr 9, 2026
6 participants