Add 'Eliminate Duplicate SDK Files' one-pager#51886

Merged
MichaelSimons merged 17 commits intodotnet:mainfrom
MichaelSimons:deduplication-one-pager
Jan 30, 2026

@MichaelSimons
Member

Related to #41128

Copilot AI review requested due to automatic review settings November 24, 2025 21:06
Contributor

Copilot AI left a comment

Pull request overview

This PR adds comprehensive documentation for the "Eliminate Duplicate SDK Files" initiative, which aims to reduce .NET SDK installation size by removing duplicate assemblies. The proposal addresses a significant bloat issue where duplicates account for 35% of the SDK size on Linux x64 (53 MB compressed, 140 MB on disk), impacting high-volume download scenarios like container images and CI/CD pipelines.

Key changes:

  • Comprehensive analysis of duplicate files across .NET SDK versions with detailed metrics and categorization
  • Proposed technical approach for consolidating assemblies into a shared location and updating component loading mechanisms
  • Performance impact analysis showing 23% faster downloads and extraction times
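The overview mentions an analysis of duplicate files across SDK versions, but the PR does not show how that analysis was performed. As a hypothetical illustration only, here is a minimal Python sketch of one way to find byte-identical files in an extracted SDK layout: hash every file and group paths by digest. The names `find_duplicates` and `duplicate_bytes` are invented for this sketch, and the savings figure assumes none of the copies are already hard links (hard-linked copies would be over-counted).

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Group regular files under `root` by the SHA-256 of their contents.

    Returns a list of groups (each a sorted list of paths) with more than one member.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks; only real files occupy space
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large assemblies don't load fully into memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def duplicate_bytes(groups):
    """Bytes that could be saved by keeping a single copy per group.

    Assumes no file in a group is already a hard link to another member.
    """
    return sum(os.path.getsize(group[0]) * (len(group) - 1) for group in groups)
```

Hashing contents rather than comparing file names matters here, because the same assembly often appears under different component folders with identical names but is only a true duplicate if the bytes match.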

@MichaelSimons MichaelSimons requested a review from a team November 24, 2025 21:08
MichaelSimons and others added 3 commits November 24, 2025 15:09
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@MichaelSimons
Member Author

cc @jaredpar, @tmat

@tmat
Member

tmat commented Dec 1, 2025

Do we need to consider AOT strategy? Which parts of the SDK are we going to AOT? E.g. if we AOT dotnet-watch we can't load Roslyn binaries from the shared location since they would be part of the dotnet-watch native binary.

@jaredpar
Member

jaredpar commented Dec 1, 2025

> Do we need to consider AOT strategy? Which parts of the SDK are we going to AOT?

@JeremyKuhne and @baronfel are working on the AOT strategy. They're aware of this effort.

Member

@JeremyKuhne JeremyKuhne left a comment

Looks good from my perspective.


## Customer Impact: Why SDK Size Matters

While we often envision the .NET SDK as something installed once on a developer's machine, the reality is that most SDK installations occur in ephemeral, high-volume scenarios where the SDK is repeatedly downloaded and extracted.
Member

@baronfel do we have any rough stats that we can add here to give this statement more weight? For example, the ratio of when the .NET SDK is downloaded in a CI / CD pipeline vs. everything else?

Member

Maybe we can reach out to the Actions folks to get usage rates for the setup-dotnet action? Or the .NET releases team for CDN download rates of the install scripts themselves.

We do have rates of dotnet CLI usage in CI pipelines vs in 'interactive' use that could be used as a proxy.

Member

@baronfel any updates?

Member

@baronfel baronfel Jan 30, 2026

Nothing from the Actions side yet. I'm sending mail to them this morning as part of the broader GHA/AzDO Actions features/perf/usage discussions we had in Product Chat recently, so I may have better numbers in the future.

For CLI telemetry, we're at roughly a 2:1 ratio of CI to local usage over the past 30 days, and that ratio holds over longer timelines too.

The overall direction of this effort is to eliminate the vast majority of duplicate assemblies within the .NET SDK so that each shared dependency is carried only once.
There may be a few special cases (for example, where different versions of an assembly are required) in which duplicates need to be retained.
Achieving this requires solving two distinct but related problems.
First, from a runtime and execution perspective, SDK components must be able to reliably load a single shared copy of each assembly from a common location.
Member

Part of the problem with a shared location is that it makes it easy to take dependencies the component didn't intend. Essentially, once a DLL is loaded from a folder, there are a number of ways in which the runtime can start resolving other DLLs from that same folder. I think part of the statement here should be that each component loads its declared dependencies from a single common location.

Another way of stating this: if my component is currently failing to load X.dll, it shouldn't start succeeding to load X.dll because it was put in the common location.
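The rule described in this comment (resolve only declared dependencies from the common location, so that placing X.dll there cannot make a previously failing load succeed) can be modeled in a few lines. This is a language-neutral Python sketch, not the SDK's mechanism; a real implementation would presumably live in the .NET assembly-loading layer (for example, a custom `AssemblyLoadContext`). `SharedResolver`, the `declared` set, and the `<name>.dll` file naming are all hypothetical.

```python
import os


class SharedResolver:
    """Hypothetical model of per-component resolution against a shared directory.

    Only names the component explicitly declares are looked up in the shared
    directory; anything else fails exactly as it would without the shared copy.
    """

    def __init__(self, shared_dir, declared):
        self.shared_dir = shared_dir
        self.declared = frozenset(declared)

    def resolve(self, assembly_name):
        # Undeclared dependency: never probe the shared directory, even if the
        # file happens to be there (it may have been placed for another component).
        if assembly_name not in self.declared:
            return None
        path = os.path.join(self.shared_dir, assembly_name + ".dll")
        return path if os.path.isfile(path) else None
```

The design choice being illustrated is that the component's declared list, not the directory contents, decides what resolves; the shared directory only supplies the bits.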

@richlander
Member

> Do we need to consider AOT strategy? Which parts of the SDK are we going to AOT? E.g. if we AOT dotnet-watch we can't load Roslyn binaries from the shared location since they would be part of the dotnet-watch native binary.

AOT is orthogonal. It has no relationship to this and doesn't need to be considered. This is inherently a CoreCLR targeted project.

@tmat
Member

tmat commented Dec 2, 2025

> Do we need to consider AOT strategy? Which parts of the SDK are we going to AOT? E.g. if we AOT dotnet-watch we can't load Roslyn binaries from the shared location since they would be part of the dotnet-watch native binary.

> AOT is orthogonal. It has no relationship to this and doesn't need to be considered. This is inherently a CoreCLR targeted project.

Why is that? If we AOT dotnet-watch then dotnet-watch can't/won't be loading Roslyn assemblies from the shared location.

@richlander
Member

> Why is that? If we AOT dotnet-watch then dotnet-watch can't be loading Roslyn assemblies from the shared location.

Let's unpack this. I think you are saying that if we did the AOT project now, duplication would stay at the status quo, but if we do this project first, AOT would then re-create duplication. In reality, that still has nothing to do with this project; it only makes AOT relatively more expensive than it would otherwise have been.

I don't see a single design choice or tradeoff here that is influenced by the choice to AOT some of the SDK, in this release or in the future. Can you outline any design considerations?

Another way to look at this is that de-duplication is buying us some budget to spend on AOT. This is in fact how I've always seen this project.

@tmat
Member

tmat commented Dec 2, 2025

I see what you mean. What I had in mind is that if we do deduplication now, we will need to undo it later for projects that get AOT'd. Let's say only two components share Roslyn: dotnet-watch and dotnet-format. We implement loading of Roslyn assemblies from the shared location for both. Then we decide to AOT both. At that point we will remove the shared loading and also remove the Roslyn assemblies from the shared location, because no component will load them anymore. In the limit, if we decide to AOT everything, then we don't need sharing at all.

@richlander
Member

Ah. That context is useful. I didn't see that one as a design consideration.

That leaves two options:

  • De-dupe now if switching trains is cheap
  • Prove out the model now and wait for some components until we have more clarity.

Another question is why we have some of these tools in the SDK at all. IMO, it would be great to remove all tools. My ideal experience is tool packs, exactly like VS Code extension packs, with a single gesture to acquire a common, curated set of tools. Then much of this discussion goes away. It seems like one missing feature would resolve a bunch of problems.

@richlander
Member

As a second or third phase, it should seek to remove all uses of Newtonsoft.Json from the SDK. We were talking about this at lunch. Some of the mechanical reasons why we have it were explained to me, but none of them make sense as a long-term plan. My understanding is that NuGet will be the primary concern. This is more than a size issue. For example, once the Memory Safety v2 project is fully-enabled, we should mandate that every single library and tool in the SDK is compiled with the new mode. This will put more pressure on legacy dependencies. Similarly, we should strongly encourage trim safety and discourage reflection. Some of this will likely require breaking changes.

@aortiz-msft @agocke

@MichaelSimons
Member Author

Prompted by a comment from @tmat, I re-evaluated using hard links to solve the duplication problem. I had originally dismissed the idea because of Windows. After discussing it with @baronfel, we believe we can support hard links on Windows by adding tarball archives. I pushed an update to the document that describes this new approach and the decisions that make it feasible. Please review and provide your feedback.
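A rough Python sketch of the hard-link idea, assuming the groups of identical files have already been identified: each extra copy is replaced by a hard link to one kept file, and the layout is then packed with Python's standard `tarfile` module, which stores a file whose inode was already archived as a hard-link entry instead of a second copy of its bytes. This only illustrates the concept from the comment; it is not the SDK build's implementation, `hardlink_duplicates` and `make_tarball` are hypothetical names, hard links require all files to be on the same volume, and link detection depends on the platform reporting inode numbers.

```python
import os
import tarfile


def hardlink_duplicates(groups):
    """Replace each extra copy in a group with a hard link to the group's first file.

    `groups` is a list of path lists whose file contents are known to be identical.
    """
    for group in groups:
        keeper = group[0]
        for dup in group[1:]:
            if os.path.samefile(keeper, dup):
                continue  # already the same file on disk
            tmp = dup + ".linktmp"
            os.link(keeper, tmp)  # new directory entry for the keeper's inode
            os.replace(tmp, dup)  # swap it in over the duplicate copy


def make_tarball(root, out_path):
    """Pack `root` into a .tar.gz; repeated inodes become hard-link (LNKTYPE) members."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(root, arcname=os.path.basename(root))
```

On extraction, tar implementations (including Python's `tarfile` and GNU tar) recreate hard-link members as hard links where the filesystem supports them, so the duplicated bytes are carried only once in the archive and only once on disk.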


@MichaelSimons
Member Author

I updated the doc to cover the issues with RPMs. The document captures what I see as our preferred/intended direction, but there are some unknowns that will require us to make changes and then validate them before we can say with certainty that we can use hard links with RPMs.

@MichaelSimons
Member Author

@baronfel, @marcpopMSFT, @jaredpar - I would like to merge/close this. I am starting to open PRs for the implementation. I would like your signoff before merging. @baronfel - there is one open question for you. TIA

@MichaelSimons MichaelSimons merged commit 8d32382 into dotnet:main Jan 30, 2026
5 of 7 checks passed
@MichaelSimons MichaelSimons deleted the deduplication-one-pager branch January 30, 2026 21:47
10 participants