Always use the ThinLTO pipeline for pre-link optimizations #153446

Merged: rust-bors[bot] merged 1 commit into rust-lang:main from bjorn3:llvm_pre_link_thinlto on Mar 8, 2026

bjorn3 (Member) commented Mar 5, 2026

When using cargo, this was already effectively done for all dependencies, as cargo passes -Clinker-plugin-lto without -Clto=fat/thin, and -Clinker-plugin-lto assumes that ThinLTO will be used. The ThinLTO pre-link pipeline is faster than the fat LTO one, and according to the benchmarks in [1] there is barely any runtime performance difference between executables built with fat LTO using the fat pre-link pipeline and those using the ThinLTO pre-link pipeline.

This also helps avoid yet another code path if we want to support Unified LTO, that is, a single bitcode file that can be used for both fat LTO and ThinLTO when using linker plugin LTO. We already support this when rustc itself does LTO: ThinLTO bitcode is enough of a superset of fat LTO bitcode that, for the current set of LTO features rustc exposes, it happens to work by accident as long as nothing explicitly checks against mixing them. I'm still investigating whether rustc would benefit from Unified LTO and how exactly to integrate it.

Footnotes

  1. https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
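
As a rough illustration of the description above, the choice of pre-link pipeline can be modeled as a function of the requested LTO mode and whether -Clinker-plugin-lto is set. This is a simplified sketch, not rustc's actual code: the names `LtoMode`, `PreLinkPipeline`, `pre_link_before` and `pre_link_after` are hypothetical, and the "before" mapping is an assumption inferred from the description.

```rust
/// LTO mode requested via `-Clto` (simplified, hypothetical model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LtoMode {
    No,
    Thin,
    Fat,
}

/// Which family of pre-link optimization pipeline runs on each module.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PreLinkPipeline {
    PerModule,
    ThinLto,
    FatLto,
}

/// Assumed behavior before this PR: only `-Clto=fat` selects the fat
/// pre-link pipeline; linker plugin LTO without `-Clto` already gets ThinLTO.
fn pre_link_before(lto: LtoMode, linker_plugin_lto: bool) -> PreLinkPipeline {
    match lto {
        LtoMode::Fat => PreLinkPipeline::FatLto,
        LtoMode::Thin => PreLinkPipeline::ThinLto,
        LtoMode::No if linker_plugin_lto => PreLinkPipeline::ThinLto,
        LtoMode::No => PreLinkPipeline::PerModule,
    }
}

/// Behavior the PR title describes: every LTO configuration uses the
/// ThinLTO pre-link pipeline.
fn pre_link_after(lto: LtoMode, linker_plugin_lto: bool) -> PreLinkPipeline {
    match lto {
        LtoMode::Fat | LtoMode::Thin => PreLinkPipeline::ThinLto,
        LtoMode::No if linker_plugin_lto => PreLinkPipeline::ThinLto,
        LtoMode::No => PreLinkPipeline::PerModule,
    }
}
```

In this model the only input whose result changes is `LtoMode::Fat`, which matches the claim that cargo-built dependencies (linker plugin LTO without -Clto) were already on the ThinLTO pre-link pipeline.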

rustbot added labels A-LLVM, S-waiting-on-review, T-compiler on Mar 5, 2026

rustbot (Collaborator) commented Mar 5, 2026

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @cuviper

bjorn3 (Member Author) commented Mar 5, 2026

@bors try @rust-timer queue

rust-bors bot pushed a commit that referenced this pull request Mar 5, 2026
Always use the ThinLTO pipeline for pre-link optimizations
rustbot added the S-waiting-on-perf label on Mar 5, 2026

    NeedThinLTOBufferPasses = false;
    break;
  case LLVMRustOptStage::PreLinkFatLTO:
    MPM = PB.buildLTOPreLinkDefaultPipeline(OptLevel);

Contributor (review comment on the lines above):

Isn't this doing the opposite of what your PR description says this is doing?

bjorn3 (Member Author) replied Mar 5, 2026

Oops. I had to redo these changes to split it out of my exploration of UnifiedLTO and messed up there.
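
To make the exchange above concrete: the quoted hunk still calls `buildLTOPreLinkDefaultPipeline` for the `PreLinkFatLTO` stage, whereas the PR description implies that stage should use the ThinLTO pre-link builder. A minimal model of the intended mapping might look like the following. The `OptStage` enum and `intended_entry_point` function are hypothetical Rust stand-ins for the C++ switch; the strings are names of LLVM `PassBuilder` methods. Whether the final patch keeps a separate `PreLinkFatLTO` stage is not shown in this thread, so that part is an assumption.

```rust
/// Hypothetical Rust mirror of the optimization stages handled by the
/// C++ switch quoted above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OptStage {
    PreLinkNoLto,
    PreLinkThinLto,
    PreLinkFatLto,
    ThinLto,
    FatLto,
}

/// Builder the reviewed hunk still used for the fat pre-link stage.
const REVIEWED_HUNK_FAT_PRE_LINK: &str = "buildLTOPreLinkDefaultPipeline";

/// Name of the LLVM `PassBuilder` method each stage would use if both
/// pre-link LTO stages share the ThinLTO pre-link pipeline (assumed shape).
fn intended_entry_point(stage: OptStage) -> &'static str {
    match stage {
        OptStage::PreLinkNoLto => "buildPerModuleDefaultPipeline",
        // The point of the PR: fat LTO no longer gets its own pre-link pipeline.
        OptStage::PreLinkThinLto | OptStage::PreLinkFatLto => {
            "buildThinLTOPreLinkDefaultPipeline"
        }
        // The post-link (actual LTO) pipelines are left as they were.
        OptStage::ThinLto => "buildThinLTODefaultPipeline",
        OptStage::FatLto => "buildLTODefaultPipeline",
    }
}
```

Comparing `intended_entry_point(OptStage::PreLinkFatLto)` with `REVIEWED_HUNK_FAT_PRE_LINK` shows the mismatch the reviewer pointed out.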

rust-bors bot (Contributor) commented Mar 5, 2026

☀️ Try build successful (CI)
Build commit: 35cd5ab (35cd5ab4525ee29a211b3eed56fab71e82ba846a, parent: 70d86e3abeecf3a655264d9a716c5d08160176b7)

bjorn3 force-pushed the llvm_pre_link_thinlto branch from 434027f to 71a31b3 on March 5, 2026 17:41

rust-timer (Collaborator) commented

Finished benchmarking commit (35cd5ab): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.1%, 4.8%] | 106 |
| Regressions ❌ (secondary) | 0.7% | [0.1%, 3.4%] | 97 |
| Improvements ✅ (primary) | -0.6% | [-3.6%, -0.1%] | 15 |
| Improvements ✅ (secondary) | -0.3% | [-1.5%, -0.1%] | 20 |
| All ❌✅ (primary) | 0.4% | [-3.6%, 4.8%] | 121 |

Max RSS (memory usage)

Results (primary 0.2%, secondary -0.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.0% | [2.6%, 5.4%] | 2 |
| Regressions ❌ (secondary) | 4.4% | [4.4%, 4.4%] | 1 |
| Improvements ✅ (primary) | -2.3% | [-3.6%, -1.2%] | 3 |
| Improvements ✅ (secondary) | -3.5% | [-3.5%, -3.5%] | 2 |
| All ❌✅ (primary) | 0.2% | [-3.6%, 5.4%] | 5 |

Cycles

Results (primary 2.1%, secondary 1.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.0% | [1.3%, 5.2%] | 11 |
| Regressions ❌ (secondary) | 2.3% | [1.7%, 3.3%] | 5 |
| Improvements ✅ (primary) | -2.9% | [-3.6%, -2.3%] | 2 |
| Improvements ✅ (secondary) | -1.3% | [-1.3%, -1.3%] | 1 |
| All ❌✅ (primary) | 2.1% | [-3.6%, 5.2%] | 13 |

Binary size

Results (primary -0.5%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.4%, 0.4%] | 4 |
| Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 1 |
| Improvements ✅ (primary) | -0.8% | [-1.4%, -0.4%] | 17 |
| Improvements ✅ (secondary) | -1.6% | [-2.5%, -1.3%] | 4 |
| All ❌✅ (primary) | -0.5% | [-1.4%, 0.4%] | 21 |

Bootstrap: 480.787s -> 492.506s (2.44%)
Artifact size: 395.02 MiB -> 396.92 MiB (0.48%)

rustbot added the perf-regression label and removed the S-waiting-on-perf label on Mar 5, 2026

bjorn3 (Member Author) commented Mar 5, 2026

The above perf run was accidentally using the fat LTO pre-link pipeline unconditionally.

@bors try @rust-timer queue

rust-bors bot pushed a commit that referenced this pull request Mar 5, 2026
Always use the ThinLTO pipeline for pre-link optimizations
rustbot added the S-waiting-on-perf label on Mar 5, 2026

rust-bors bot (Contributor) commented Mar 5, 2026

☀️ Try build successful (CI)
Build commit: 8df093d (8df093d2ceca4c9dd5f81029e73b98d5e26f0c40, parent: 64b72a1fa5449d928d5f553b01a596b78ee255d2)

rust-timer (Collaborator) commented

Finished benchmarking commit (8df093d): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 1 |
| Improvements ✅ (primary) | -2.3% | [-2.4%, -2.2%] | 4 |
| Improvements ✅ (secondary) | -1.0% | [-1.0%, -1.0%] | 1 |
| All ❌✅ (primary) | -2.3% | [-2.4%, -2.2%] | 4 |

Max RSS (memory usage)

Results (primary -0.1%, secondary 1.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [2.2%, 2.2%] | 1 |
| Regressions ❌ (secondary) | 5.0% | [5.0%, 5.0%] | 1 |
| Improvements ✅ (primary) | -2.4% | [-2.4%, -2.4%] | 1 |
| Improvements ✅ (secondary) | -2.1% | [-2.1%, -2.1%] | 1 |
| All ❌✅ (primary) | -0.1% | [-2.4%, 2.2%] | 2 |

Cycles

Results (primary -2.6%, secondary 3.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.7% | [3.7%, 3.7%] | 1 |
| Improvements ✅ (primary) | -2.6% | [-3.0%, -2.0%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.6% | [-3.0%, -2.0%] | 4 |

Binary size

Results (primary -0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.3%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.4% | [-0.4%, -0.3%] | 4 |

Bootstrap: 479.735s -> 483.316s (0.75%)
Artifact size: 395.00 MiB -> 397.05 MiB (0.52%)
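
The percentages on the Bootstrap and Artifact size lines appear to be plain relative changes from the first value to the second. A small sketch, assuming that interpretation (the helper names `percent_change` and `format_change` are illustrative and not part of the perf tooling), reproduces the figures from both perf runs:

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// Formats the change with two decimals, like the summary lines above.
fn format_change(before: f64, after: f64) -> String {
    format!("{:.2}%", percent_change(before, after))
}
```

For example, `format_change(480.787, 492.506)` gives the first run's bootstrap figure and `format_change(479.735, 483.316)` the second run's, so the bootstrap slowdown drops from roughly 2.4% to under 1% once the ThinLTO pre-link pipeline is actually the one being used.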

rustbot removed the S-waiting-on-perf label on Mar 5, 2026

bjorn3 (Member Author) commented Mar 5, 2026

I think it is safe to say that always using the fat LTO pre-link pipeline is a terrible idea. Using the ThinLTO pre-link pipeline unconditionally may cause an ever so slight compile-time regression of up to 0.2%, aside from a large outlier on large-workspace (but that is a pretty noisy benchmark anyway). For exa there is a 2.4% improvement. The runtime benchmarks don't show any noticeable difference, but I don't know if any of them use fat LTO.

Maybe someone has an application that uses fat LTO where performance is important to test this with. Ditto with a very space constrained application like for embedded devices.

diondokter (Contributor) commented

I've tested this change on 3 real production-level mid-size firmware projects:

[image: test results, not preserved in this text]

The change appears to be negligible with regard to size, so I have no concerns.

cuviper (Member) commented Mar 7, 2026

Seems reasonable to try, and easy to back out if we find problems.

@bors r+ rollup

rust-bors bot (Contributor) commented Mar 7, 2026

📌 Commit 71a31b3 has been approved by cuviper

It is now in the queue for this repository.

rust-bors bot added the S-waiting-on-bors label and removed the S-waiting-on-review label on Mar 7, 2026
jhpratt added a commit to jhpratt/rust that referenced this pull request Mar 7, 2026
rust-bors bot pushed a commit that referenced this pull request Mar 8, 2026
Rollup of 4 pull requests

Successful merges:

 - #153202 ([win] Fix truncated unwinds for Arm64 Windows)
 - #153437 (coretest in miri: fix using unstable libtest features)
 - #153446 (Always use the ThinLTO pipeline for pre-link optimizations)
 - #153548 (add test for closure precedence in `TokenStream`s)
@rust-bors rust-bors bot merged commit cc0a60f into rust-lang:main Mar 8, 2026
12 checks passed
@rustbot rustbot added this to the 1.96.0 milestone Mar 8, 2026
rust-timer added a commit that referenced this pull request Mar 8, 2026
Rollup merge of #153446 - bjorn3:llvm_pre_link_thinlto, r=cuviper

@bjorn3 bjorn3 deleted the llvm_pre_link_thinlto branch March 8, 2026 07:09
Labels

  • A-LLVM. Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
  • perf-regression. Performance regression.
  • S-waiting-on-bors. Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
  • T-compiler. Relevant to the compiler team, which will review and decide on the PR/issue.

6 participants