Always use the ThinLTO pipeline for pre-link optimizations by bjorn3 · Pull Request #153446 · rust-lang/rust

bjorn3 · 2026-03-05T15:02:09Z

When using cargo, this was already effectively done for all dependencies, as cargo passes `-Clinker-plugin-lto` without `-Clto=fat/thin`, and `-Clinker-plugin-lto` assumes that ThinLTO will be used. The ThinLTO pre-link pipeline is faster than the fat LTO one, and according to the benchmarks in ¹ there is barely any runtime performance difference between executables built with fat LTO using the fat pre-link pipeline and those using the ThinLTO pre-link pipeline.

This also helps avoid having yet another code path if we want to support Unified LTO, that is, a single bitcode file that can be used for both fat LTO and ThinLTO when using linker plugin LTO. We already support this when rustc does LTO: ThinLTO bitcode is enough of a superset of fat LTO bitcode that, for the current set of LTO features rustc exposes, mixing them happens to work by accident as long as there is no explicit check preventing it. I'm still investigating whether rustc would benefit from Unified LTO and how exactly to integrate it.

¹ https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774

rustbot · 2026-03-05T15:02:16Z

r? @cuviper

rustbot has assigned @cuviper.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

Owners of files modified in this PR: @cuviper

bjorn3 · 2026-03-05T15:02:18Z

@bors try @rust-timer queue

Always use the ThinLTO pipeline for pre-link optimizations

nikic · 2026-03-05T16:58:52Z

compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp

-        NeedThinLTOBufferPasses = false;
-        break;
      case LLVMRustOptStage::PreLinkFatLTO:
        MPM = PB.buildLTOPreLinkDefaultPipeline(OptLevel);


Isn't this doing the opposite of what your PR description says this is doing?

Oops. I had to redo these changes to split it out of my exploration of UnifiedLTO and messed up there.
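
For readers following the exchange above, here is a minimal, self-contained C++ sketch of the pipeline selection being discussed. It does not reproduce `PassWrapper.cpp`: the `OptStage` and `Pipeline` enums and the three `select*` functions are hypothetical stand-ins. The `selectAfter` mapping is an assumption based on the PR title rather than on the merged diff, and `selectFirstTry` is an assumed model of the first pushed version, which was later described in this thread as using the fat LTO pre-link pipeline unconditionally.

```cpp
// Hypothetical stand-ins for the pre-link stages handed to the LLVM wrapper.
enum class OptStage { PreLinkNoLTO, PreLinkThinLTO, PreLinkFatLTO };

// Hypothetical labels for the pass pipelines that could be built.
enum class Pipeline { PerModuleDefault, ThinLTOPreLink, FatLTOPreLink };

// Assumed behaviour before the PR: each LTO flavour gets its own
// pre-link pipeline.
constexpr Pipeline selectBefore(OptStage stage) {
  switch (stage) {
  case OptStage::PreLinkThinLTO:
    return Pipeline::ThinLTOPreLink;
  case OptStage::PreLinkFatLTO:
    return Pipeline::FatLTOPreLink;
  case OptStage::PreLinkNoLTO:
    break;
  }
  return Pipeline::PerModuleDefault;
}

// Assumed model of the first try build: both LTO stages end up on the
// fat LTO pre-link pipeline, the opposite of what the PR intends.
constexpr Pipeline selectFirstTry(OptStage stage) {
  switch (stage) {
  case OptStage::PreLinkThinLTO:
  case OptStage::PreLinkFatLTO:
    return Pipeline::FatLTOPreLink;
  case OptStage::PreLinkNoLTO:
    break;
  }
  return Pipeline::PerModuleDefault;
}

// Assumed intent of the PR: both LTO stages share the ThinLTO
// pre-link pipeline.
constexpr Pipeline selectAfter(OptStage stage) {
  switch (stage) {
  case OptStage::PreLinkThinLTO:
  case OptStage::PreLinkFatLTO:
    return Pipeline::ThinLTOPreLink;
  case OptStage::PreLinkNoLTO:
    break;
  }
  return Pipeline::PerModuleDefault;
}
```

Because the functions are `constexpr`, the mapping can be checked at compile time, for example with `static_assert(selectAfter(OptStage::PreLinkFatLTO) == Pipeline::ThinLTOPreLink);`.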

rust-bors · 2026-03-05T17:17:49Z

☀️ Try build successful (CI)
Build commit: 35cd5ab (35cd5ab4525ee29a211b3eed56fab71e82ba846a, parent: 70d86e3abeecf3a655264d9a716c5d08160176b7)

rust-timer · 2026-03-05T18:19:39Z

Finished benchmarking commit (35cd5ab): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.1%, 4.8%] | 106 |
| Regressions ❌ (secondary) | 0.7% | [0.1%, 3.4%] | 97 |
| Improvements ✅ (primary) | -0.6% | [-3.6%, -0.1%] | 15 |
| Improvements ✅ (secondary) | -0.3% | [-1.5%, -0.1%] | 20 |
| All ❌✅ (primary) | 0.4% | [-3.6%, 4.8%] | 121 |

Max RSS (memory usage)

Results (primary 0.2%, secondary -0.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.0% | [2.6%, 5.4%] | 2 |
| Regressions ❌ (secondary) | 4.4% | [4.4%, 4.4%] | 1 |
| Improvements ✅ (primary) | -2.3% | [-3.6%, -1.2%] | 3 |
| Improvements ✅ (secondary) | -3.5% | [-3.5%, -3.5%] | 2 |
| All ❌✅ (primary) | 0.2% | [-3.6%, 5.4%] | 5 |

Cycles

Results (primary 2.1%, secondary 1.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.0% | [1.3%, 5.2%] | 11 |
| Regressions ❌ (secondary) | 2.3% | [1.7%, 3.3%] | 5 |
| Improvements ✅ (primary) | -2.9% | [-3.6%, -2.3%] | 2 |
| Improvements ✅ (secondary) | -1.3% | [-1.3%, -1.3%] | 1 |
| All ❌✅ (primary) | 2.1% | [-3.6%, 5.2%] | 13 |

Binary size

Results (primary -0.5%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.4%, 0.4%] | 4 |
| Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 1 |
| Improvements ✅ (primary) | -0.8% | [-1.4%, -0.4%] | 17 |
| Improvements ✅ (secondary) | -1.6% | [-2.5%, -1.3%] | 4 |
| All ❌✅ (primary) | -0.5% | [-1.4%, 0.4%] | 21 |

Bootstrap: 480.787s -> 492.506s (2.44%)
Artifact size: 395.02 MiB -> 396.92 MiB (0.48%)

bjorn3 · 2026-03-05T18:28:32Z

The above perf run was accidentally using the fat LTO pre-link pipeline unconditionally.

@bors try @rust-timer queue

Always use the ThinLTO pipeline for pre-link optimizations

rust-bors · 2026-03-05T20:43:24Z

☀️ Try build successful (CI)
Build commit: 8df093d (8df093d2ceca4c9dd5f81029e73b98d5e26f0c40, parent: 64b72a1fa5449d928d5f553b01a596b78ee255d2)

rust-timer · 2026-03-05T21:23:39Z

Finished benchmarking commit (8df093d): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 1 |
| Improvements ✅ (primary) | -2.3% | [-2.4%, -2.2%] | 4 |
| Improvements ✅ (secondary) | -1.0% | [-1.0%, -1.0%] | 1 |
| All ❌✅ (primary) | -2.3% | [-2.4%, -2.2%] | 4 |

Max RSS (memory usage)

Results (primary -0.1%, secondary 1.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [2.2%, 2.2%] | 1 |
| Regressions ❌ (secondary) | 5.0% | [5.0%, 5.0%] | 1 |
| Improvements ✅ (primary) | -2.4% | [-2.4%, -2.4%] | 1 |
| Improvements ✅ (secondary) | -2.1% | [-2.1%, -2.1%] | 1 |
| All ❌✅ (primary) | -0.1% | [-2.4%, 2.2%] | 2 |

Cycles

Results (primary -2.6%, secondary 3.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.7% | [3.7%, 3.7%] | 1 |
| Improvements ✅ (primary) | -2.6% | [-3.0%, -2.0%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.6% | [-3.0%, -2.0%] | 4 |

Binary size

Results (primary -0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.4% | [-0.4%, -0.3%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.4% | [-0.4%, -0.3%] | 4 |

Bootstrap: 479.735s -> 483.316s (0.75%)
Artifact size: 395.00 MiB -> 397.05 MiB (0.52%)

bjorn3 · 2026-03-05T21:44:25Z

I think it is safe to say that always using the fat LTO pre-link pipeline is a terrible idea. Using the ThinLTO pre-link pipeline unconditionally may cause an ever so slight compile-time regression of up to 0.2%, aside from a large outlier on large-workspace (but that is a pretty noisy benchmark anyway). For exa, there is a 2.4% improvement. The runtime benchmarks don't show any noticeable difference, but I don't know if any of them use fat LTO.

Maybe someone has a performance-sensitive application that uses fat LTO to test this with. Ditto for a very space-constrained application, such as one for embedded devices.

diondokter · 2026-03-06T09:54:12Z

I've tested this change on 3 real production-level mid-size firmware projects:

The change seems negligible with regard to size.
So I have no concerns.

cuviper · 2026-03-07T21:32:30Z

Seems reasonable to try, and easy to back out if we find problems.

@bors r+ rollup

rust-bors · 2026-03-07T21:32:32Z

📌 Commit 71a31b3 has been approved by cuviper

It is now in the queue for this repository.

Rollup of 4 pull requests

Successful merges:

- #153202 ([win] Fix truncated unwinds for Arm64 Windows)
- #153437 (coretest in miri: fix using unstable libtest features)
- #153446 (Always use the ThinLTO pipeline for pre-link optimizations)
- #153548 (add test for closure precedence in `TokenStream`s)

Rollup merge of #153446 - bjorn3:llvm_pre_link_thinlto, r=cuviper

rustbot assigned cuviper Mar 5, 2026

rust-bors bot pushed a commit that referenced this pull request Mar 5, 2026

Auto merge of #153446 - bjorn3:llvm_pre_link_thinlto, r=<try>

35cd5ab

Always use the ThinLTO pipeline for pre-link optimizations

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 5, 2026

nikic reviewed Mar 5, 2026

bjorn3 force-pushed the llvm_pre_link_thinlto branch from 434027f to 71a31b3 on March 5, 2026, 17:41

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 5, 2026

rust-bors bot pushed a commit that referenced this pull request Mar 5, 2026

Auto merge of #153446 - bjorn3:llvm_pre_link_thinlto, r=<try>

8df093d

Always use the ThinLTO pipeline for pre-link optimizations

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 5, 2026

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 5, 2026

rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 7, 2026

jhpratt mentioned this pull request Mar 7, 2026

Rollup of 3 pull requests #153550

Closed

Zalathar mentioned this pull request Mar 8, 2026

Rollup of 4 pull requests #153552

Merged

rust-bors bot merged commit cc0a60f into rust-lang:main Mar 8, 2026
12 checks passed

rustbot added this to the 1.96.0 milestone Mar 8, 2026

bjorn3 deleted the llvm_pre_link_thinlto branch March 8, 2026 07:09
