Disable inlining of packed `io::Error` destructor #149146

kotauskas · 2025-11-20T12:55:38Z

Inlining the destructor of ErrorData (which currently gets inlined through the destructor of the packed Repr) is in no way helpful in real programs, as the source of the error will not be inlined, so there will not be any match assumptions to gain. The cost, meanwhile, is a code size increase by a factor of up to 5.4 in the case of dropping multiple io::Results in the same function. Accordingly, this disables the inlining to avoid unhelpful code bloat in opt-level = 3 programs.

The destructor of ErrorData on 32-bit platforms might be suffering from the same problem, but fixing that would require some sort of annotation that puts #[inline(never)] on the compiler-generated part of the destructor.

rustbot · 2025-11-20T12:55:43Z

r? @tgross35

rustbot has assigned @tgross35.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

joboet · 2025-11-20T15:21:49Z

I don't think preventing inlining of the whole function is a good idea. For cases where the compiler knows the error variant, inlining allows removing the entire destructor. I think it'd be better to prevent inlining the destructor of the custom variant, which is the only one that actually needs destruction.

kotauskas · 2025-11-20T19:16:23Z

For cases where the compiler knows the error variant, inlining allows removing the entire destructor.

Those are highly unlikely to occur on hot paths in real programs: the construction point of the error would have to get inlined into the destruction point. It's most certainly not worth inlining the branches (which still worsen the code size, albeit not as much as inlining the destructor) into every io::Error destruction point in hopes that a handful of them will be slightly faster in the particular case of someone having stubbed out a bunch of functions that return io::Result with ones that always return Err(io::Error::from(io::ErrorKind::Unsupported)). Especially the fact that this only happens when stubbing on #[cfg] makes it very unlikely for the destructor elision made possible by inlining the branches to yield meaningful improvements.

tgross35 · 2025-11-21T01:53:38Z

We can at least
@bors2 try @rust-timer queue

Is only removing #[inline] without forbidding it sufficient enough to do anything here?

Disable inlining of packed `io::Error` destructor

rust-bors · 2025-11-21T04:10:31Z

☀️ Try build successful (CI)
Build commit: 07c4fb2 (07c4fb26dcf5a66012bc0079769c55cae7a6a00a, parent: 5f7653df82f7076960d5760830554c98f4cab215)

rust-timer · 2025-11-21T05:30:16Z

Finished benchmarking commit (07c4fb2): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.7%	[-1.6%, -0.3%]	8
Improvements ✅ (secondary)	-6.2%	[-18.3%, -0.1%]	13
All ❌✅ (primary)	-0.7%	[-1.6%, -0.3%]	8

Max RSS (memory usage)

Results (primary 1.3%, secondary -2.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	3.5%	[3.5%, 3.5%]	1
Regressions ❌ (secondary)	1.8%	[1.6%, 2.0%]	2
Improvements ✅ (primary)	-1.0%	[-1.0%, -1.0%]	1
Improvements ✅ (secondary)	-3.4%	[-5.0%, -1.4%]	13
All ❌✅ (primary)	1.3%	[-1.0%, 3.5%]	2

Cycles

Results (primary 0.2%, secondary -2.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.5%	[2.5%, 2.5%]	1
Regressions ❌ (secondary)	4.1%	[1.6%, 7.2%]	8
Improvements ✅ (primary)	-2.2%	[-2.2%, -2.2%]	1
Improvements ✅ (secondary)	-7.7%	[-15.7%, -2.0%]	10
All ❌✅ (primary)	0.2%	[-2.2%, 2.5%]	2

Binary size

Results (primary -0.3%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.1%, 0.7%]	6
Regressions ❌ (secondary)	6.9%	[6.9%, 6.9%]	1
Improvements ✅ (primary)	-0.4%	[-1.0%, -0.1%]	41
Improvements ✅ (secondary)	-1.2%	[-13.2%, -0.0%]	94
All ❌✅ (primary)	-0.3%	[-1.0%, 0.7%]	47

Bootstrap: 472.873s -> 472.231s (-0.14%)
Artifact size: 388.93 MiB -> 388.44 MiB (-0.12%)

scottmcm · 2025-11-21T07:12:40Z

library/std/src/io/error/repr_bitpacked.rs

+    // meanwhile, is a code size increase by a factor of up to 5.4 in the case
+    // of dropping multiple io::Results in the same function
+    // (https://godbolt.org/z/8hfGchjsT).
+    #[inline(never)]


Pondering: why is this the correct one to inline(never)? What if we made decode_repr be inline(never) and allowed inlining the destructor as that turns into just a call to decode_repr?

Said otherwise, if this drop is a problem, wouldn't data and data_mut and such also not want this inlining?

decode_repr is a generic function in which C can be one of &Custom, &mut Custom, or Box<Custom>. In the case of the references, which are instantiated by data and data_mut, no types with destructors are involved and there are no destructors to shuffle around. In the Box<Custom> case, which is what's called by the destructor of Repr, decode_repr returns ownership of that which needs its destructor called and never calls any destructors itself, meaning that putting #[inline(never)] on it and removing it from Repr::drop would just make the destructor get inlined again (except the code bloat would be even worse because the branching would get duplicated).

The ideal thing would be to ensure ErrorData's drop is outlined but only when it has a nontrivial drop, right? I was thinking this could be done with something like:

trait OutlineDrop { const SHOULD_OUTLINE: bool = false; } impl<T> OutlineDrop for &T {} impl<T> OutlineDrop for &mut T {} impl OutlineDrop for Box<Custom> { const SHOULD_OUTLINE: bool = true; } impl<C: MaybeOutlineDrop> Drop for ErrorData<C> { #[inline(always)] fn drop(&mut self) { #[inline(never)] fn drop_slow<C: MaybeOutlineDrop>(this: &mut ErrorData<C>) { // Insert a variant with no drop needed, then drop the custom variant // within this outlined function. let mut dst = ErrorData::Os(0); mem::swap(this, &mut dst); drop(dst); } // Only call the outlined drop if C::SHOULD_OUTLINE && matches!(self, ErrorData::Custom(_)) { drop_slow(self); } } }

But from a quick check it seems like the matches! check is outlined, despite the #[inline(always)], which completely defeats the purpose.

The current optimization is kind of weird for this too. Looking at your godbolt link, it seems like it decides to inline the first drop and outline the second drop? Interesting heuristics.

I do think it's worth double checking whether there is an option to get this behavior in more places the nontrivial drop may happen, as Scott mentioned. But if there doesn't seem to be a good way, I don't see any reason not to make this change given the perf results.

What is the result without any #[inline]/#[inline(never)] btw? Seems like that could give it some flexibility.

kotauskas · 2025-11-21T08:34:01Z

A note on the code size figures: the only benchmarks that regressed are rlibs (probably code that would otherwise be instantiated downstream being moved to the rlibs themselves, in actuality likely reducing the code size after linking), and the executables are wholly in the green.

tgross35 · 2026-01-03T10:15:32Z

Would you be able to follow up with the questions at #149146 (comment)?

@rustbot author

rustbot · 2026-01-03T10:15:36Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

rustbot · 2026-01-03T23:39:27Z

⚠️ Warning ⚠️

This PR is based on an upstream commit that is older than 28 days.

It's recommended to update your branch according to the rustc-dev-guide.

kotauskas · 2026-01-03T23:40:26Z

@rust-timer queue

kotauskas · 2026-01-03T23:43:13Z

@tgross35 I don't actually have a proper setup for building and benchmarking standard library changes, so I'll need someone with appropriate permissions to invoke rust-timer to test how it performs with neither #[inline] nor #[inline(never)].

joboet · 2026-01-04T10:04:23Z

@bors try @rust-timer queue

Disable inlining of packed `io::Error` destructor

rust-bors · 2026-01-04T12:27:56Z

☀️ Try build successful (CI)
Build commit: 2263b42 (2263b424045c4752a7f2c143e498337a87ee306d, parent: f280e764d51976d62da12033ddcbf72d0076a969)

rust-timer · 2026-01-04T13:09:02Z

Finished benchmarking commit (2263b42): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.6%, 0.6%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.8%	[-2.6%, -0.1%]	6
Improvements ✅ (secondary)	-6.2%	[-18.4%, -0.2%]	13
All ❌✅ (primary)	-0.6%	[-2.6%, 0.6%]	7

Max RSS (memory usage)

Results (primary 1.2%, secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	1.2%	[1.2%, 1.2%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.8%	[-2.8%, -2.8%]	1
All ❌✅ (primary)	1.2%	[1.2%, 1.2%]	1

Cycles

Results (primary -2.3%, secondary -9.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.3%	[-2.3%, -2.3%]	1
Improvements ✅ (secondary)	-9.2%	[-15.2%, -5.2%]	7
All ❌✅ (primary)	-2.3%	[-2.3%, -2.3%]	1

Binary size

Results (primary 0.1%, secondary -5.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.3%, 2.2%]	7
Regressions ❌ (secondary)	6.9%	[6.9%, 6.9%]	1
Improvements ✅ (primary)	-0.3%	[-0.6%, -0.1%]	9
Improvements ✅ (secondary)	-7.5%	[-12.9%, -2.9%]	7
All ❌✅ (primary)	0.1%	[-0.6%, 2.2%]	16

Bootstrap: 474.288s -> 472.095s (-0.46%)
Artifact size: 390.82 MiB -> 390.87 MiB (0.01%)

perf run: https://perf.rust-lang.org/compare.html?start=f280e764d51976d62da12033ddcbf72d0076a969&end=2263b424045c4752a7f2c143e498337a87ee306d&stat=instructions:u Removing `#[inline(never)]` seems to place it in a neither-here-nor-there state that fails to reduce code size.

kotauskas · 2026-01-04T13:31:23Z

Removing #[inline(never)] seems to place it in a neither-here-nor-there state that fails to reduce code size while still improving compile times (presumably because going from #[inline] to no attribute disables per-crate instantiation, which relieves Rustc of the burden of instantiating the destructor everywhere over and over and leaves it up to LTO to inline the destructor). I've returned the PR to the original #[inline(never)] version as such.

@rustbot ready

tgross35 · 2026-01-05T00:59:53Z

Would you be able to follow up with the questions at #149146 (comment)?

I was mostly referring to responding to my questions there (your thoughts on what the ideal solution would look like, and what exactly is going on in the godbolt link). If it's a missed optimization in LLVM than an issue should be filed there.

If we can't put an #[inline(never)] at a better place, then I still think it's fine to accept this given the improvements. But there should be a FIXME indicating that it's not the ideal place to outline and linking back to discussion here. Also commits will need to be squashed before merge.

@scottmcm requesting a review to make sure you don't feel differently.

kotauskas · 2026-01-05T11:29:26Z

I was mostly referring to responding to my questions there (your thoughts on what the ideal solution would look like, and what exactly is going on in the godbolt link). If it's a missed optimization in LLVM than an issue should be filed there.

I think blaming LLVM for this might be reasonable if a little shortsighted – it loves to inline things with little regard for code size consequences and we can't just file a bug for that behavior.

Short-term, and pertinently to the problem that remains with the 32-bit variant of io::Error, I believe there should be a #[inline_drop] attribute on user-defined types that acts as an #[inline] attribute on the generated destructor for the type. This would allow us to not only apply this same optimization to the 32-bit variant, but also remove inlining from other expensive destructors.

My idea of a more fundamental solution would be a MIR optimization pass that identifies calls to expensive functions (free, syscalls, and so on) and propagates a hint discouraging inlining through every function that calls it unconditionally. If implemented correctly, it would automatically apply the "outline yes-op path, inline the match that decodes the io::Error flavor" optimization proposed early on in this conversation, and maybe also the sort of optimization you see in Vec::push without having to annotate the slow path of the push with #[cold] and #[inline(never)] (although it will still have to be a separate function). The rationale for this being a MIR pass is that it could allow reducing the number of instantiations, improving compile times.

But there should be a FIXME indicating that it's not the ideal place to outline and linking back to discussion here.

Added in 646ad31.

Also commits will need to be squashed before merge.

Actually, the idea behind making a third commit instead of reverting the second one was to preserve in the Git history the fact that the lack of an attribute was actually tried and seen to be worse than #[inline(never)]. If that is excessively verbose, a squash merge would be appropriate.

Disable inlining of packed io::Error destructor

cb76e42

rustbot assigned tgross35 Nov 20, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 20, 2025

This comment has been minimized.

Sign in to view

rust-bors bot added a commit that referenced this pull request Nov 21, 2025

Auto merge of #149146 - kotauskas:patch-1, r=<try>

07c4fb2

Disable inlining of packed `io::Error` destructor

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 21, 2025

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 21, 2025

scottmcm reviewed Nov 21, 2025

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 3, 2026

Soft-disable io::Error destructor inlining

09a34c8

This comment has been minimized.

Sign in to view

rust-bors bot added a commit that referenced this pull request Jan 4, 2026

Auto merge of #149146 - kotauskas:patch-1, r=<try>

2263b42

Disable inlining of packed `io::Error` destructor

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 4, 2026

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 4, 2026

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 4, 2026

tgross35 requested a review from scottmcm January 5, 2026 00:59

Add FIXME to io::Error destructor

646ad31

Uh oh!

Disable inlining of packed io::Error destructor #149146

Are you sure you want to change the base?

Disable inlining of packed io::Error destructor #149146

Conversation

kotauskas commented Nov 20, 2025

Uh oh!

rustbot commented Nov 20, 2025

Uh oh!

joboet commented Nov 20, 2025

Uh oh!

kotauskas commented Nov 20, 2025

Uh oh!

tgross35 commented Nov 21, 2025

Uh oh!

This comment has been minimized.

This comment has been minimized.

rust-bors bot commented Nov 21, 2025

Uh oh!

This comment has been minimized.

rust-timer commented Nov 21, 2025

Overall result: ✅ improvements - no action needed

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tgross35 Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kotauskas commented Nov 21, 2025

Uh oh!

tgross35 commented Jan 3, 2026

Uh oh!

rustbot commented Jan 3, 2026

Uh oh!

rustbot commented Jan 3, 2026

Uh oh!

kotauskas commented Jan 3, 2026

Uh oh!

This comment has been minimized.

kotauskas commented Jan 3, 2026

Uh oh!

joboet commented Jan 4, 2026

Uh oh!

This comment has been minimized.

This comment has been minimized.

rust-bors bot commented Jan 4, 2026

Uh oh!

This comment has been minimized.

rust-timer commented Jan 4, 2026

Overall result: ✅ improvements - no action needed

Uh oh!

kotauskas commented Jan 4, 2026

Uh oh!

tgross35 commented Jan 5, 2026

Uh oh!

kotauskas commented Jan 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Disable inlining of packed `io::Error` destructor #149146

Disable inlining of packed `io::Error` destructor #149146

tgross35 Nov 21, 2025 •

edited

Loading