Skip to content

Conversation

@kotauskas
Copy link

Inlining the destructor of ErrorData (which currently gets inlined through the destructor of the packed Repr) is in no way helpful in real programs, as the source of the error will not be inlined, so there will not be any match assumptions to gain. The cost, meanwhile, is a code size increase by a factor of up to 5.4 in the case of dropping multiple io::Results in the same function. Accordingly, this disables the inlining to avoid unhelpful code bloat in opt-level = 3 programs.

The destructor of ErrorData on 32-bit platforms might be suffering from the same problem, but fixing that would require some sort of annotation that puts #[inline(never)] on the compiler-generated part of the destructor.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 20, 2025
@rustbot
Copy link
Collaborator

rustbot commented Nov 20, 2025

r? @tgross35

rustbot has assigned @tgross35.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@joboet
Copy link
Member

joboet commented Nov 20, 2025

I don't think preventing inlining of the whole function is a good idea. For cases where the compiler knows the error variant, inlining allows removing the entire destructor. I think it'd be better to prevent inlining the destructor of the custom variant, which is the only one that actually needs destruction.

@kotauskas
Copy link
Author

For cases where the compiler knows the error variant, inlining allows removing the entire destructor.

Those are highly unlikely to occur on hot paths in real programs: the construction point of the error would have to get inlined into the destruction point. It's most certainly not worth inlining the branches (which still worsen the code size, albeit not as much as inlining the destructor) into every io::Error destruction point in hopes that a handful of them will be slightly faster in the particular case of someone having stubbed out a bunch of functions that return io::Result with ones that always return Err(io::Error::from(io::ErrorKind::Unsupported)). Especially the fact that this only happens when stubbing on #[cfg] makes it very unlikely for the destructor elision made possible by inlining the branches to yield meaningful improvements.

@tgross35
Copy link
Contributor

We can at least
@bors2 try @rust-timer queue

Is only removing #[inline] without forbidding it sufficient enough to do anything here?

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Nov 21, 2025
Disable inlining of packed `io::Error` destructor
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 21, 2025
@rust-bors
Copy link
Contributor

rust-bors bot commented Nov 21, 2025

☀️ Try build successful (CI)
Build commit: 07c4fb2 (07c4fb26dcf5a66012bc0079769c55cae7a6a00a, parent: 5f7653df82f7076960d5760830554c98f4cab215)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (07c4fb2): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.7% [-1.6%, -0.3%] 8
Improvements ✅
(secondary)
-6.2% [-18.3%, -0.1%] 13
All ❌✅ (primary) -0.7% [-1.6%, -0.3%] 8

Max RSS (memory usage)

Results (primary 1.3%, secondary -2.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
3.5% [3.5%, 3.5%] 1
Regressions ❌
(secondary)
1.8% [1.6%, 2.0%] 2
Improvements ✅
(primary)
-1.0% [-1.0%, -1.0%] 1
Improvements ✅
(secondary)
-3.4% [-5.0%, -1.4%] 13
All ❌✅ (primary) 1.3% [-1.0%, 3.5%] 2

Cycles

Results (primary 0.2%, secondary -2.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
2.5% [2.5%, 2.5%] 1
Regressions ❌
(secondary)
4.1% [1.6%, 7.2%] 8
Improvements ✅
(primary)
-2.2% [-2.2%, -2.2%] 1
Improvements ✅
(secondary)
-7.7% [-15.7%, -2.0%] 10
All ❌✅ (primary) 0.2% [-2.2%, 2.5%] 2

Binary size

Results (primary -0.3%, secondary -1.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.7%] 6
Regressions ❌
(secondary)
6.9% [6.9%, 6.9%] 1
Improvements ✅
(primary)
-0.4% [-1.0%, -0.1%] 41
Improvements ✅
(secondary)
-1.2% [-13.2%, -0.0%] 94
All ❌✅ (primary) -0.3% [-1.0%, 0.7%] 47

Bootstrap: 472.873s -> 472.231s (-0.14%)
Artifact size: 388.93 MiB -> 388.44 MiB (-0.12%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 21, 2025
// meanwhile, is a code size increase by a factor of up to 5.4 in the case
// of dropping multiple io::Results in the same function
// (https://godbolt.org/z/8hfGchjsT).
#[inline(never)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pondering: why is this the correct one to inline(never)? What if we made decode_repr be inline(never) and allowed inlining the destructor as that turns into just a call to decode_repr?

Said otherwise, if this drop is a problem, wouldn't data and data_mut and such also not want this inlining?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decode_repr is a generic function in which C can be one of &Custom, &mut Custom, or Box<Custom>. In the case of the references, which are instantiated by data and data_mut, no types with destructors are involved and there are no destructors to shuffle around. In the Box<Custom> case, which is what's called by the destructor of Repr, decode_repr returns ownership of that which needs its destructor called and never calls any destructors itself, meaning that putting #[inline(never)] on it and removing it from Repr::drop would just make the destructor get inlined again (except the code bloat would be even worse because the branching would get duplicated).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ideal thing would be to ensure ErrorData's drop is outlined but only when it has a nontrivial drop, right? I was thinking this could be done with something like:

trait OutlineDrop { const SHOULD_OUTLINE: bool = false; }
impl<T> OutlineDrop for &T {}
impl<T> OutlineDrop for &mut T {}
impl OutlineDrop for Box<Custom> { const SHOULD_OUTLINE: bool = true; }

impl<C: MaybeOutlineDrop> Drop for ErrorData<C> {
    #[inline(always)]
    fn drop(&mut self) {
        #[inline(never)]
        fn drop_slow<C: MaybeOutlineDrop>(this: &mut ErrorData<C>) {
            // Insert a variant with no drop needed, then drop the custom variant
            // within this outlined function.
            let mut dst = ErrorData::Os(0);
            mem::swap(this, &mut dst);
            drop(dst);
        }

        // Only call the outlined drop
        if C::SHOULD_OUTLINE && matches!(self, ErrorData::Custom(_)) {
            drop_slow(self);
        }
    }
}

But from a quick check it seems like the matches! check is outlined, despite the #[inline(always)], which completely defeats the purpose.

The current optimization is kind of weird for this too. Looking at your godbolt link, it seems like it decides to inline the first drop and outline the second drop? Interesting heuristics.

Copy link
Contributor

@tgross35 tgross35 Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's worth double checking whether there is an option to get this behavior in more places the nontrivial drop may happen, as Scott mentioned. But if there doesn't seem to be a good way, I don't see any reason not to make this change given the perf results.

What is the result without any #[inline]/#[inline(never)] btw? Seems like that could give it some flexibility.

@kotauskas
Copy link
Author

A note on the code size figures: the only benchmarks that regressed are rlibs (probably code that would otherwise be instantiated downstream being moved to the rlibs themselves, in actuality likely reducing the code size after linking), and the executables are wholly in the green.

@tgross35
Copy link
Contributor

tgross35 commented Jan 3, 2026

Would you be able to follow up with the questions at #149146 (comment)?

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 3, 2026
@rustbot
Copy link
Collaborator

rustbot commented Jan 3, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@rustbot
Copy link
Collaborator

rustbot commented Jan 3, 2026

⚠️ Warning ⚠️

@kotauskas
Copy link
Author

@rust-timer queue

@rust-timer

This comment has been minimized.

@kotauskas
Copy link
Author

@tgross35 I don't actually have a proper setup for building and benchmarking standard library changes, so I'll need someone with appropriate permissions to invoke rust-timer to test how it performs with neither #[inline] nor #[inline(never)].

@joboet
Copy link
Member

joboet commented Jan 4, 2026

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Jan 4, 2026
Disable inlining of packed `io::Error` destructor
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 4, 2026
@rust-bors
Copy link
Contributor

rust-bors bot commented Jan 4, 2026

☀️ Try build successful (CI)
Build commit: 2263b42 (2263b424045c4752a7f2c143e498337a87ee306d, parent: f280e764d51976d62da12033ddcbf72d0076a969)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (2263b42): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.6% [0.6%, 0.6%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.8% [-2.6%, -0.1%] 6
Improvements ✅
(secondary)
-6.2% [-18.4%, -0.2%] 13
All ❌✅ (primary) -0.6% [-2.6%, 0.6%] 7

Max RSS (memory usage)

Results (primary 1.2%, secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
1.2% [1.2%, 1.2%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.8% [-2.8%, -2.8%] 1
All ❌✅ (primary) 1.2% [1.2%, 1.2%] 1

Cycles

Results (primary -2.3%, secondary -9.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.3% [-2.3%, -2.3%] 1
Improvements ✅
(secondary)
-9.2% [-15.2%, -5.2%] 7
All ❌✅ (primary) -2.3% [-2.3%, -2.3%] 1

Binary size

Results (primary 0.1%, secondary -5.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.6% [0.3%, 2.2%] 7
Regressions ❌
(secondary)
6.9% [6.9%, 6.9%] 1
Improvements ✅
(primary)
-0.3% [-0.6%, -0.1%] 9
Improvements ✅
(secondary)
-7.5% [-12.9%, -2.9%] 7
All ❌✅ (primary) 0.1% [-0.6%, 2.2%] 16

Bootstrap: 474.288s -> 472.095s (-0.46%)
Artifact size: 390.82 MiB -> 390.87 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 4, 2026
@kotauskas
Copy link
Author

Removing #[inline(never)] seems to place it in a neither-here-nor-there state that fails to reduce code size while still improving compile times (presumably because going from #[inline] to no attribute disables per-crate instantiation, which relieves Rustc of the burden of instantiating the destructor everywhere over and over and leaves it up to LTO to inline the destructor). I've returned the PR to the original #[inline(never)] version as such.

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 4, 2026
@tgross35
Copy link
Contributor

tgross35 commented Jan 5, 2026

Would you be able to follow up with the questions at #149146 (comment)?

I was mostly referring to responding to my questions there (your thoughts on what the ideal solution would look like, and what exactly is going on in the godbolt link). If it's a missed optimization in LLVM than an issue should be filed there.

If we can't put an #[inline(never)] at a better place, then I still think it's fine to accept this given the improvements. But there should be a FIXME indicating that it's not the ideal place to outline and linking back to discussion here. Also commits will need to be squashed before merge.

@scottmcm requesting a review to make sure you don't feel differently.

@tgross35 tgross35 requested a review from scottmcm January 5, 2026 00:59
@kotauskas
Copy link
Author

I was mostly referring to responding to my questions there (your thoughts on what the ideal solution would look like, and what exactly is going on in the godbolt link). If it's a missed optimization in LLVM than an issue should be filed there.

I think blaming LLVM for this might be reasonable if a little shortsighted – it loves to inline things with little regard for code size consequences and we can't just file a bug for that behavior.

Short-term, and pertinently to the problem that remains with the 32-bit variant of io::Error, I believe there should be a #[inline_drop] attribute on user-defined types that acts as an #[inline] attribute on the generated destructor for the type. This would allow us to not only apply this same optimization to the 32-bit variant, but also remove inlining from other expensive destructors.

My idea of a more fundamental solution would be a MIR optimization pass that identifies calls to expensive functions (free, syscalls, and so on) and propagates a hint discouraging inlining through every function that calls it unconditionally. If implemented correctly, it would automatically apply the "outline yes-op path, inline the match that decodes the io::Error flavor" optimization proposed early on in this conversation, and maybe also the sort of optimization you see in Vec::push without having to annotate the slow path of the push with #[cold] and #[inline(never)] (although it will still have to be a separate function). The rationale for this being a MIR pass is that it could allow reducing the number of instantiations, improving compile times.

But there should be a FIXME indicating that it's not the ideal place to outline and linking back to discussion here.

Added in 646ad31.

Also commits will need to be squashed before merge.

Actually, the idea behind making a third commit instead of reverting the second one was to preserve in the Git history the fact that the lack of an attribute was actually tried and seen to be worse than #[inline(never)]. If that is excessively verbose, a squash merge would be appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants