error: try to reduce amount of codegen by forcefully uninlining error constructors #379
I had tried this several weeks ago, but recent exploration in #373 prompted me to revisit it. Indeed, `cargo llvm-lines` reveals that Jiff is emitting a _ton_ of code to LLVM just due to its error values. I don't know how much of this is inherent to Jiff's attention to good error messages or whether it's tied to how I've gone about it.

With that said... this isn't a Pareto distribution. It's not like Jiff's error handling is contributing 80% of Jiff's LLVM lines or something. It's just the biggest chunk. But that chunk seems to be around 11% or so.

This PR tries to reduce that chunk, but... doesn't succeed much. This PR almost certainly has perf regressions (I haven't measured yet), but I wanted to put it up anyway to give folks an idea of what I'm trying.

Also, here's the first ~30% of where Jiff is emitting LLVM lines (as run from the root of Jiff's repository):

```
 Lines          Copies        Function name
 -----          ------        -------------
 203564         5977          (TOTAL)
 10182 (5.0%, 5.0%)    86 (1.4%, 1.4%)   <jiff::error::Error as jiff::error::ErrorContext>::with_context
 5688 (2.8%, 7.8%)    204 (3.4%, 4.9%)   core::result::Result<T,E>::map_err
 3801 (1.9%, 9.7%)    145 (2.4%, 7.3%)   <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
 2837 (1.4%, 11.1%)   116 (1.9%, 9.2%)   core::option::Option<T>::ok_or_else
 2120 (1.0%, 12.1%)     2 (0.0%, 9.3%)   jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
 1822 (0.9%, 13.0%)     2 (0.0%, 9.3%)   jiff::fmt::temporal::printer::SpanPrinter::print_span
 1658 (0.8%, 13.8%)     1 (0.0%, 9.3%)   jiff::fmt::strtime::parse::Parser::parse
 1652 (0.8%, 14.6%)     1 (0.0%, 9.3%)   jiff::fmt::strtime::format::Formatter<W,L>::format_one
 1548 (0.8%, 15.4%)     6 (0.1%, 9.4%)   jiff::fmt::temporal::printer::DateTimePrinter::print_time
 1312 (0.6%, 16.0%)   110 (1.8%, 11.3%)  <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
 1260 (0.6%, 16.6%)     6 (0.1%, 11.4%)  jiff::fmt::temporal::printer::DateTimePrinter::print_date
 1219 (0.6%, 17.2%)    47 (0.8%, 12.1%)  core::option::Option<T>::map
 1160 (0.6%, 17.8%)    22 (0.4%, 12.5%)  core::option::Option<T>::map_or
 1071 (0.5%, 18.3%)    28 (0.5%, 13.0%)  core::result::Result<T,E>::unwrap
 997 (0.5%, 18.8%)     21 (0.4%, 13.3%)  core::option::Option<T>::or_else
 986 (0.5%, 19.3%)      1 (0.0%, 13.4%)  jiff::span::Span::from_invariant_nanoseconds
 976 (0.5%, 19.8%)     16 (0.3%, 13.6%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
 920 (0.5%, 20.2%)     10 (0.2%, 13.8%)  <jiff::util::rangeint::ri32<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri64<_,_>>>::rfrom
 920 (0.5%, 20.7%)      4 (0.1%, 13.9%)  jiff::fmt::temporal::printer::DateTimePrinter::print_offset_rounded
 912 (0.4%, 21.1%)      6 (0.1%, 14.0%)  jiff::fmt::friendly::printer::DesignatorWriter<W>::write
 890 (0.4%, 21.6%)      2 (0.0%, 14.0%)  jiff::fmt::temporal::printer::SpanPrinter::print_duration
 840 (0.4%, 22.0%)     12 (0.2%, 14.2%)  jiff::util::rangeint::ri8<_,_>::get
 828 (0.4%, 22.4%)      6 (0.1%, 14.3%)  core::slice::<impl [T]>::binary_search_by
 819 (0.4%, 22.8%)     32 (0.5%, 14.8%)  core::result::Result<T,E>::unwrap_or_else
 802 (0.4%, 23.2%)      2 (0.0%, 14.9%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_hms
 777 (0.4%, 23.6%)     10 (0.2%, 15.0%)  jiff::util::rangeint::Composite<T>::map
 758 (0.4%, 24.0%)      2 (0.0%, 15.1%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_hms
 744 (0.4%, 24.3%)      8 (0.1%, 15.2%)  core::array::try_from_fn_erased
 744 (0.4%, 24.7%)      2 (0.0%, 15.2%)  jiff::fmt::temporal::printer::DateTimePrinter::print_pieces
 720 (0.4%, 25.0%)      2 (0.0%, 15.3%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_designators_non_fraction
 700 (0.3%, 25.4%)     10 (0.2%, 15.4%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
 700 (0.3%, 25.7%)      7 (0.1%, 15.5%)  <jiff::util::rangeint::ri8<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri64<_,_>>>::rfrom
 682 (0.3%, 26.1%)      6 (0.1%, 15.6%)  jiff::fmt::strtime::format::<impl jiff::fmt::strtime::Extension>::write_int
 660 (0.3%, 26.4%)     10 (0.2%, 15.8%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
 659 (0.3%, 26.7%)      1 (0.0%, 15.8%)  jiff::fmt::rfc2822::DateTimePrinter::print_civil_with_offset
 600 (0.3%, 27.0%)      6 (0.1%, 15.9%)  <jiff::util::rangeint::ri8<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri32<_,_>>>::rfrom
 588 (0.3%, 27.3%)     12 (0.2%, 16.1%)  jiff::util::rangeint::ri8<_,_>::try_new
 588 (0.3%, 27.6%)      1 (0.0%, 16.1%)  jiff::tz::db::zoneinfo::inner::walk
 584 (0.3%, 27.9%)     86 (1.4%, 17.6%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::{{closure}}
 583 (0.3%, 28.2%)      1 (0.0%, 17.6%)  jiff::civil::date::DateDifference::since_with_largest_unit
 581 (0.3%, 28.4%)     11 (0.2%, 17.8%)  jiff::error::RangeError::new
 574 (0.3%, 28.7%)      2 (0.0%, 17.8%)  core::slice::sort::stable::quicksort::stable_partition
 552 (0.3%, 29.0%)      2 (0.0%, 17.9%)  core::str::pattern::TwoWaySearcher::next_back
 545 (0.3%, 29.3%)     97 (1.6%, 19.5%)  <T as jiff::util::rangeint::RInto<U>>::rinto
 544 (0.3%, 29.5%)      2 (0.0%, 19.5%)  jiff::fmt::strtime::format::write_offset
 536 (0.3%, 29.8%)      4 (0.1%, 19.6%)  jiff::fmt::temporal::printer::DateTimePrinter::print_datetime
```

Other than error handling, it doesn't look like there is any one thing that is contributing the most here. That means that I think real improvements here are probably going to require combing through these and ensuring that the number of copies of each function is as small as it can be. In other words, I don't really see any low-hanging fruit here, but maybe I'm not looking at this right.
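The "forceful uninlining" in the title can be sketched in isolation: keep the error *branch* in the caller (it's on the hot path), but move the formatting-heavy error *construction* into a single outlined function. The names below are illustrative, not Jiff's actual code:

```rust
// Error construction is marked cold and never inlined, so the formatting
// machinery is instantiated once instead of being duplicated into every
// caller that can produce this error.
#[cold]
#[inline(never)]
fn range_error(what: &'static str, value: i64) -> String {
    format!("parameter '{what}' with value {value} is out of range")
}

fn check_second(value: i64) -> Result<i64, String> {
    if !(0..=59).contains(&value) {
        // The branch itself stays inlined in the caller, but the error
        // value is built via a single outlined call.
        return Err(range_error("second", value));
    }
    Ok(value)
}
```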
FWIW, this PR does decrease the number of LLVM lines, but only marginally:

```
 Lines          Copies        Function name
 -----          ------        -------------
 195380         5768          (TOTAL)
 3801 (1.9%, 1.9%)    145 (2.5%, 2.5%)   <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
 3286 (1.7%, 3.6%)     86 (1.5%, 4.0%)   <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context
 2851 (1.5%, 5.1%)     96 (1.7%, 5.7%)   core::result::Result<T,E>::map_err
 2837 (1.5%, 6.5%)    116 (2.0%, 7.7%)   core::option::Option<T>::ok_or_else
 2120 (1.1%, 7.6%)      2 (0.0%, 7.7%)   jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
 1822 (0.9%, 8.6%)      2 (0.0%, 7.7%)   jiff::fmt::temporal::printer::SpanPrinter::print_span
 1658 (0.8%, 9.4%)      1 (0.0%, 7.8%)   jiff::fmt::strtime::parse::Parser::parse
 1652 (0.8%, 10.3%)     1 (0.0%, 7.8%)   jiff::fmt::strtime::format::Formatter<W,L>::format_one
 1548 (0.8%, 11.0%)     6 (0.1%, 7.9%)   jiff::fmt::temporal::printer::DateTimePrinter::print_time
 1312 (0.7%, 11.7%)   110 (1.9%, 9.8%)   <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
 1260 (0.6%, 12.4%)     6 (0.1%, 9.9%)   jiff::fmt::temporal::printer::DateTimePrinter::print_date
 1219 (0.6%, 13.0%)    47 (0.8%, 10.7%)  core::option::Option<T>::map
 1160 (0.6%, 13.6%)    22 (0.4%, 11.1%)  core::option::Option<T>::map_or
 1071 (0.5%, 14.1%)    28 (0.5%, 11.6%)  core::result::Result<T,E>::unwrap
 997 (0.5%, 14.6%)     21 (0.4%, 11.9%)  core::option::Option<T>::or_else
 986 (0.5%, 15.1%)      1 (0.0%, 12.0%)  jiff::span::Span::from_invariant_nanoseconds
 976 (0.5%, 15.6%)     16 (0.3%, 12.2%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
 925 (0.5%, 16.1%)     27 (0.5%, 12.7%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::imp
```

I wonder if my approach to error handling with just using strings everywhere is hurting things. The main alternative would be structured errors everywhere.
But there are _so many_ error messages that writing out structured definitions for each is terrifying to me (even if I were willing to use something like `thiserror` to reduce the boilerplate aspect of it).

Ref #373
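One way to shrink the per-type copies of a generic helper like `with_context` is the non-generic inner function trick: the generic shell stays trivial and forwards to a single non-generic function. This is a sketch of the general pattern under my own names, not Jiff's actual implementation:

```rust
// The generic shell is tiny, so duplicating it per `T` is cheap. All of
// the real work lives in `imp`, which is monomorphized exactly once no
// matter how many `T`s the shell is instantiated with.
fn attach_context<T>(r: Result<T, String>, ctx: &str) -> Result<T, String> {
    #[cold]
    #[inline(never)]
    fn imp(e: String, ctx: &str) -> String {
        format!("{ctx}: {e}")
    }
    r.map_err(|e| imp(e, ctx))
}
```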
Oof. Massive regressions:
This does seem to have a positive impact on compile times. My test setup here was to:

1. Add a `[patch.crates-io]` entry for `jiff` to Biff's `Cargo.toml`.
2. Run `touch path/to/jiff/src/lib.rs && touch src/main.rs && time cargo build --release` from the root of Biff's repository.

The timings I get are wildly variable. But the minimum time I got with this PR was 5.6s, while the minimum time I've seen for Jiff master is 6.3s. This is on an otherwise quiet machine with the CPU governor set to `performance`.

I'm going to see if I can tweak things here to try and gain back some of the runtime perf losses. Or maybe make them configurable.
I ended up being able to do this. The key was making sure that the error branch stays inlined in the calling code while the error constructors it calls are not duplicated.
It turns out that #379 ended up pretty badly regressing datetime parsing performance. In particular, the `with_context` routine on `Result<T, E>` was _eagerly_ calling the caller-provided closure even in the success case. That means error values (which are generally much more expensive to build than it is to parse a datetime) were being constructed on every parse. Ouch.

I'm not sure how I missed this. I ran the benchmarks in #379 and they didn't report anything egregious. I should double back and improve benchmark coverage.

Thankfully, this doesn't seem to regress the number of LLVM lines we're emitting:

```
 Lines          Copies        Function name
 -----          ------        -------------
 192746         2456          (TOTAL)
 3978 (2.1%, 2.1%)     1 (0.0%, 0.0%)   jiff::fmt::strtime::format::Formatter<W,L>::format_one
 3785 (2.0%, 4.0%)     1 (0.0%, 0.1%)   jiff::fmt::strtime::parse::Parser::parse
 3600 (1.9%, 5.9%)     2 (0.1%, 0.2%)   jiff::fmt::temporal::printer::SpanPrinter::print_span
 3340 (1.7%, 7.6%)     2 (0.1%, 0.2%)   jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
 3250 (1.7%, 9.3%)    86 (3.5%, 3.7%)   <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::{{closure}}
 2910 (1.5%, 10.8%)    6 (0.2%, 4.0%)   jiff::fmt::temporal::printer::DateTimePrinter::print_time
 2190 (1.1%, 12.0%)    1 (0.0%, 4.0%)   jiff::span::Span::from_invariant_nanoseconds
 2166 (1.1%, 13.1%)    6 (0.2%, 4.3%)   jiff::fmt::temporal::printer::DateTimePrinter::print_date
 1760 (0.9%, 14.0%)    2 (0.1%, 4.4%)   jiff::fmt::temporal::printer::SpanPrinter::print_duration
 1588 (0.8%, 14.8%)    2 (0.1%, 4.4%)   jiff::fmt::friendly::printer::SpanPrinter::print_span_hms
 1450 (0.8%, 15.6%)    2 (0.1%, 4.5%)   jiff::fmt::friendly::printer::SpanPrinter::print_span_designators_non_fraction
 1432 (0.7%, 16.3%)    4 (0.2%, 4.7%)   jiff::fmt::temporal::printer::DateTimePrinter::print_offset_rounded
 1398 (0.7%, 17.0%)    2 (0.1%, 4.8%)   jiff::fmt::temporal::printer::DateTimePrinter::print_pieces
 1368 (0.7%, 17.8%)    6 (0.2%, 5.0%)   jiff::fmt::friendly::printer::DesignatorWriter<W>::write
 1364 (0.7%, 18.5%)    2 (0.1%, 5.1%)   jiff::fmt::friendly::printer::SpanPrinter::print_duration_hms
 1248 (0.6%, 19.1%)   16 (0.7%, 5.7%)   <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
 1243 (0.6%, 19.8%)    1 (0.0%, 5.8%)   jiff::fmt::rfc2822::DateTimePrinter::print_civil_with_offset
 1222 (0.6%, 20.4%)   13 (0.5%, 6.3%)   jiff::util::rangeint::ri8<_,_>::new
 1108 (0.6%, 21.0%)   28 (1.1%, 7.5%)   core::result::Result<T,E>::unwrap
 1097 (0.6%, 21.5%)    1 (0.0%, 7.5%)   jiff::shared::tzif::Header::parse
 1086 (0.6%, 22.1%)   22 (0.9%, 8.4%)   core::option::Option<T>::map_or
```

Kudos to @charliermarsh for finding this while profiling uv.
I had tried this several weeks ago, but recent exploration in #373
prompted me to revisit it. Indeed,
`cargo llvm-lines` reveals that Jiff is emitting a fair bit of code to
LLVM just due to its error values. I
don't know how much of this is inherent to Jiff's attention to good
error messages or whether it's tied to how I've gone about it.
With that said... This isn't a Pareto distribution. It's not like Jiff's
error handling is contributing to 80% of Jiff's LLVM lines or something.
It's just the biggest chunk. But that chunk seems to be around 11% or
so.
This PR tries to reduce that chunk, mostly by being a little more
careful about which error constructors are inlined and which aren't.
Basically, we want the error branch to be inlined into the calling
code (because that's likely a critical path), but we want everything
within the branch to be a call to a function that is ideally not
duplicated too much. I think there's a limit to how well we can do here,
but this PR does seem to improve things without runtime regressions.
Here's the first ~30% of where Jiff is emitting LLVM lines (as run from
the root of Jiff's repository) on current
master:

Other than error handling, it doesn't look like there is any one thing
that is contributing the most here. That means that I think real
improvements here are probably going to require combing through these
and ensuring that the number of copies of each function is as small as
it can be.
This PR does decrease the number of LLVM lines, but only marginally:
I wonder if my approach to error handling with just using strings
everywhere is hurting things. The main alternative would be structured
errors everywhere. But there are so many error messages that writing
out structured definitions for each is terrifying to me (even if I was
willing to use something like
`thiserror` to reduce the boilerplate aspect of it).
In terms of actual compile times, this does seem to have a positive
impact. My test setup here was to:
1. Add a `[patch.crates-io]` entry for `jiff` to Biff's `Cargo.toml`.
2. Run `touch path/to/jiff/src/lib.rs && touch src/main.rs && time cargo build --release` from the root of Biff's repository.
The timings I get are wildly variable. But the minimum time I got with
this PR was 5.8s while the minimum time I've seen for Jiff master is
6.3s. This is on an otherwise quiet machine with the CPU governor set
to `performance`.

Finally, running the benchmarks has some noise, but there are no obvious
regressions:
Given the reduction in LLVM lines, the seeming reduction in compile
times and the lack of obvious runtime regressions, I think this PR is a
strict improvement.
Ref #373