Skip to content

error: try to reduce amount of codegen by forcefully unlining error constructors#379

Merged
BurntSushi merged 1 commit intomasterfrom
ag/tweak-error-constructors
May 18, 2025
Merged

error: try to reduce amount of codegen by forcefully unlining error constructors#379
BurntSushi merged 1 commit intomasterfrom
ag/tweak-error-constructors

Conversation

@BurntSushi
Copy link
Copy Markdown
Owner

@BurntSushi BurntSushi commented May 17, 2025

I had tried this several weeks ago, but recent exploration in #373
prompted me to revisit it. Indeed, cargo llvm-lines reveals that Jiff
is emitting a fair bit of code to LLVM just due to its error values. I
don't know how much of this is inherent to Jiff's attention to good
error messages or whether it's tied to how I've gone about it.

With that said... This isn't a pareto distribution. It's not like Jiff's
error handling is contributing to 80% of Jiff's LLVM lines or something.
It's just the biggest chunk. But that chunk seems to be around 11% or
so.

This PR tries to reduce that chunk, mostly by being a little more
careful about which error constructors are inlined and which aren't.
Basically, we want the error branch to be inlined into the calling
code (because that's likely a critical path), but we want everything
within the branch to be a call to a function that is ideally not
duplicated too much. I think there's a limit to how well we can do here,
but this PR does seem to improve things without runtime regressions.

Here's the first ~30% of where Jiff is emitting LLVM lines (as run from
the root of Jiff's repository) on current master:

  Lines                 Copies              Function name
  -----                 ------              -------------
  203564                5977                (TOTAL)
   10182 (5.0%,  5.0%)    86 (1.4%,  1.4%)  <jiff::error::Error as jiff::error::ErrorContext>::with_context
    5688 (2.8%,  7.8%)   204 (3.4%,  4.9%)  core::result::Result<T,E>::map_err
    3801 (1.9%,  9.7%)   145 (2.4%,  7.3%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    2837 (1.4%, 11.1%)   116 (1.9%,  9.2%)  core::option::Option<T>::ok_or_else
    2120 (1.0%, 12.1%)     2 (0.0%,  9.3%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
    1822 (0.9%, 13.0%)     2 (0.0%,  9.3%)  jiff::fmt::temporal::printer::SpanPrinter::print_span
    1658 (0.8%, 13.8%)     1 (0.0%,  9.3%)  jiff::fmt::strtime::parse::Parser::parse
    1652 (0.8%, 14.6%)     1 (0.0%,  9.3%)  jiff::fmt::strtime::format::Formatter<W,L>::format_one
    1548 (0.8%, 15.4%)     6 (0.1%,  9.4%)  jiff::fmt::temporal::printer::DateTimePrinter::print_time
    1312 (0.6%, 16.0%)   110 (1.8%, 11.3%)  <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
    1260 (0.6%, 16.6%)     6 (0.1%, 11.4%)  jiff::fmt::temporal::printer::DateTimePrinter::print_date
    1219 (0.6%, 17.2%)    47 (0.8%, 12.1%)  core::option::Option<T>::map
    1160 (0.6%, 17.8%)    22 (0.4%, 12.5%)  core::option::Option<T>::map_or
    1071 (0.5%, 18.3%)    28 (0.5%, 13.0%)  core::result::Result<T,E>::unwrap
     997 (0.5%, 18.8%)    21 (0.4%, 13.3%)  core::option::Option<T>::or_else
     986 (0.5%, 19.3%)     1 (0.0%, 13.4%)  jiff::span::Span::from_invariant_nanoseconds
     976 (0.5%, 19.8%)    16 (0.3%, 13.6%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
     920 (0.5%, 20.2%)    10 (0.2%, 13.8%)  <jiff::util::rangeint::ri32<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri64<_,_>>>::rfrom
     920 (0.5%, 20.7%)     4 (0.1%, 13.9%)  jiff::fmt::temporal::printer::DateTimePrinter::print_offset_rounded
     912 (0.4%, 21.1%)     6 (0.1%, 14.0%)  jiff::fmt::friendly::printer::DesignatorWriter<W>::write
     890 (0.4%, 21.6%)     2 (0.0%, 14.0%)  jiff::fmt::temporal::printer::SpanPrinter::print_duration
     840 (0.4%, 22.0%)    12 (0.2%, 14.2%)  jiff::util::rangeint::ri8<_,_>::get
     828 (0.4%, 22.4%)     6 (0.1%, 14.3%)  core::slice::<impl [T]>::binary_search_by
     819 (0.4%, 22.8%)    32 (0.5%, 14.8%)  core::result::Result<T,E>::unwrap_or_else
     802 (0.4%, 23.2%)     2 (0.0%, 14.9%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_hms
     777 (0.4%, 23.6%)    10 (0.2%, 15.0%)  jiff::util::rangeint::Composite<T>::map
     758 (0.4%, 24.0%)     2 (0.0%, 15.1%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_hms
     744 (0.4%, 24.3%)     8 (0.1%, 15.2%)  core::array::try_from_fn_erased
     744 (0.4%, 24.7%)     2 (0.0%, 15.2%)  jiff::fmt::temporal::printer::DateTimePrinter::print_pieces
     720 (0.4%, 25.0%)     2 (0.0%, 15.3%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_designators_non_fraction
     700 (0.3%, 25.4%)    10 (0.2%, 15.4%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
     700 (0.3%, 25.7%)     7 (0.1%, 15.5%)  <jiff::util::rangeint::ri8<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri64<_,_>>>::rfrom
     682 (0.3%, 26.1%)     6 (0.1%, 15.6%)  jiff::fmt::strtime::format::<impl jiff::fmt::strtime::Extension>::write_int
     660 (0.3%, 26.4%)    10 (0.2%, 15.8%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
     659 (0.3%, 26.7%)     1 (0.0%, 15.8%)  jiff::fmt::rfc2822::DateTimePrinter::print_civil_with_offset
     600 (0.3%, 27.0%)     6 (0.1%, 15.9%)  <jiff::util::rangeint::ri8<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri32<_,_>>>::rfrom
     588 (0.3%, 27.3%)    12 (0.2%, 16.1%)  jiff::util::rangeint::ri8<_,_>::try_new
     588 (0.3%, 27.6%)     1 (0.0%, 16.1%)  jiff::tz::db::zoneinfo::inner::walk
     584 (0.3%, 27.9%)    86 (1.4%, 17.6%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::{{closure}}
     583 (0.3%, 28.2%)     1 (0.0%, 17.6%)  jiff::civil::date::DateDifference::since_with_largest_unit
     581 (0.3%, 28.4%)    11 (0.2%, 17.8%)  jiff::error::RangeError::new
     574 (0.3%, 28.7%)     2 (0.0%, 17.8%)  core::slice::sort::stable::quicksort::stable_partition
     552 (0.3%, 29.0%)     2 (0.0%, 17.9%)  core::str::pattern::TwoWaySearcher::next_back
     545 (0.3%, 29.3%)    97 (1.6%, 19.5%)  <T as jiff::util::rangeint::RInto<U>>::rinto
     544 (0.3%, 29.5%)     2 (0.0%, 19.5%)  jiff::fmt::strtime::format::write_offset
     536 (0.3%, 29.8%)     4 (0.1%, 19.6%)  jiff::fmt::temporal::printer::DateTimePrinter::print_datetime

Other than error handling, it doesn't look like there is any one thing
that is contributing the most here. That means that I think real
improvements here are probably going to require combing through these
and ensuring that the number of copies of each function is as small as
it can be.

This PR does decrease the number of LLVM lines, but only marginally:

  Lines                 Copies              Function name
  -----                 ------              -------------
  192748                2457                (TOTAL)
    3978 (2.1%,  2.1%)     1 (0.0%,  0.0%)  jiff::fmt::strtime::format::Formatter<W,L>::format_one
    3785 (2.0%,  4.0%)     1 (0.0%,  0.1%)  jiff::fmt::strtime::parse::Parser::parse
    3600 (1.9%,  5.9%)     2 (0.1%,  0.2%)  jiff::fmt::temporal::printer::SpanPrinter::print_span
    3340 (1.7%,  7.6%)     2 (0.1%,  0.2%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
    3250 (1.7%,  9.3%)    86 (3.5%,  3.7%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::{{closure}}
    2910 (1.5%, 10.8%)     6 (0.2%,  4.0%)  jiff::fmt::temporal::printer::DateTimePrinter::print_time
    2190 (1.1%, 12.0%)     1 (0.0%,  4.0%)  jiff::span::Span::from_invariant_nanoseconds
    2166 (1.1%, 13.1%)     6 (0.2%,  4.3%)  jiff::fmt::temporal::printer::DateTimePrinter::print_date
    1760 (0.9%, 14.0%)     2 (0.1%,  4.4%)  jiff::fmt::temporal::printer::SpanPrinter::print_duration
    1588 (0.8%, 14.8%)     2 (0.1%,  4.4%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_hms
    1450 (0.8%, 15.6%)     2 (0.1%,  4.5%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_designators_non_fraction
    1432 (0.7%, 16.3%)     4 (0.2%,  4.7%)  jiff::fmt::temporal::printer::DateTimePrinter::print_offset_rounded
    1398 (0.7%, 17.0%)     2 (0.1%,  4.8%)  jiff::fmt::temporal::printer::DateTimePrinter::print_pieces
    1368 (0.7%, 17.8%)     6 (0.2%,  5.0%)  jiff::fmt::friendly::printer::DesignatorWriter<W>::write
    1364 (0.7%, 18.5%)     2 (0.1%,  5.1%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_hms
    1248 (0.6%, 19.1%)    16 (0.7%,  5.7%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
    1243 (0.6%, 19.8%)     1 (0.0%,  5.8%)  jiff::fmt::rfc2822::DateTimePrinter::print_civil_with_offset
    1222 (0.6%, 20.4%)    13 (0.5%,  6.3%)  jiff::util::rangeint::ri8<_,_>::new
    1108 (0.6%, 21.0%)    28 (1.1%,  7.4%)  core::result::Result<T,E>::unwrap
    1097 (0.6%, 21.5%)     1 (0.0%,  7.5%)  jiff::shared::tzif::Header::parse
    1086 (0.6%, 22.1%)    22 (0.9%,  8.4%)  core::option::Option<T>::map_or
     975 (0.5%, 22.6%)    21 (0.9%,  9.2%)  core::option::Option<T>::or_else
     970 (0.5%, 23.1%)     1 (0.0%,  9.3%)  jiff::fmt::rfc2822::DateTimePrinter::print_civil_always_utc
     923 (0.5%, 23.6%)     1 (0.0%,  9.3%)  jiff::fmt::offset::Parser::parse_numeric
     919 (0.5%, 24.1%)     1 (0.0%,  9.4%)  jiff::zoned::ZonedDifference::until_with_largest_unit
     884 (0.5%, 24.5%)     2 (0.1%,  9.4%)  jiff::tz::concatenated::Header::read
     842 (0.4%, 25.0%)     2 (0.1%,  9.5%)  jiff::fmt::strtime::format::write_offset
     822 (0.4%, 25.4%)     6 (0.2%,  9.8%)  core::slice::<impl [T]>::binary_search_by
     816 (0.4%, 25.8%)     6 (0.2%, 10.0%)  jiff::fmt::strtime::format::<impl jiff::fmt::strtime::Extension>::write_int
     796 (0.4%, 26.2%)     1 (0.0%, 10.1%)  jiff::shared::tzif::<impl jiff::shared::Tzif<alloc::string::String,jiff::shared::util::array_str::ArrayStr<30_usize>,alloc::vec::Vec<jiff::shared::TzifLocalTimeType>,alloc::vec::Vec<i64>,alloc::vec::Vec<jiff::shared::TzifDateTime>,alloc::vec::Vec<jiff::shared::TzifDateTime>,alloc::vec::Vec<jiff::shared::TzifTransitionInfo>>>::parse64
     792 (0.4%, 26.6%)    22 (0.9%, 10.9%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::context::{{closure}}
     784 (0.4%, 27.0%)     4 (0.2%, 11.1%)  jiff::fmt::temporal::printer::DateTimePrinter::print_datetime
     773 (0.4%, 27.4%)     2 (0.1%, 11.2%)  jiff::tz::offset::OffsetConflict::resolve_via_reject
     758 (0.4%, 27.8%)     2 (0.1%, 11.3%)  jiff::shared::posix::<impl jiff::shared::PosixTimeZone<ABBREV>>::to_ambiguous_kind
     736 (0.4%, 28.2%)     4 (0.2%, 11.4%)  jiff::fmt::friendly::printer::FractionalPrinter::print
     687 (0.4%, 28.6%)     1 (0.0%, 11.5%)  jiff::tz::db::zoneinfo::inner::walk
     680 (0.4%, 28.9%)     1 (0.0%, 11.5%)  jiff::shared::tzif::<impl jiff::shared::Tzif<alloc::string::String,jiff::shared::util::array_str::ArrayStr<30_usize>,alloc::vec::Vec<jiff::shared::TzifLocalTimeType>,alloc::vec::Vec<i64>,alloc::vec::Vec<jiff::shared::TzifDateTime>,alloc::vec::Vec<jiff::shared::TzifDateTime>,alloc::vec::Vec<jiff::shared::TzifTransitionInfo>>>::parse_time_zone_designations
     654 (0.3%, 29.3%)     2 (0.1%, 11.6%)  jiff::shared::posix::<impl jiff::shared::PosixTimeZone<ABBREV>>::previous_transition
     652 (0.3%, 29.6%)     2 (0.1%, 11.7%)  jiff::shared::posix::<impl jiff::shared::PosixTimeZone<ABBREV>>::next_transition
     650 (0.3%, 29.9%)     1 (0.0%, 11.7%)  jiff::tz::system::get_env_tz

I wonder if my approach to error handling with just using strings
everywhere is hurting things. The main alternative would be structured
errors everywhere. But there are so many error messages that writing
out structured definitions for each is terrifying to me (even if I was
willing to use something like thiserror to reduce the boiler plate
aspect of it).

In terms of actual compile times, this does seem to have a positive
impact on compile times. My test setup here was to:

  1. Add [patch.crates-io] with a jiff entry to
    Biff's Cargo.toml.
  2. Run
    touch path/to/jiff/src/lib.rs && touch src/main.rs && time cargo build --release
    from the root of Biff's repository.

The timings I get are wildly variable. But the minimum time I got with
this PR was 5.8s while the minimum time I've seen for Jiff master is
6.3s. This is on an otherwise quiet machine with the CPU governor set
to performance.

Finally, running the benchmarks has some noise, but there are no obvious
regressions:

$ critcmp base zchange3 -f '/jiff$' -t 5
group                                                    base                                   zchange3
-----                                                    ----                                   --------
civil_datetime/to_timestamp_tzdb_lookup/zoneinfo/jiff    1.00     43.5±0.27ns        ? ?/sec    1.24     54.1±0.23ns        ? ?/sec
date/add_days/one/duration/jiff                          1.00      5.2±0.05ns        ? ?/sec    1.08      5.6±0.04ns        ? ?/sec
date/add_days/one/span/jiff                              1.00      9.5±0.09ns        ? ?/sec    1.06     10.0±0.08ns        ? ?/sec
date/add_years_months_days/jiff                          1.08     24.7±0.20ns        ? ?/sec    1.00     22.8±0.16ns        ? ?/sec
parse/friendly/long/span/jiff                            1.06    122.8±0.18ns        ? ?/sec    1.00    115.9±0.27ns        ? ?/sec
parse/friendly/longer/span/jiff                          1.05    124.2±0.29ns        ? ?/sec    1.00    118.0±0.79ns        ? ?/sec
parse/friendly/medium/span/jiff                          1.05     67.8±0.22ns        ? ?/sec    1.00     64.5±0.26ns        ? ?/sec
parse/friendly/short/duration/jiff                       1.17     16.6±0.28ns        ? ?/sec    1.00     14.2±0.34ns        ? ?/sec
parse/rfc2822/jiff                                       1.25     29.4±0.31ns        ? ?/sec    1.00     23.5±0.26ns        ? ?/sec
parse/strptime/oneshot/jiff                              1.11     74.3±0.63ns        ? ?/sec    1.00     67.0±0.48ns        ? ?/sec
timestamp/every_hour_in_week/series/jiff                 1.33    143.8±0.70ns        ? ?/sec    1.00    107.7±0.61ns        ? ?/sec
tz/posix_datetime_to_offset/jiff                         1.00     31.0±0.24ns        ? ?/sec    1.05     32.5±0.20ns        ? ?/sec
zoned/fixed_offset_to_timestamp/jiff                     1.25      0.4±0.00ns        ? ?/sec    1.00      0.3±0.00ns        ? ?/sec

Given the reduction in LLVM times, the seeming reduction in compile
times and the lack of obvious runtime regressions, I think this PR is a
strict improvement.

Ref #373

…onstructors

I had tried this several weeks ago, but recent exploration in #373
prompted me to revisit it. Indeed, `cargo llvm-lines` reveals that Jiff
is emitting a _ton_ of code to LLVM just due to its error values. I
don't know how much of this is inherent to Jiff's attention to good
error messages or whether it's tied to how I've gone about it.

With that said... This isn't a pareto distribution. It's not like Jiff's
error handling is contributing to 80% of Jiff's LLVM lines or something.
It's just the biggest chunk. But that chunk seems to be around 11% or
so.

This PR tries to reduce that chunk, but... doesn't succeed much. This PR
almost certainly has perf regressions (haven't measured yet), but I
wanted to put it up anyway to give folks an idea of what I'm trying.

Also, here's the first ~30% of where Jiff is emitting LLVM lines (as run
from the root of Jiff's repository):

```
  Lines                 Copies              Function name
  -----                 ------              -------------
  203564                5977                (TOTAL)
   10182 (5.0%,  5.0%)    86 (1.4%,  1.4%)  <jiff::error::Error as jiff::error::ErrorContext>::with_context
    5688 (2.8%,  7.8%)   204 (3.4%,  4.9%)  core::result::Result<T,E>::map_err
    3801 (1.9%,  9.7%)   145 (2.4%,  7.3%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    2837 (1.4%, 11.1%)   116 (1.9%,  9.2%)  core::option::Option<T>::ok_or_else
    2120 (1.0%, 12.1%)     2 (0.0%,  9.3%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
    1822 (0.9%, 13.0%)     2 (0.0%,  9.3%)  jiff::fmt::temporal::printer::SpanPrinter::print_span
    1658 (0.8%, 13.8%)     1 (0.0%,  9.3%)  jiff::fmt::strtime::parse::Parser::parse
    1652 (0.8%, 14.6%)     1 (0.0%,  9.3%)  jiff::fmt::strtime::format::Formatter<W,L>::format_one
    1548 (0.8%, 15.4%)     6 (0.1%,  9.4%)  jiff::fmt::temporal::printer::DateTimePrinter::print_time
    1312 (0.6%, 16.0%)   110 (1.8%, 11.3%)  <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
    1260 (0.6%, 16.6%)     6 (0.1%, 11.4%)  jiff::fmt::temporal::printer::DateTimePrinter::print_date
    1219 (0.6%, 17.2%)    47 (0.8%, 12.1%)  core::option::Option<T>::map
    1160 (0.6%, 17.8%)    22 (0.4%, 12.5%)  core::option::Option<T>::map_or
    1071 (0.5%, 18.3%)    28 (0.5%, 13.0%)  core::result::Result<T,E>::unwrap
     997 (0.5%, 18.8%)    21 (0.4%, 13.3%)  core::option::Option<T>::or_else
     986 (0.5%, 19.3%)     1 (0.0%, 13.4%)  jiff::span::Span::from_invariant_nanoseconds
     976 (0.5%, 19.8%)    16 (0.3%, 13.6%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
     920 (0.5%, 20.2%)    10 (0.2%, 13.8%)  <jiff::util::rangeint::ri32<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri64<_,_>>>::rfrom
     920 (0.5%, 20.7%)     4 (0.1%, 13.9%)  jiff::fmt::temporal::printer::DateTimePrinter::print_offset_rounded
     912 (0.4%, 21.1%)     6 (0.1%, 14.0%)  jiff::fmt::friendly::printer::DesignatorWriter<W>::write
     890 (0.4%, 21.6%)     2 (0.0%, 14.0%)  jiff::fmt::temporal::printer::SpanPrinter::print_duration
     840 (0.4%, 22.0%)    12 (0.2%, 14.2%)  jiff::util::rangeint::ri8<_,_>::get
     828 (0.4%, 22.4%)     6 (0.1%, 14.3%)  core::slice::<impl [T]>::binary_search_by
     819 (0.4%, 22.8%)    32 (0.5%, 14.8%)  core::result::Result<T,E>::unwrap_or_else
     802 (0.4%, 23.2%)     2 (0.0%, 14.9%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_hms
     777 (0.4%, 23.6%)    10 (0.2%, 15.0%)  jiff::util::rangeint::Composite<T>::map
     758 (0.4%, 24.0%)     2 (0.0%, 15.1%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_hms
     744 (0.4%, 24.3%)     8 (0.1%, 15.2%)  core::array::try_from_fn_erased
     744 (0.4%, 24.7%)     2 (0.0%, 15.2%)  jiff::fmt::temporal::printer::DateTimePrinter::print_pieces
     720 (0.4%, 25.0%)     2 (0.0%, 15.3%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_designators_non_fraction
     700 (0.3%, 25.4%)    10 (0.2%, 15.4%)  <core::iter::adapters::zip::Zip<A,B> as core::iter::adapters::zip::ZipImpl<A,B>>::next
     700 (0.3%, 25.7%)     7 (0.1%, 15.5%)  <jiff::util::rangeint::ri8<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri64<_,_>>>::rfrom
     682 (0.3%, 26.1%)     6 (0.1%, 15.6%)  jiff::fmt::strtime::format::<impl jiff::fmt::strtime::Extension>::write_int
     660 (0.3%, 26.4%)    10 (0.2%, 15.8%)  <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
     659 (0.3%, 26.7%)     1 (0.0%, 15.8%)  jiff::fmt::rfc2822::DateTimePrinter::print_civil_with_offset
     600 (0.3%, 27.0%)     6 (0.1%, 15.9%)  <jiff::util::rangeint::ri8<_,_> as jiff::util::rangeint::RFrom<jiff::util::rangeint::ri32<_,_>>>::rfrom
     588 (0.3%, 27.3%)    12 (0.2%, 16.1%)  jiff::util::rangeint::ri8<_,_>::try_new
     588 (0.3%, 27.6%)     1 (0.0%, 16.1%)  jiff::tz::db::zoneinfo::inner::walk
     584 (0.3%, 27.9%)    86 (1.4%, 17.6%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::{{closure}}
     583 (0.3%, 28.2%)     1 (0.0%, 17.6%)  jiff::civil::date::DateDifference::since_with_largest_unit
     581 (0.3%, 28.4%)    11 (0.2%, 17.8%)  jiff::error::RangeError::new
     574 (0.3%, 28.7%)     2 (0.0%, 17.8%)  core::slice::sort::stable::quicksort::stable_partition
     552 (0.3%, 29.0%)     2 (0.0%, 17.9%)  core::str::pattern::TwoWaySearcher::next_back
     545 (0.3%, 29.3%)    97 (1.6%, 19.5%)  <T as jiff::util::rangeint::RInto<U>>::rinto
     544 (0.3%, 29.5%)     2 (0.0%, 19.5%)  jiff::fmt::strtime::format::write_offset
     536 (0.3%, 29.8%)     4 (0.1%, 19.6%)  jiff::fmt::temporal::printer::DateTimePrinter::print_datetime
```

Other than error handling, it doesn't look like there is any one thing
that is contributing the most here. That means that I think real
improvements here are probably going to require combing through these
and ensuring that the number of copies of each function is as small as
it can be.

In other words, I don't really see any low hanging fruit here, but maybe
I'm not looking at this right.

FWIW, this PR does decrease the number of LLVM lines, but only
marginally:

```
  Lines                 Copies              Function name
  -----                 ------              -------------
  195380                5768                (TOTAL)
    3801 (1.9%,  1.9%)   145 (2.5%,  2.5%)  <core::result::Result<T,E> as core::ops::try_trait::Try>::branch
    3286 (1.7%,  3.6%)    86 (1.5%,  4.0%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context
    2851 (1.5%,  5.1%)    96 (1.7%,  5.7%)  core::result::Result<T,E>::map_err
    2837 (1.5%,  6.5%)   116 (2.0%,  7.7%)  core::option::Option<T>::ok_or_else
    2120 (1.1%,  7.6%)     2 (0.0%,  7.7%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
    1822 (0.9%,  8.6%)     2 (0.0%,  7.7%)  jiff::fmt::temporal::printer::SpanPrinter::print_span
    1658 (0.8%,  9.4%)     1 (0.0%,  7.8%)  jiff::fmt::strtime::parse::Parser::parse
    1652 (0.8%, 10.3%)     1 (0.0%,  7.8%)  jiff::fmt::strtime::format::Formatter<W,L>::format_one
    1548 (0.8%, 11.0%)     6 (0.1%,  7.9%)  jiff::fmt::temporal::printer::DateTimePrinter::print_time
    1312 (0.7%, 11.7%)   110 (1.9%,  9.8%)  <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
    1260 (0.6%, 12.4%)     6 (0.1%,  9.9%)  jiff::fmt::temporal::printer::DateTimePrinter::print_date
    1219 (0.6%, 13.0%)    47 (0.8%, 10.7%)  core::option::Option<T>::map
    1160 (0.6%, 13.6%)    22 (0.4%, 11.1%)  core::option::Option<T>::map_or
    1071 (0.5%, 14.1%)    28 (0.5%, 11.6%)  core::result::Result<T,E>::unwrap
     997 (0.5%, 14.6%)    21 (0.4%, 11.9%)  core::option::Option<T>::or_else
     986 (0.5%, 15.1%)     1 (0.0%, 12.0%)  jiff::span::Span::from_invariant_nanoseconds
     976 (0.5%, 15.6%)    16 (0.3%, 12.2%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
     925 (0.5%, 16.1%)    27 (0.5%, 12.7%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::imp
```

I wonder if my approach to error handling with just using strings
everywhere is hurting things. The main alternative would be structured
errors everywhere. But there are _so many_ error messages that writing
out structured definitions for each is terrifying to me (even if I was
willing to use something like `thiserror` to reduce the boiler plate
aspect of it).

Ref #373
@BurntSushi
Copy link
Copy Markdown
Owner Author

BurntSushi commented May 17, 2025

Oof. Massive regressions:

$ critcmp base zchange1 -f '/jiff$' -t 50
group                                                    base                                   zchange1
-----                                                    ----                                   --------
civil_datetime/add_days/diffyear/duration/jiff           1.00     13.6±0.13ns        ? ?/sec    19.25   261.6±3.04ns        ? ?/sec
civil_datetime/add_days/diffyear/span/jiff               1.00     25.5±0.13ns        ? ?/sec    3.81     97.0±0.57ns        ? ?/sec
civil_datetime/add_days/sameyear/duration/jiff           1.00     13.6±0.11ns        ? ?/sec    18.85   256.1±2.76ns        ? ?/sec
civil_datetime/add_days/sameyear/span/jiff               1.00     25.5±0.20ns        ? ?/sec    3.81     97.0±0.67ns        ? ?/sec
civil_datetime/to_timestamp_static/bundled/jiff          1.00     10.5±0.10ns        ? ?/sec    13.57   143.1±1.75ns        ? ?/sec
civil_datetime/to_timestamp_static/zoneinfo/jiff         1.00     10.5±0.08ns        ? ?/sec    13.46   141.5±1.70ns        ? ?/sec
civil_datetime/to_timestamp_tzdb_lookup/zoneinfo/jiff    1.00     43.5±0.27ns        ? ?/sec    3.96    172.0±1.60ns        ? ?/sec
date/add_days/diffyear/duration/jiff                     1.00      6.5±0.08ns        ? ?/sec    14.36    93.9±1.92ns        ? ?/sec
date/add_days/sameyear/duration/jiff                     1.00      6.5±0.06ns        ? ?/sec    13.75    89.8±1.11ns        ? ?/sec
datetime/add_years_months_days/jiff                      1.00     34.6±0.41ns        ? ?/sec    3.54    122.4±1.21ns        ? ?/sec
parse/civil_datetime/jiff                                1.00     25.9±0.06ns        ? ?/sec    69.83 1807.8±22.11ns        ? ?/sec
parse/friendly/long/span/jiff                            1.00    122.8±0.18ns        ? ?/sec    6.41    786.9±9.44ns        ? ?/sec
parse/friendly/longer/span/jiff                          1.00    124.2±0.29ns        ? ?/sec    11.19  1390.2±4.17ns        ? ?/sec
parse/friendly/longest-time/duration/jiff                1.00     44.5±0.83ns        ? ?/sec    21.91   974.2±8.31ns        ? ?/sec
parse/friendly/longest-time/span/jiff                    1.00    138.3±0.44ns        ? ?/sec    10.68  1477.2±6.06ns        ? ?/sec
parse/friendly/longest/span/jiff                         1.00    188.7±0.74ns        ? ?/sec    10.89     2.1±0.01µs        ? ?/sec
parse/friendly/medium/span/jiff                          1.00     67.8±0.22ns        ? ?/sec    7.09    480.7±5.65ns        ? ?/sec
parse/friendly/short/duration/jiff                       1.00     16.6±0.28ns        ? ?/sec    12.77   212.2±3.53ns        ? ?/sec
parse/friendly/short/span/jiff                           1.00     52.8±0.17ns        ? ?/sec    6.91    364.8±3.32ns        ? ?/sec
parse/friendly/tiny/duration/jiff                        1.00      7.8±0.19ns        ? ?/sec    20.04   157.1±1.81ns        ? ?/sec
parse/friendly/tiny/span/jiff                            1.00     34.7±0.06ns        ? ?/sec    6.84    237.6±2.69ns        ? ?/sec
parse/rfc2822/jiff                                       1.00     29.4±0.31ns        ? ?/sec    35.28 1037.4±11.02ns        ? ?/sec
parse/strptime/oneshot/jiff                              1.00     74.3±0.63ns        ? ?/sec    24.45 1817.4±16.78ns        ? ?/sec
print/strftime/oneshot/jiff                              1.00    109.3±0.84ns        ? ?/sec    3.26    356.9±7.11ns        ? ?/sec
timestamp/add_time_secs/span/jiff                        1.00     10.8±0.11ns        ? ?/sec    15.17   164.4±2.22ns        ? ?/sec
timestamp/add_time_subsec/span/jiff                      1.00     18.3±0.25ns        ? ?/sec    9.95    181.8±5.05ns        ? ?/sec
timestamp/every_hour_in_week/series/jiff                 1.00    143.8±0.70ns        ? ?/sec    1.97    282.6±0.96ns        ? ?/sec
zoned/fixed_offset_add_time/span/jiff                    1.00     36.3±0.12ns        ? ?/sec    11.98   435.1±4.78ns        ? ?/sec

@BurntSushi
Copy link
Copy Markdown
Owner Author

BurntSushi commented May 17, 2025

This does seem to have a positive impact on compile times. My test setup here was to:

  1. Add [patch.crates-io] with a jiff entry to Biff's Cargo.toml.
  2. Run touch path/to/jiff/src/lib.rs && touch src/main.rs && time cargo build --release from the root of Biff's repository.

The timings I get are wildly variable. But the minimum time I got with this PR was 5.6s while the minimum time I've seen for Jiff master is 6.3s. This is on an otherwise quiet machine with the CPU governor set to performance.

I'm going to see if I can tweak things here to try and gain back some of the runtime perf losses here. Or maybe make them configurable based on the perf-inline feature. I guess there's somewhat of an inherent tension here. It should be possible to do better than master. Basically, we want the error case analysis to be inlined into the hot path, but then we want everything else to be off in a function call somewhere that does get duplicated and inlined everywhere it's used.

@BurntSushi
Copy link
Copy Markdown
Owner Author

I'm going to see if I can tweak things here to try and gain back some of the runtime perf losses here. Or maybe make them configurable based on the perf-inline feature. I guess there's somewhat of an inherent tension here. It should be possible to do better than master. Basically, we want the error case analysis to be inlined into the hot path, but then we want everything else to be off in a function call somewhere that does get duplicated and inlined everywhere it's used.

I ended up being able to do this. The key was making sure that the context() and with_context() extension methods on Result<T, E> get inlined, but where pretty much everything else is not inlined. This slightly increases LLVM lines (but still an overall net decrease from master). Compilation times still seem to be improved to roughly the same degree. And, importantly, there are no runtime regressions detectable outside of noise.

@BurntSushi BurntSushi merged commit ca06977 into master May 18, 2025
38 checks passed
@BurntSushi BurntSushi deleted the ag/tweak-error-constructors branch May 18, 2025 12:59
BurntSushi added a commit that referenced this pull request Jun 13, 2025
It turns out that #379 ended up pretty badly regressing datetime parsing
performance. In particular, the `with_context` routine on `Result<T, E>`
was _eagerly_ calling the caller-provided closure even in the success
case. Which means error values (which are generally much more expensive
to build than to parse a datetime) were being constructed on every
parse. Owch.

I'm not sure how I missed this. I ran the benchmarks in #379 and they
didn't report anything egregious. I should double-back and improve
benchmark coverage.

Thankfully, this doesn't seem to regress the number of LLVM lines we're
emitting:

  Lines                 Copies              Function name
  -----                 ------              -------------
  192746                2456                (TOTAL)
    3978 (2.1%,  2.1%)     1 (0.0%,  0.0%)  jiff::fmt::strtime::format::Formatter<W,L>::format_one
    3785 (2.0%,  4.0%)     1 (0.0%,  0.1%)  jiff::fmt::strtime::parse::Parser::parse
    3600 (1.9%,  5.9%)     2 (0.1%,  0.2%)  jiff::fmt::temporal::printer::SpanPrinter::print_span
    3340 (1.7%,  7.6%)     2 (0.1%,  0.2%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
    3250 (1.7%,  9.3%)    86 (3.5%,  3.7%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::{{closure}}
    2910 (1.5%, 10.8%)     6 (0.2%,  4.0%)  jiff::fmt::temporal::printer::DateTimePrinter::print_time
    2190 (1.1%, 12.0%)     1 (0.0%,  4.0%)  jiff::span::Span::from_invariant_nanoseconds
    2166 (1.1%, 13.1%)     6 (0.2%,  4.3%)  jiff::fmt::temporal::printer::DateTimePrinter::print_date
    1760 (0.9%, 14.0%)     2 (0.1%,  4.4%)  jiff::fmt::temporal::printer::SpanPrinter::print_duration
    1588 (0.8%, 14.8%)     2 (0.1%,  4.4%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_hms
    1450 (0.8%, 15.6%)     2 (0.1%,  4.5%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_designators_non_fraction
    1432 (0.7%, 16.3%)     4 (0.2%,  4.7%)  jiff::fmt::temporal::printer::DateTimePrinter::print_offset_rounded
    1398 (0.7%, 17.0%)     2 (0.1%,  4.8%)  jiff::fmt::temporal::printer::DateTimePrinter::print_pieces
    1368 (0.7%, 17.8%)     6 (0.2%,  5.0%)  jiff::fmt::friendly::printer::DesignatorWriter<W>::write
    1364 (0.7%, 18.5%)     2 (0.1%,  5.1%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_hms
    1248 (0.6%, 19.1%)    16 (0.7%,  5.7%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
    1243 (0.6%, 19.8%)     1 (0.0%,  5.8%)  jiff::fmt::rfc2822::DateTimePrinter::print_civil_with_offset
    1222 (0.6%, 20.4%)    13 (0.5%,  6.3%)  jiff::util::rangeint::ri8<_,_>::new
    1108 (0.6%, 21.0%)    28 (1.1%,  7.5%)  core::result::Result<T,E>::unwrap
    1097 (0.6%, 21.5%)     1 (0.0%,  7.5%)  jiff::shared::tzif::Header::parse
    1086 (0.6%, 22.1%)    22 (0.9%,  8.4%)  core::option::Option<T>::map_or

Kudos to @charliermarsh for finding this while profiling uv.
BurntSushi added a commit that referenced this pull request Jun 13, 2025
It turns out that #379 ended up pretty badly regressing datetime parsing
performance. In particular, the `with_context` routine on `Result<T, E>`
was _eagerly_ calling the caller-provided closure even in the success
case. Which means error values (which are generally much more expensive
to build than to parse a datetime) were being constructed on every
parse. Owch.

I'm not sure how I missed this. I ran the benchmarks in #379 and they
didn't report anything egregious. I should double-back and improve
benchmark coverage.

Thankfully, this doesn't seem to regress the number of LLVM lines we're
emitting:

  Lines                 Copies              Function name
  -----                 ------              -------------
  192746                2456                (TOTAL)
    3978 (2.1%,  2.1%)     1 (0.0%,  0.0%)  jiff::fmt::strtime::format::Formatter<W,L>::format_one
    3785 (2.0%,  4.0%)     1 (0.0%,  0.1%)  jiff::fmt::strtime::parse::Parser::parse
    3600 (1.9%,  5.9%)     2 (0.1%,  0.2%)  jiff::fmt::temporal::printer::SpanPrinter::print_span
    3340 (1.7%,  7.6%)     2 (0.1%,  0.2%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_designators
    3250 (1.7%,  9.3%)    86 (3.5%,  3.7%)  <core::result::Result<T,jiff::error::Error> as jiff::error::ErrorContext>::with_context::{{closure}}
    2910 (1.5%, 10.8%)     6 (0.2%,  4.0%)  jiff::fmt::temporal::printer::DateTimePrinter::print_time
    2190 (1.1%, 12.0%)     1 (0.0%,  4.0%)  jiff::span::Span::from_invariant_nanoseconds
    2166 (1.1%, 13.1%)     6 (0.2%,  4.3%)  jiff::fmt::temporal::printer::DateTimePrinter::print_date
    1760 (0.9%, 14.0%)     2 (0.1%,  4.4%)  jiff::fmt::temporal::printer::SpanPrinter::print_duration
    1588 (0.8%, 14.8%)     2 (0.1%,  4.4%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_hms
    1450 (0.8%, 15.6%)     2 (0.1%,  4.5%)  jiff::fmt::friendly::printer::SpanPrinter::print_span_designators_non_fraction
    1432 (0.7%, 16.3%)     4 (0.2%,  4.7%)  jiff::fmt::temporal::printer::DateTimePrinter::print_offset_rounded
    1398 (0.7%, 17.0%)     2 (0.1%,  4.8%)  jiff::fmt::temporal::printer::DateTimePrinter::print_pieces
    1368 (0.7%, 17.8%)     6 (0.2%,  5.0%)  jiff::fmt::friendly::printer::DesignatorWriter<W>::write
    1364 (0.7%, 18.5%)     2 (0.1%,  5.1%)  jiff::fmt::friendly::printer::SpanPrinter::print_duration_hms
    1248 (0.6%, 19.1%)    16 (0.7%,  5.7%)  <jiff::util::rangeint::RangedDebug<_,_> as core::fmt::Debug>::fmt
    1243 (0.6%, 19.8%)     1 (0.0%,  5.8%)  jiff::fmt::rfc2822::DateTimePrinter::print_civil_with_offset
    1222 (0.6%, 20.4%)    13 (0.5%,  6.3%)  jiff::util::rangeint::ri8<_,_>::new
    1108 (0.6%, 21.0%)    28 (1.1%,  7.5%)  core::result::Result<T,E>::unwrap
    1097 (0.6%, 21.5%)     1 (0.0%,  7.5%)  jiff::shared::tzif::Header::parse
    1086 (0.6%, 22.1%)    22 (0.9%,  8.4%)  core::option::Option<T>::map_or

Kudos to @charliermarsh for finding this while profiling uv.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant