add #[inline] to small and single-use functions#16
Conversation
|
Checking on x86_64. |
|
Can you fix the bencher source code please? Markdown broken apparently. ^^; |
|
Fixed, by adding newlines around |
|
Thanks! Here are the results:
|
|
Surprisingly, |
| Ok(b) | ||
| } | ||
|
|
||
| /// Converts the result of a unicode escape to the unit type |
There was a problem hiding this comment.
Also please keep the doc comment, always useful. ;)
There was a problem hiding this comment.
The doc comment is still there on the function def in the trait, so repeating it 3 times seems silly.
There was a problem hiding this comment.
Ah I see, then please ignore my comment.
Not to mention unfortunate :(. How are the results without the inline on Check::check_raw? |
|
Without
I didn't expect the difference to be this big. O.o |
eb3294d to
73aae9e
Compare
|
|
| /// which are returned via `callback`. | ||
| /// | ||
| /// NOTE: Does no escaping, but produces errors for bare carriage return ('\r'). | ||
| #[cfg_attr(any(target_arch = "aarch64", target_arch = "arm"), inline)] |
There was a problem hiding this comment.
Instead of having a cfg_attr, what are the results without this attribute on ARM? If they're not too bad, I think it'd be better to not start having conditional inlining for potential performance improvement (which might change any time).
There was a problem hiding this comment.
Basically this is the only inline that generates all the performance wins on arm...
There was a problem hiding this comment.
Ah. Ok, then please add a comment explaining why we have this attribute and what to check if it needs to be updated.
In the meantime, runnning benches with this new change.
There was a problem hiding this comment.
Hmm, maybe I misremembered slightly, as there are some small wins:
| bench name | current | new | diff |
|---|---|---|---|
| bench_check_raw_byte_str_ascii | 44628.4 ns/iter (+/- 102.2) | 44629.32 ns/iter (+/- 73.66) | 0.0% |
| bench_check_raw_c_str_ascii | 41234.96 ns/iter (+/- 121.9) | 41071.26 ns/iter (+/- 47.96) | -0.4% |
| bench_check_raw_c_str_non_ascii | 52474.5 ns/iter (+/- 72.77) | 52578.81 ns/iter (+/- 115.4) | 0.2% |
| bench_check_raw_c_str_unicode | 174637.42 ns/iter (+/- 1521.31) | 173704.05 ns/iter (+/- 1150.51) | -0.5% |
| bench_check_raw_str_ascii | 40644.14 ns/iter (+/- 93.49) | 40668.03 ns/iter (+/- 91.28) | 0.1% |
| bench_check_raw_str_non_ascii | 50437.34 ns/iter (+/- 81.88) | 50439.37 ns/iter (+/- 92.4) | 0.0% |
| bench_check_raw_str_unicode | 167964.36 ns/iter (+/- 613.95) | 168088.36 ns/iter (+/- 368.93) | 0.1% |
| bench_skip_ascii_whitespace | 13081.61 ns/iter (+/- 31.3) | 13095.27 ns/iter (+/- 14.95) | 0.1% |
| bench_unescape_byte_str_ascii | 68916.92 ns/iter (+/- 168.16) | 69179.61 ns/iter (+/- 393.16) | 0.4% |
| bench_unescape_byte_str_ascii_escape | 81236.48 ns/iter (+/- 177.29) | 81247.42 ns/iter (+/- 180.38) | 0.0% |
| bench_unescape_byte_str_hex_escape | 125728.99 ns/iter (+/- 256.18) | 125697.59 ns/iter (+/- 253.15) | -0.0% |
| bench_unescape_byte_str_mixed_escape | 348869.9 ns/iter (+/- 2290.38) | 348345.55 ns/iter (+/- 1578.4) | -0.2% |
| bench_unescape_c_str_ascii | 81575.56 ns/iter (+/- 180.46) | 76510.93 ns/iter (+/- 237.2) | -6.2% |
| bench_unescape_c_str_ascii_escape | 96108.78 ns/iter (+/- 1281.04) | 97554.84 ns/iter (+/- 160.5) | 1.5% |
| bench_unescape_c_str_hex_escape_ascii | 161725.35 ns/iter (+/- 1096.9) | 153620.12 ns/iter (+/- 295.55) | -5.0% |
| bench_unescape_c_str_hex_escape_byte | 162388.73 ns/iter (+/- 504.69) | 149612.35 ns/iter (+/- 236.66) | -7.9% |
| bench_unescape_c_str_mixed_escape | 1045466.2 ns/iter (+/- 5675.85) | 1016333.3 ns/iter (+/- 2407.72) | -2.8% |
| bench_unescape_c_str_non_ascii | 110547.5 ns/iter (+/- 327.76) | 95382.77 ns/iter (+/- 314.99) | -13.7% |
| bench_unescape_c_str_unicode | 385332.7 ns/iter (+/- 2524.07) | 341268.3 ns/iter (+/- 1450.59) | -11.4% |
| bench_unescape_c_str_unicode_escape | 308526.52 ns/iter (+/- 1321.49) | 310357.6 ns/iter (+/- 709.15) | 0.6% |
| bench_unescape_str_ascii | 77139.01 ns/iter (+/- 175.61) | 73224.7 ns/iter (+/- 185.46) | -5.1% |
| bench_unescape_str_ascii_escape | 105477.82 ns/iter (+/- 313.16) | 103074.02 ns/iter (+/- 1257.79) | -2.3% |
| bench_unescape_str_hex_escape | 165270.16 ns/iter (+/- 577.06) | 165128.7 ns/iter (+/- 629.91) | -0.1% |
| bench_unescape_str_mixed_escape | 906241.1 ns/iter (+/- 2452.01) | 905221.2 ns/iter (+/- 2022.47) | -0.1% |
| bench_unescape_str_non_ascii | 98002.09 ns/iter (+/- 341.61) | 97892.88 ns/iter (+/- 267.58) | -0.1% |
| bench_unescape_str_unicode | 344989.9 ns/iter (+/- 651.18) | 343551.35 ns/iter (+/- 1453.84) | -0.4% |
| bench_unescape_str_unicode_escape | 596409.25 ns/iter (+/- 2117.97) | 595745.65 ns/iter (+/- 2342.47) | -0.1% |
There was a problem hiding this comment.
Here are the results:
| bench name | current | new | diff |
|---|---|---|---|
| bench_check_raw_byte_str_ascii | 20597.79 ns/iter (+/- 111.91) | 21108.64 ns/iter (+/- 682.89) | 2.5% |
| bench_check_raw_c_str_ascii | 20242.44 ns/iter (+/- 35.66) | 20105.08 ns/iter (+/- 72.01) | -0.7% |
| bench_check_raw_c_str_non_ascii | 37288.94 ns/iter (+/- 60.57) | 37305.4 ns/iter (+/- 67.56) | 0.0% |
| bench_check_raw_c_str_unicode | 115247.8 ns/iter (+/- 267.7) | 115218.52 ns/iter (+/- 222.48) | -0.0% |
| bench_check_raw_str_ascii | 19811.75 ns/iter (+/- 34.42) | 19775.9 ns/iter (+/- 38.97) | -0.2% |
| bench_check_raw_str_non_ascii | 35882.4 ns/iter (+/- 120.81) | 35862.94 ns/iter (+/- 37.65) | -0.1% |
| bench_check_raw_str_unicode | 109795.61 ns/iter (+/- 318.02) | 109668.07 ns/iter (+/- 252.03) | -0.1% |
| bench_skip_ascii_whitespace | 6310.7 ns/iter (+/- 29.53) | 6316.3 ns/iter (+/- 18.24) | 0.1% |
| bench_unescape_byte_str_ascii | 43636.65 ns/iter (+/- 193.89) | 43529.61 ns/iter (+/- 168.8) | -0.2% |
| bench_unescape_byte_str_ascii_escape | 49115.14 ns/iter (+/- 81.32) | 49232.31 ns/iter (+/- 355.52) | 0.2% |
| bench_unescape_byte_str_hex_escape | 78711.55 ns/iter (+/- 203.76) | 77679.72 ns/iter (+/- 126.33) | -1.3% |
| bench_unescape_byte_str_mixed_escape | 217949.42 ns/iter (+/- 396.08) | 182727.73 ns/iter (+/- 5340.48) | -16.2% |
| bench_unescape_c_str_ascii | 54202.44 ns/iter (+/- 213.68) | 37968.36 ns/iter (+/- 1634.7) | -30.0% |
| bench_unescape_c_str_ascii_escape | 66313.15 ns/iter (+/- 295.46) | 59751.49 ns/iter (+/- 347.79) | -9.9% |
| bench_unescape_c_str_hex_escape_ascii | 105983.81 ns/iter (+/- 316.38) | 85106.36 ns/iter (+/- 643.56) | -19.7% |
| bench_unescape_c_str_hex_escape_byte | 106426.81 ns/iter (+/- 285.65) | 85051.97 ns/iter (+/- 328.0) | -20.1% |
| bench_unescape_c_str_mixed_escape | 649798.95 ns/iter (+/- 2989.2) | 557485.15 ns/iter (+/- 10110.31) | -14.2% |
| bench_unescape_c_str_non_ascii | 76784.48 ns/iter (+/- 356.41) | 62791.62 ns/iter (+/- 244.16) | -18.2% |
| bench_unescape_c_str_unicode | 262566.75 ns/iter (+/- 3679.59) | 214002.85 ns/iter (+/- 3932.69) | -18.5% |
| bench_unescape_c_str_unicode_escape | 185377.66 ns/iter (+/- 621.7) | 163651.8 ns/iter (+/- 935.65) | -11.7% |
| bench_unescape_str_ascii | 44971.66 ns/iter (+/- 349.02) | 45062.01 ns/iter (+/- 312.33) | 0.2% |
| bench_unescape_str_ascii_escape | 66951.3 ns/iter (+/- 347.23) | 66292.56 ns/iter (+/- 137.83) | -1.0% |
| bench_unescape_str_hex_escape | 102903.64 ns/iter (+/- 173.36) | 101440.36 ns/iter (+/- 282.0) | -1.4% |
| bench_unescape_str_mixed_escape | 743504.7 ns/iter (+/- 2975.47) | 732506.6 ns/iter (+/- 5604.11) | -1.5% |
| bench_unescape_str_non_ascii | 63212.57 ns/iter (+/- 183.07) | 63535.35 ns/iter (+/- 103.64) | 0.5% |
| bench_unescape_str_unicode | 217130.85 ns/iter (+/- 427.54) | 217462.52 ns/iter (+/- 837.24) | 0.2% |
| bench_unescape_str_unicode_escape | 348727.47 ns/iter (+/- 569.06) | 349065.45 ns/iter (+/- 4820.3) | 0.1% |
Not many changes, so all good. :)
There was a problem hiding this comment.
Just saw your comment. Well now all platforms have comparable numbers, that's good! Let me write a summary at the end.
There was a problem hiding this comment.
I've taken this out for now, to leave only uncontroversial cases of inline.
73aae9e to
a8211d0
Compare
|
Summary:
Seems all good to me, thanks for the performance improvement! |
|
cc @Urgau :) |
|
Or I can try @estebank. Can you merge please? :) |
On ARM:
Someone please check x86_64!
BTW I made some small improvements (I hope) to the bench script (mostly in
get_good_results):Details