benchmark.cc: Default to inverted mode, add small_digits mode. by StephanTLavavej · Pull Request #74 · ulfjack/ryu

StephanTLavavej · 2018-08-15T02:45:55Z

Note: If the hexfloat change is undesirable, I can restore the original behavior with a tiny bit of work.

benchmark.cc: Reject unrecognized options.

benchmark.cc: Print hexfloats in verbose mode.

First, this extracts generate_float() and generate_double().

That eliminates the r integers, so we need another way to print the exact data in verbose mode. C99's hexfloat conversion specifiers are easy to use. "%.6a" and "%.13a" print enough hexits for round-tripping floats and doubles.
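Here is a minimal C++ sketch of the round-trip claim. The helper names are hypothetical and not taken from benchmark.cc. The sketch prints with "%.6a" and "%.13a", parses the text back with strtof()/strtod(), and compares bit patterns. In the usual 0x1.hhhhhhp+e form, the 6 hexits after the point carry 24 bits, which covers a float's 23 stored fraction bits. The 13 hexits for a double carry exactly its 52 stored fraction bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Prints f with "%.6a" (the float is promoted to double by the variadic
// call), parses the text back with strtof, and compares the bit patterns.
bool float_round_trips_via_hexfloat(float f) {
    char buf[64];
    const int len = std::snprintf(buf, sizeof(buf), "%.6a", f);
    assert(len > 0 && len < static_cast<int>(sizeof(buf)));
    const float parsed = std::strtof(buf, nullptr);
    std::uint32_t before, after;
    std::memcpy(&before, &f, sizeof(before));
    std::memcpy(&after, &parsed, sizeof(after));
    return before == after;
}

// Same check for doubles: 13 hexits hold the 52 stored fraction bits.
bool double_round_trips_via_hexfloat(double d) {
    char buf[64];
    const int len = std::snprintf(buf, sizeof(buf), "%.13a", d);
    assert(len > 0 && len < static_cast<int>(sizeof(buf)));
    const double parsed = std::strtod(buf, nullptr);
    std::uint64_t before, after;
    std::memcpy(&before, &d, sizeof(before));
    std::memcpy(&after, &parsed, sizeof(after));
    return before == after;
}
```

Subnormal inputs also round-trip. A subnormal float becomes a normal double after promotion, and a subnormal double is printed in the 0x0.hhh...p-1022 form. NaNs are left out, because bit-for-bit equality is not a useful test for them here.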

Finally, we can also simplify %lf to %f; the arguments are doubles (and C11 says that the 'l' length modifier "has no effect on a following a, A, e, E, f, F, g, or G conversion specifier").
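The following small sketch illustrates the %lf simplification. It uses a hypothetical helper and assumes a C99-or-later C library. The helper formats the same value both ways and compares the strings. Passing a float gives the same result, because the default argument promotions turn it into a double before it reaches the variadic call.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Formats value with "%lf" and with "%f" and reports whether the text matches.
// Since C99, the 'l' length modifier has no effect on the f conversion.
bool lf_matches_f(double value) {
    char with_l[64];
    char without_l[64];
    const int a = std::snprintf(with_l, sizeof(with_l), "%lf", value);
    const int b = std::snprintf(without_l, sizeof(without_l), "%f", value);
    assert(a > 0 && b > 0);
    return std::strcmp(with_l, without_l) == 0;
}
```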

benchmark.cc: Default to inverted mode, add "-classic".

benchmark.cc: Extract benchmark_options.

This makes it easier to pass options to bench32() and bench64().

benchmark.cc: Validate samples and iterations options.
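The options-related commits fit together naturally: "Reject unrecognized options", "Extract benchmark_options", and "Validate samples and iterations options". Below is a hypothetical sketch of how they could combine; it is not the code in benchmark.cc. The -32, -64, -ryu, -classic and -small_digits=%i spellings appear in this PR. The -v, -samples= and -iterations= spellings, the ryu_only field's meaning, and all default values are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical options bundle (field names and defaults are assumptions).
struct benchmark_options {
    bool run32 = false;
    bool run64 = false;
    bool ryu_only = false;  // assumed meaning of -ryu: benchmark only Ryu
    bool classic = false;   // false: the new default (inverted) mode
    bool verbose = false;
    int samples = 10000;
    int iterations = 1000;
    int small_digits = 0;   // 0: no digit limit
};

// Fills opts from argv. Returns false for an unrecognized option or an
// invalid value instead of silently ignoring it.
bool parse_options(int argc, const char* const argv[], benchmark_options& opts) {
    assert(argc >= 1);
    for (int i = 1; i < argc; ++i) {
        const char* const arg = argv[i];
        char trailing;  // only matched when junk follows the number, e.g. "-samples=10x"
        if (std::strcmp(arg, "-32") == 0) {
            opts.run32 = true;
        } else if (std::strcmp(arg, "-64") == 0) {
            opts.run64 = true;
        } else if (std::strcmp(arg, "-ryu") == 0) {
            opts.ryu_only = true;
        } else if (std::strcmp(arg, "-classic") == 0) {
            opts.classic = true;
        } else if (std::strcmp(arg, "-v") == 0) {
            opts.verbose = true;
        } else if (std::sscanf(arg, "-samples=%i%c", &opts.samples, &trailing) == 1) {
            if (opts.samples < 1) return false;
        } else if (std::sscanf(arg, "-iterations=%i%c", &opts.iterations, &trailing) == 1) {
            if (opts.iterations < 1) return false;
        } else if (std::sscanf(arg, "-small_digits=%i%c", &opts.small_digits, &trailing) == 1) {
            if (opts.small_digits < 1 || opts.small_digits > 7) return false;
        } else {
            std::fprintf(stderr, "Unrecognized or malformed option: %s\n", arg);
            return false;
        }
    }
    return true;
}
```

With everything in one struct, bench32() and bench64() could each take a single `const benchmark_options&` rather than a growing list of parameters. That is the convenience the "Extract benchmark_options" commit describes.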

benchmark.cc: Add "-small_digits=%i".

This option stresses Ryu's codepaths for small integers. It accepts values in the range [1, 7]. (32-bit floats have insufficient precision for larger values. With a little work, this range could be extended for 64-bit doubles, if benchmarking moderate-length integers is interesting.)
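Here is one way such values could be generated. This is a hypothetical sketch, not the generator in this PR. It picks an n-digit integer k and divides by 10^(n-1), which yields values like the [1.00, 9.99] example for n = 3. The static_asserts show one reason 7 is a natural ceiling for floats: a float's 24-bit significand represents every integer up to 2^24 = 16,777,216 exactly. That covers all 7-digit integers but not all 8-digit ones.

```cpp
#include <cassert>
#include <random>

// Every integer up to 2^24 is exactly representable as a float.
static_assert(9999999 < (1 << 24), "all 7-digit integers fit in a float significand");
static_assert(99999999 > (1 << 24), "8-digit integers do not all fit");

// Hypothetical generator for -small_digits=n: picks an n-digit integer k
// uniformly and returns k / 10^(n-1), a value in [1, 10) with at most
// n significant digits (n = 3 gives 1.00, 1.01, ..., 9.99).
float generate_small_digits_float(int n, std::mt19937& rng) {
    assert(n >= 1 && n <= 7);
    int scale = 1;
    for (int i = 1; i < n; ++i) {
        scale *= 10;  // scale = 10^(n-1)
    }
    std::uniform_int_distribution<int> dist(scale, scale * 10 - 1);
    const int k = dist(rng);
    // Both operands convert to float exactly; only the division rounds.
    return static_cast<float>(k) / static_cast<float>(scale);
}
```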

This also modifies verbose mode to print ryu_output, so we can see what Ryu is emitting (and verify that small_digits mode is actually testing small integers).

As the example in the comment explains, "-small_digits=3" tests values in the range [1.00, 9.99]. These will be printed as:

1E0, 1.01E0, ..., 1.09E0, 1.1E0, 1.11E0, ..., 9.98E0, 9.99E0

That is, there are a few 1-digit and 2-digit values, although most are 3-digit (and none are longer).
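A brute-force count can check this digit mix. The sketch assumes the tested values are exactly the grid k / 10^(n-1) for every n-digit integer k, with each value printed with trailing zeros dropped (as in the 1E0 and 1.1E0 example). The helper is hypothetical. For n = 3 it counts 9 one-digit, 81 two-digit and 810 three-digit values out of 900.

```cpp
#include <array>
#include <cassert>

// counts[d] = number of grid values k / 10^(n-1), k in [10^(n-1), 10^n - 1],
// that have d significant digits once trailing zeros are removed.
std::array<int, 8> digit_histogram(int n) {
    assert(n >= 1 && n <= 7);
    int scale = 1;
    for (int i = 1; i < n; ++i) {
        scale *= 10;  // scale = 10^(n-1)
    }
    std::array<int, 8> counts{};
    for (int k = scale; k < scale * 10; ++k) {
        int digits = n;
        for (int m = k; m % 10 == 0; m /= 10) {
            --digits;  // each trailing zero shortens the printed significand
        }
        ++counts[digits];
    }
    return counts;
}
```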

Currently, shorter output appears to be more stressful for doubles:

```
64:  118.619    1.991 (x86 benchmark_clang -ryu -64)
64:  277.499    3.048 (x86 benchmark_clang -ryu -64 -small_digits=7)
64:  306.753    2.787 (x86 benchmark_clang -ryu -64 -small_digits=6)
64:  327.964    3.427 (x86 benchmark_clang -ryu -64 -small_digits=5)
64:  347.708    2.876 (x86 benchmark_clang -ryu -64 -small_digits=4)
64:  369.915    2.371 (x86 benchmark_clang -ryu -64 -small_digits=3)
64:  403.309    9.321 (x86 benchmark_clang -ryu -64 -small_digits=2)
64:  477.200    3.409 (x86 benchmark_clang -ryu -64 -small_digits=1)

64:   42.266    1.270 (x64 benchmark_clang -ryu -64)
64:   45.798    1.356 (x64 benchmark_clang -ryu -64 -small_digits=7)
64:   47.418    1.454 (x64 benchmark_clang -ryu -64 -small_digits=6)
64:   49.004    1.464 (x64 benchmark_clang -ryu -64 -small_digits=5)
64:   50.620    1.209 (x64 benchmark_clang -ryu -64 -small_digits=4)
64:   52.759    1.275 (x64 benchmark_clang -ryu -64 -small_digits=3)
64:   55.585    1.402 (x64 benchmark_clang -ryu -64 -small_digits=2)
64:   66.844    1.378 (x64 benchmark_clang -ryu -64 -small_digits=1)
```

Interestingly, floats behave similarly, except that "unlimited" digits (no -small_digits option) are slower than -small_digits=7. I'm not sure why this is the case.

```
32:   42.478    1.558 (x86 benchmark_clang -ryu -32)
32:   33.758    1.145 (x86 benchmark_clang -ryu -32 -small_digits=7)
32:   35.518    1.048 (x86 benchmark_clang -ryu -32 -small_digits=6)
32:   36.035    1.113 (x86 benchmark_clang -ryu -32 -small_digits=5)
32:   37.629    0.999 (x86 benchmark_clang -ryu -32 -small_digits=4)
32:   39.157    1.061 (x86 benchmark_clang -ryu -32 -small_digits=3)
32:   45.113    1.027 (x86 benchmark_clang -ryu -32 -small_digits=2)
32:   55.080    1.227 (x86 benchmark_clang -ryu -32 -small_digits=1)

32:   30.599    1.528 (x64 benchmark_clang -ryu -32)
32:   23.771    0.907 (x64 benchmark_clang -ryu -32 -small_digits=7)
32:   24.571    1.140 (x64 benchmark_clang -ryu -32 -small_digits=6)
32:   25.138    0.864 (x64 benchmark_clang -ryu -32 -small_digits=5)
32:   26.579    1.020 (x64 benchmark_clang -ryu -32 -small_digits=4)
32:   27.664    1.095 (x64 benchmark_clang -ryu -32 -small_digits=3)
32:   30.341    1.405 (x64 benchmark_clang -ryu -32 -small_digits=2)
32:   32.580    1.129 (x64 benchmark_clang -ryu -32 -small_digits=1)
```
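The sketch below puts the x64 results side by side by dividing each mean time by the -small_digits=7 time for the same width. The numbers are transcribed from the x64 runs above, and nothing new is measured. Index 0 holds the run without -small_digits, and the array names are hypothetical.

```cpp
#include <cassert>

// Mean times from the x64 benchmark_clang runs, indexed by -small_digits
// (index 0 is the run without -small_digits).
constexpr double x64_double_means[8] = {42.266, 66.844, 55.585, 52.759,
                                        50.620, 49.004, 47.418, 45.798};
constexpr double x64_float_means[8] = {30.599, 32.580, 30.341, 27.664,
                                       26.579, 25.138, 24.571, 23.771};

// Ratio of the mean time at index n to the -small_digits=7 mean.
double slowdown_vs_7(const double (&means)[8], int n) {
    assert(n >= 0 && n <= 7);
    return means[n] / means[7];
}
```

For doubles, the unlimited run has a ratio below 1. For floats, the unlimited run has a ratio above 1, which is the float anomaly noted above.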


ulfjack · 2018-08-15T14:42:55Z

I was using the int output to generate the graphs in the paper (with gnuplot). I'd prefer to keep that; I'm not sure this can easily be changed in bash or gnuplot.

StephanTLavavej · 2018-08-15T18:48:53Z

Restored! Also, I looked at gnuplot.template but couldn't figure out how to adapt it to the addition of ryu_output. Is there an easy way to do that, or would it tolerate the string field being moved to the end? I think it's useful, but of course I don't want to break your graphs. If necessary, I could add yet another option to emit ryu_output.
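The following is a hedged sketch of the two ideas in this exchange. The first is an integer view of a float's bits, for the gnuplot pipeline. The second is a verbose row that keeps the numeric columns first and puts ryu_output last. The name float_bits_as_int echoes the restored helper, but its signature here is an assumption. The format_row helper, the column order and the comma separator are also assumptions, not benchmark.cc's actual output format.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// memcpy-based bit cast (well-defined, unlike a pointer-punning cast).
std::uint32_t float_bits_as_int(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

// Hypothetical verbose row: numeric columns first, so existing column
// indices in a plotting script stay put, and the string field last.
std::string format_row(float f, double mean, double stddev, const char* ryu_output) {
    char buf[128];
    const int len = std::snprintf(buf, sizeof(buf), "%" PRIu32 ",%f,%f,%s",
                                  float_bits_as_int(f), mean, stddev, ryu_output);
    assert(len > 0 && len < static_cast<int>(sizeof(buf)));
    return buf;
}
```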

ulfjack · 2018-08-16T12:23:39Z

I'll take a look.

StephanTLavavej added 6 commits August 13, 2018 21:59

benchmark.cc: Reject unrecognized options.

ebf9f44

benchmark.cc: Default to inverted mode, add "-classic".

e389a51

benchmark.cc: Extract benchmark_options.

6a1f227

This makes it easier to pass options to bench32() and bench64().

benchmark.cc: Validate samples and iterations options.

19161a6

benchmark.cc: Restore float_bits_as_int.

27d7712

StephanTLavavej mentioned this pull request Aug 16, 2018 in Optimize 64-bit division-by-constant for x86 platforms #73 (Merged)

ulfjack merged commit 2dbe0a1 into ulfjack:master Aug 16, 2018

StephanTLavavej deleted the more_benchmarking branch August 16, 2018 18:13

ulfjack merged 7 commits into ulfjack:master from StephanTLavavej:more_benchmarking