Skip to content

[DNM] apd: embed small coefficient values in Decimal struct#101

Closed
nvb wants to merge 4 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/inlineCoeff
Closed

[DNM] apd: embed small coefficient values in Decimal struct#101
nvb wants to merge 4 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/inlineCoeff

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Jan 1, 2022

Replaces cockroachdb/cockroach#74369.

This commit introduces a performance optimization that embeds small coefficient values directly in their Decimal struct, instead of storing these values in a separate heap allocation.

Each Decimal maintains (through big.Int) an internal reference to a variable-length coefficient, which is represented by a []big.Word. The coeffInline field and the lazyInit method combine to allow us to inline this coefficient within the Decimal struct when its value is sufficiently small. In lazyInit, we point the Coeff field's backing slice at the coeffInline array. big.Int will avoid re-allocating this array until it is provided with a value that exceeds the initial capacity. We set this initial capacity to accommodate any coefficient that would fit in a 64-bit integer (i.e. values up to 2^64).

This is an alternative to an optimization that many other arbitrary precision decimal libraries have where small coefficient values are stored as numeric fields in their data type's struct. Only when this coefficient value gets large enough do these libraries fall back to a variable-length coefficient with internal indirection. We can see the optimization in practice in the ericlagergren/decimal library, where each struct contains a compact uint64 and an unscaled big.Int. Prior concern from the authors of cockroachdb/apd regarding this form of compact representation optimization was that it traded performance for complexity. The optimization fractured control flow, leaking out across the library and leading to more complex, error-prone code.

The approach taken in this commit does not have the same issue. All arithmetic on the decimal's coefficient is still deferred to bit.Int. In fact, the entire optimization is best-effort, and bugs that lead to missed calls to lazyInit are merely missed opportunities to avoid a heap allocation, and nothing more serious. The internal reference within Decimal is vulnerable to memory aliasing bugs when a Decimal is copied by value (and not through the Decimal.Set method), but this is not a new concern.

Impact on benchmarks:

name                   old time/op    new time/op    delta
GDA/compare-10           61.5µs ±20%    54.5µs ±18%  -11.34%  (p=0.035 n=10+10)
GDA/reduce-10            18.8µs ± 7%    17.5µs ± 6%   -7.16%  (p=0.001 n=10+10)
GDA/remainder-10         67.0µs ±11%    62.2µs ± 6%   -7.09%  (p=0.023 n=10+10)
Exp/P5/S-4/D-2-10        2.56µs ± 4%    2.51µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-4/D2-10         2.65µs ± 4%    2.60µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D-2-10        6.44µs ± 5%    6.31µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D2-10         6.27µs ± 4%    6.15µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D-2-10         19.3µs ± 4%    19.1µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D2-10          19.1µs ± 4%    18.9µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-10-10      10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-2-10       5.59µs ± 4%    5.49µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D2-10        4.68µs ± 4%    4.62µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D10-10       10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-10-10      22.2µs ± 4%    22.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-2-10       12.2µs ± 4%    12.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D2-10        12.2µs ± 4%    12.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D10-10       21.8µs ± 4%    21.7µs ± 4%     ~     (p=0.247 n=10+10)
Exp/P10/S2/D-10-10       44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.165 n=10+10)
Exp/P10/S2/D-2-10        31.8µs ± 4%    31.7µs ± 4%     ~     (p=0.190 n=10+10)
Exp/P10/S2/D2-10         28.7µs ± 4%    28.6µs ± 4%     ~     (p=0.139 n=10+10)
Exp/P10/S2/D10-10        44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P100/S-4/D-100-10     561µs ± 4%     565µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S-4/D-10-10      297µs ± 4%     299µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-4/D-2-10       286µs ± 4%     289µs ± 4%     ~     (p=0.315 n=10+10)
Exp/P100/S-4/D2-10        287µs ± 4%     289µs ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S-4/D10-10       292µs ± 4%     294µs ± 4%     ~     (p=0.529 n=10+10)
Exp/P100/S-4/D100-10      550µs ± 4%     554µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-100-10    1.05ms ± 4%    1.05ms ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D-10-10      637µs ± 4%     641µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-2-10       636µs ± 4%     641µs ± 4%     ~     (p=0.436 n=10+10)
Exp/P100/S-1/D2-10        612µs ± 4%     616µs ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D10-10       621µs ± 4%     625µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-1/D100-10     1.07ms ± 3%    1.08ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-100-10     1.54ms ± 3%    1.55ms ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S2/D-10-10      1.06ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-2-10        918µs ± 4%     924µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D2-10        1.01ms ± 4%    1.02ms ± 4%     ~     (p=0.853 n=10+10)
Exp/P100/S2/D10-10       1.05ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D100-10      1.58ms ± 4%    1.59ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P2/S-100/D2-10        25.1µs ± 4%    25.0µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-10/D2-10         22.6µs ± 4%    22.6µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-2/D2-10          24.9µs ± 4%    24.8µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S2/D2-10           28.1µs ± 4%    27.9µs ± 4%     ~     (p=0.139 n=10+10)
Ln/P2/S10/D2-10          27.0µs ± 4%    26.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S100/D2-10         28.7µs ± 4%    28.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-100/D2-10       78.1µs ± 4%    78.1µs ± 4%     ~     (p=0.218 n=10+10)
Ln/P10/S-100/D10-10      74.3µs ± 4%    74.3µs ± 4%     ~     (p=0.190 n=10+10)
Ln/P10/S-10/D2-10        83.4µs ± 4%    83.4µs ± 4%     ~     (p=0.315 n=10+10)
Ln/P10/S-10/D10-10       80.9µs ± 4%    80.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-2/D2-10         82.1µs ± 4%    82.2µs ± 5%     ~     (p=0.280 n=10+10)
Ln/P10/S-2/D10-10        90.6µs ± 4%    90.6µs ± 5%     ~     (p=0.190 n=10+10)
Ln/P10/S2/D2-10          80.6µs ± 4%    80.6µs ± 4%     ~     (p=0.184 n=10+10)
Ln/P10/S2/D10-10         79.7µs ± 4%    79.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S10/D2-10         85.2µs ± 4%    85.2µs ± 5%     ~     (p=0.218 n=10+10)
Ln/P10/S10/D10-10        74.8µs ± 4%    74.7µs ± 4%     ~     (p=0.247 n=10+10)
Ln/P10/S100/D2-10        83.8µs ± 4%    83.6µs ± 4%     ~     (p=0.165 n=10+10)
Ln/P10/S100/D10-10       66.1µs ± 4%    65.9µs ± 5%     ~     (p=0.143 n=10+10)
Ln/P100/S-100/D2-10      3.87ms ± 4%    3.90ms ± 5%     ~     (p=0.853 n=10+10)
Ln/P100/S-100/D10-10     3.34ms ± 4%    3.35ms ± 5%     ~     (p=0.529 n=10+10)
Ln/P100/S-100/D100-10    3.46ms ± 4%    3.48ms ± 5%     ~     (p=0.971 n=10+10)
Ln/P100/S-10/D2-10       3.63ms ± 4%    3.65ms ± 4%     ~     (p=0.739 n=10+10)
Ln/P100/S-10/D10-10      3.43ms ± 4%    3.44ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S-10/D100-10     3.42ms ± 4%    3.45ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D2-10        3.51ms ± 4%    3.53ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P100/S-2/D10-10       3.21ms ± 4%    3.24ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D100-10      3.47ms ± 4%    3.50ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S2/D2-10         3.56ms ± 4%    3.59ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S2/D10-10        3.82ms ± 4%    3.84ms ± 4%     ~     (p=1.000 n=10+10)
Ln/P100/S2/D100-10       3.72ms ± 4%    3.75ms ± 5%     ~     (p=0.353 n=10+10)
Ln/P100/S10/D2-10        3.78ms ± 4%    3.81ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S10/D10-10       3.50ms ± 4%    3.52ms ± 4%     ~     (p=0.529 n=10+10)
Ln/P100/S10/D100-10      3.49ms ± 4%    3.52ms ± 4%     ~     (p=0.684 n=10+10)
Ln/P100/S100/D2-10       3.82ms ± 4%    3.85ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D10-10      3.72ms ± 4%    3.75ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D100-10     3.65ms ± 4%    3.68ms ± 4%     ~     (p=0.315 n=10+10)
GDA/abs-10               10.4µs ± 6%    10.3µs ± 6%     ~     (p=0.564 n=10+10)
GDA/base-10               110µs ± 4%     111µs ± 5%     ~     (p=0.869 n=10+10)
GDA/comparetotal-10      30.2µs ± 5%    30.3µs ± 5%     ~     (p=0.631 n=10+10)
GDA/divide-10             391µs ± 7%     390µs ± 3%     ~     (p=0.853 n=10+10)
GDA/exp-10                127ms ± 4%     127ms ± 4%     ~     (p=0.796 n=10+10)
GDA/ln-10                83.6ms ± 3%    83.6ms ± 4%     ~     (p=0.218 n=10+10)
GDA/log10-10              106ms ± 4%     105ms ± 4%     ~     (p=0.218 n=10+10)
GDA/minus-10             11.7µs ± 4%    11.7µs ± 7%     ~     (p=0.579 n=10+10)
GDA/multiply-10          77.3µs ± 9%    80.8µs ± 9%     ~     (p=0.089 n=10+10)
GDA/plus-10              42.9µs ± 5%    44.2µs ± 4%     ~     (p=0.052 n=10+10)
GDA/power-10              218ms ± 3%     218ms ± 3%     ~     (p=0.631 n=10+10)
GDA/powersqrt-10          450ms ± 2%     448ms ± 2%     ~     (p=0.280 n=10+10)
GDA/quantize-10           143µs ±12%     128µs ± 8%     ~     (p=0.052 n=10+10)
GDA/randoms-10           3.05ms ± 5%    3.02ms ± 5%     ~     (p=0.143 n=10+10)
GDA/rounding-10           659µs ± 4%     658µs ± 4%     ~     (p=0.579 n=10+10)
GDA/squareroot-10        33.1ms ± 3%    32.7ms ± 3%     ~     (p=0.143 n=10+10)
GDA/subtract-10           172µs ±11%     170µs ± 8%     ~     (p=0.912 n=10+10)
GDA/tointegral-10        32.9µs ± 6%    32.2µs ± 5%     ~     (p=0.190 n=10+10)
GDA/tointegralx-10       33.6µs ± 6%    32.7µs ± 4%     ~     (p=0.190 n=10+10)
GDA/cuberoot-apd-10      2.08ms ± 3%    2.08ms ± 4%     ~     (p=0.165 n=10+10)
GDA/add-10                904µs ± 5%     919µs ± 5%   +1.65%  (p=0.043 n=10+10)
GDA/divideint-10         28.7µs ± 4%    29.7µs ± 1%   +3.51%  (p=0.026 n=9+6)

name                   old alloc/op   new alloc/op   delta
GDA/compare-10           10.1kB ± 0%     7.0kB ± 0%  -30.72%  (p=0.000 n=10+10)
GDA/quantize-10          42.3kB ± 0%    36.5kB ± 0%  -13.63%  (p=0.000 n=9+10)
GDA/divideint-10         6.51kB ± 0%    5.70kB ± 0%  -12.41%  (p=0.000 n=10+10)
GDA/remainder-10         22.4kB ± 0%    20.0kB ± 0%  -10.60%  (p=0.000 n=10+10)
GDA/tointegralx-10       19.2kB ± 0%    17.4kB ± 0%   -9.28%  (p=0.000 n=10+10)
GDA/tointegral-10        19.1kB ± 0%    17.4kB ± 0%   -8.83%  (p=0.000 n=10+10)
GDA/reduce-10            2.22kB ± 0%    2.02kB ± 0%   -8.66%  (p=0.000 n=10+10)
GDA/divide-10            74.3kB ± 0%    70.1kB ± 0%   -5.62%  (p=0.000 n=10+10)
GDA/squareroot-10        8.30MB ± 0%    7.96MB ± 0%   -4.08%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10        1.23kB ± 0%    1.18kB ± 0%   -3.92%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10       4.50kB ± 0%    4.33kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10        2.83kB ± 0%    2.72kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10         1.29kB ± 0%    1.24kB ± 0%   -3.72%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10        2.02kB ± 0%    1.94kB ± 0%   -3.71%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10        9.59kB ± 0%    9.24kB ± 0%   -3.66%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10         7.53kB ± 0%    7.26kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10          7.76kB ± 0%    7.48kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10         9.75kB ± 0%    9.40kB ± 0%   -3.60%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10       2.32kB ± 0%    2.24kB ± 0%   -3.58%  (p=0.000 n=10+10)
GDA/rounding-10           246kB ± 0%     237kB ± 0%   -3.56%  (p=0.000 n=10+10)
GDA/randoms-10           1.16MB ± 0%    1.12MB ± 0%   -3.55%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10       9.79kB ± 0%    9.44kB ± 0%   -3.54%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10        10.7kB ± 0%    10.3kB ± 0%   -3.53%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10         2.87kB ± 0%    2.77kB ± 0%   -3.52%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10      4.95kB ± 0%    4.78kB ± 0%   -3.47%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10        4.64kB ± 0%    4.49kB ± 0%   -3.40%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       261kB ± 0%     252kB ± 0%   -3.36%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10       5.17kB ± 0%    5.00kB ± 0%   -3.33%  (p=0.000 n=10+10)
GDA/powersqrt-10         77.6MB ± 0%    75.1MB ± 0%   -3.29%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10         17.1kB ± 0%    16.5kB ± 0%   -3.26%  (p=0.000 n=9+10)
Ln/P10/S-2/D10-10        18.6kB ± 0%    18.0kB ± 0%   -3.24%  (p=0.000 n=9+8)
Exp/P10/S-4/D-10-10      2.56kB ± 0%    2.48kB ± 0%   -3.24%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10          17.0kB ± 0%    16.5kB ± 0%   -3.23%  (p=0.000 n=7+10)
Ln/P10/S10/D10-10        15.8kB ± 0%    15.3kB ± 0%   -3.23%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10         18.0kB ± 0%    17.4kB ± 0%   -3.22%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10        17.0kB ± 0%    16.4kB ± 0%   -3.22%  (p=0.000 n=8+10)
Ln/P10/S2/D10-10         16.5kB ± 0%    16.0kB ± 0%   -3.21%  (p=0.000 n=10+10)
Ln/P10/S-10/D10-10       17.1kB ± 0%    16.5kB ± 0%   -3.18%  (p=0.000 n=10+7)
Exp/P10/S-4/D10-10       2.61kB ± 0%    2.52kB ± 0%   -3.18%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10          10.3kB ± 0%     9.9kB ± 0%   -3.15%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10       17.1kB ± 0%    16.6kB ± 0%   -3.11%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10        17.8kB ± 0%    17.2kB ± 0%   -3.11%  (p=0.000 n=8+10)
Ln/P2/S2/D2-10           10.3kB ± 0%     9.9kB ± 0%   -3.10%  (p=0.000 n=10+9)
Ln/P2/S-2/D2-10          9.62kB ± 0%    9.32kB ± 0%   -3.09%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10       15.0kB ± 0%    14.5kB ± 0%   -3.09%  (p=0.000 n=9+10)
Ln/P10/S-100/D10-10      16.3kB ± 0%    15.8kB ± 0%   -3.08%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10         8.91kB ± 0%    8.64kB ± 0%   -3.07%  (p=0.000 n=10+9)
GDA/subtract-10           100kB ± 0%      97kB ± 0%   -2.89%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10         10.9kB ± 0%    10.6kB ± 0%   -2.86%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10        9.81kB ± 0%    9.53kB ± 0%   -2.83%  (p=0.000 n=10+10)
GDA/ln-10                10.2MB ± 0%    10.0MB ± 0%   -2.09%  (p=0.000 n=10+10)
GDA/log10-10             12.5MB ± 0%    12.3MB ± 0%   -1.87%  (p=0.000 n=10+10)
GDA/power-10             27.0MB ± 0%    26.5MB ± 0%   -1.78%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10      31.9kB ± 0%    31.4kB ± 0%   -1.46%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10     33.5kB ± 0%    33.0kB ± 0%   -1.41%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10       31.8kB ± 0%    31.4kB ± 0%   -1.40%  (p=0.000 n=10+9)
Exp/P100/S-4/D-2-10      34.6kB ± 0%    34.1kB ± 0%   -1.36%  (p=0.000 n=10+9)
Exp/P100/S-1/D10-10      65.0kB ± 0%    64.2kB ± 0%   -1.27%  (p=0.000 n=8+10)
Exp/P100/S-1/D2-10       64.9kB ± 0%    64.1kB ± 0%   -1.24%  (p=0.000 n=10+10)
Exp/P100/S-1/D-10-10     70.1kB ± 0%    69.3kB ± 0%   -1.19%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10      72.6kB ± 0%    71.8kB ± 0%   -1.14%  (p=0.000 n=9+9)
Exp/P100/S2/D10-10        115kB ± 0%     114kB ± 0%   -1.05%  (p=0.000 n=10+10)
Exp/P100/S2/D2-10         113kB ± 0%     112kB ± 0%   -1.01%  (p=0.000 n=10+10)
Exp/P100/S2/D-2-10        107kB ± 0%     106kB ± 0%   -1.00%  (p=0.000 n=10+10)
Exp/P100/S2/D-10-10       117kB ± 0%     116kB ± 0%   -0.96%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       318kB ± 0%     315kB ± 0%   -0.83%  (p=0.000 n=10+8)
Ln/P100/S-100/D100-10     328kB ± 0%     326kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S-10/D2-10        336kB ± 0%     334kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         326kB ± 0%     324kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S100/D2-10        353kB ± 0%     350kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S10/D10-10        326kB ± 0%     324kB ± 0%   -0.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      340kB ± 0%     338kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         351kB ± 0%     348kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S-10/D100-10      323kB ± 0%     320kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        344kB ± 0%     341kB ± 0%   -0.77%  (p=0.000 n=9+6)
Ln/P100/S-100/D10-10      311kB ± 0%     309kB ± 0%   -0.76%  (p=0.000 n=10+7)
Ln/P100/S10/D100-10       323kB ± 0%     321kB ± 0%   -0.76%  (p=0.000 n=8+9)
Ln/P100/S-100/D2-10       362kB ± 0%     360kB ± 0%   -0.75%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       345kB ± 0%     342kB ± 0%   -0.74%  (p=0.000 n=9+8)
Exp/P100/S-1/D100-10     94.0kB ± 0%    93.3kB ± 0%   -0.74%  (p=0.000 n=8+10)
Exp/P100/S-4/D100-10     47.9kB ± 0%    47.5kB ± 0%   -0.74%  (p=0.000 n=6+9)
Ln/P100/S2/D2-10          333kB ± 0%     330kB ± 0%   -0.74%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10    97.5kB ± 0%    96.8kB ± 0%   -0.70%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10    51.5kB ± 0%    51.2kB ± 0%   -0.70%  (p=0.000 n=10+10)
Ln/P100/S10/D2-10         346kB ± 0%     343kB ± 0%   -0.69%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        306kB ± 0%     304kB ± 0%   -0.68%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       329kB ± 0%     326kB ± 0%   -0.67%  (p=0.000 n=10+8)
Exp/P100/S2/D100-10       150kB ± 0%     149kB ± 0%   -0.67%  (p=0.000 n=10+10)
Exp/P100/S2/D-100-10      148kB ± 0%     147kB ± 0%   -0.62%  (p=0.000 n=9+10)
GDA/exp-10               61.3MB ± 0%    61.1MB ± 0%   -0.18%  (p=0.000 n=10+10)
GDA/base-10              24.4kB ± 0%    24.4kB ± 0%     ~     (all equal)
GDA/comparetotal-10      7.11kB ± 0%    7.11kB ± 0%     ~     (all equal)
GDA/add-10                712kB ± 0%     720kB ± 0%   +1.18%  (p=0.000 n=10+8)
GDA/plus-10              46.6kB ± 0%    47.5kB ± 0%   +2.04%  (p=0.000 n=10+9)
GDA/multiply-10          35.5kB ± 0%    39.2kB ± 0%  +10.32%  (p=0.000 n=10+10)
GDA/minus-10             2.43kB ± 0%    2.82kB ± 0%  +16.12%  (p=0.000 n=10+10)
GDA/abs-10               2.33kB ± 0%    2.82kB ± 0%  +21.31%  (p=0.000 n=10+10)

name                   old allocs/op  new allocs/op  delta
GDA/reduce-10               187 ± 0%        43 ± 0%  -77.01%  (p=0.000 n=10+10)
GDA/compare-10              638 ± 0%       250 ± 0%  -60.82%  (p=0.000 n=10+10)
GDA/minus-10                164 ± 0%        73 ± 0%  -55.49%  (p=0.000 n=10+10)
GDA/abs-10                  151 ± 0%        73 ± 0%  -51.66%  (p=0.000 n=10+10)
GDA/quantize-10           1.70k ± 0%     0.98k ± 0%  -42.30%  (p=0.000 n=10+10)
GDA/tointegralx-10          677 ± 0%       454 ± 0%  -32.94%  (p=0.000 n=10+10)
GDA/remainder-10            923 ± 0%       626 ± 0%  -32.18%  (p=0.000 n=10+10)
GDA/tointegral-10           665 ± 0%       454 ± 0%  -31.73%  (p=0.000 n=10+10)
GDA/divideint-10            357 ± 0%       256 ± 0%  -28.29%  (p=0.000 n=10+10)
GDA/subtract-10           2.85k ± 0%     2.14k ± 0%  -24.89%  (p=0.000 n=10+10)
GDA/divide-10             2.51k ± 0%     1.91k ± 0%  -24.01%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10          47.0 ± 0%      37.0 ± 0%  -21.28%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10           50.0 ± 0%      40.0 ± 0%  -20.00%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10          76.0 ± 0%      62.0 ± 0%  -18.42%  (p=0.000 n=10+10)
GDA/rounding-10           9.20k ± 0%     7.59k ± 0%  -17.52%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10         87.0 ± 0%      72.0 ± 0%  -17.24%  (p=0.000 n=10+10)
Exp/P10/S-4/D10-10         95.0 ± 0%      79.0 ± 0%  -16.84%  (p=0.000 n=10+10)
GDA/randoms-10            40.4k ± 0%     33.6k ± 0%  -16.82%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10           104 ± 0%        87 ± 0%  -16.35%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10            105 ± 0%        88 ± 0%  -16.19%  (p=0.000 n=10+10)
Exp/P10/S-4/D-10-10        93.0 ± 0%      78.0 ± 0%  -16.13%  (p=0.000 n=10+10)
GDA/squareroot-10          275k ± 0%      230k ± 0%  -16.08%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10         174 ± 0%       147 ± 0%  -15.52%  (p=0.000 n=10+10)
GDA/plus-10                 587 ± 0%       496 ± 0%  -15.50%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10          163 ± 0%       138 ± 0%  -15.34%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10           166 ± 0%       141 ± 0%  -15.06%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10          182 ± 0%       155 ± 0%  -14.84%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10          335 ± 0%       286 ± 0%  -14.63%  (p=0.000 n=10+10)
Ln/P2/S-2/D2-10             342 ± 0%       292 ± 0%  -14.62%  (p=0.000 n=10+10)
Ln/P2/S2/D2-10              363 ± 0%       310 ± 0%  -14.60%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10             366 ± 0%       313 ± 0%  -14.48%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10           335 ± 0%       287 ± 0%  -14.33%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       7.67k ± 0%     6.58k ± 0%  -14.27%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10            582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10             582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S10/D10-10           541 ± 0%       464 ± 0%  -14.23%  (p=0.000 n=10+10)
Ln/P10/S2/D10-10            563 ± 0%       483 ± 0%  -14.21%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10            317 ± 0%       272 ± 0%  -14.20%  (p=0.000 n=10+10)
GDA/powersqrt-10          2.63M ± 0%     2.26M ± 0%  -14.17%  (p=0.000 n=10+10)
Ln/P10/S-2/D10-10           637 ± 0%       547 ± 0%  -14.13%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10           581 ± 0%       499 ± 0%  -14.11%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10           369 ± 0%       317 ± 0%  -14.09%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10            342 ± 0%       294 ± 0%  -14.04%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10            613 ± 0%       527 ± 0%  -14.03%  (p=0.000 n=9+10)
Ln/P10/S-10/D10-10          581 ± 0%       500 ± 0%  -13.99%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10            272 ± 0%       234 ± 0%  -13.97%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10          575 ± 0%       495 ± 0%  -13.91%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10           597 ± 0%       514 ± 0%  -13.90%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10          504 ± 0%       434 ± 0%  -13.89%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10             281 ± 0%       242 ± 0%  -13.88%  (p=0.000 n=10+10)
Ln/P10/S-100/D10-10         549 ± 0%       473 ± 0%  -13.84%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10           336 ± 0%       290 ± 0%  -13.69%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10            376 ± 0%       325 ± 0%  -13.56%  (p=0.000 n=10+10)
GDA/ln-10                  269k ± 0%      239k ± 0%  -10.88%  (p=0.000 n=10+10)
GDA/log10-10               326k ± 0%      292k ± 0%  -10.44%  (p=0.000 n=10+10)
GDA/power-10               699k ± 0%      627k ± 0%  -10.30%  (p=0.000 n=10+10)
GDA/add-10                13.2k ± 0%     11.9k ± 0%   -9.68%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10         701 ± 0%       638 ± 0%   -8.99%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10        715 ± 0%       651 ± 0%   -8.95%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10          686 ± 0%       626 ± 0%   -8.75%  (p=0.000 n=10+10)
Exp/P100/S-4/D-2-10         723 ± 0%       661 ± 0%   -8.58%  (p=0.000 n=10+10)
Exp/P100/S-1/D10-10       1.36k ± 0%     1.26k ± 0%   -7.99%  (p=0.000 n=10+10)
Exp/P100/S-1/D2-10        1.35k ± 0%     1.24k ± 0%   -7.72%  (p=0.000 n=9+10)
Exp/P100/S-1/D-10-10      1.40k ± 0%     1.29k ± 0%   -7.70%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10       1.43k ± 0%     1.32k ± 0%   -7.48%  (p=0.000 n=9+10)
Exp/P100/S2/D-2-10        2.03k ± 0%     1.89k ± 0%   -6.82%  (p=0.000 n=9+10)
Exp/P100/S2/D10-10        2.28k ± 0%     2.13k ± 0%   -6.81%  (p=0.000 n=10+9)
Exp/P100/S2/D-10-10       2.19k ± 0%     2.05k ± 0%   -6.66%  (p=0.001 n=8+9)
Exp/P100/S2/D2-10         2.21k ± 0%     2.06k ± 0%   -6.61%  (p=0.000 n=9+10)
Ln/P100/S-10/D2-10        6.00k ± 0%     5.65k ± 0%   -5.87%  (p=0.000 n=10+9)
Ln/P100/S100/D2-10        6.25k ± 0%     5.89k ± 0%   -5.80%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         5.79k ± 0%     5.46k ± 0%   -5.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      6.04k ± 0%     5.69k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         6.21k ± 0%     5.86k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        6.09k ± 0%     5.74k ± 0%   -5.73%  (p=0.000 n=10+8)
Exp/P100/S-4/D100-10        855 ± 0%       806 ± 0%   -5.73%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       5.62k ± 0%     5.30k ± 0%   -5.72%  (p=0.000 n=10+8)
Ln/P100/S-100/D2-10       6.34k ± 0%     5.98k ± 0%   -5.68%  (p=0.000 n=10+10)
Ln/P100/S10/D100-10       5.73k ± 0%     5.41k ± 0%   -5.67%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       6.08k ± 0%     5.74k ± 0%   -5.66%  (p=0.000 n=9+8)
Ln/P100/S10/D2-10         6.13k ± 0%     5.78k ± 0%   -5.66%  (p=0.000 n=8+10)
Ln/P100/S-10/D100-10      5.69k ± 0%     5.37k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S10/D10-10        5.74k ± 0%     5.41k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S-100/D10-10      5.55k ± 0%     5.23k ± 0%   -5.63%  (p=0.000 n=10+7)
Ln/P100/S2/D2-10          5.86k ± 0%     5.53k ± 0%   -5.62%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10       893 ± 0%       843 ± 0%   -5.60%  (p=0.000 n=10+10)
Ln/P100/S-100/D100-10     5.73k ± 0%     5.41k ± 0%   -5.54%  (p=0.000 n=10+10)
Exp/P100/S-1/D100-10      1.68k ± 0%     1.58k ± 0%   -5.52%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        5.40k ± 0%     5.10k ± 0%   -5.49%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       5.77k ± 0%     5.45k ± 0%   -5.49%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10     1.68k ± 0%     1.59k ± 0%   -5.41%  (p=0.000 n=10+10)
Exp/P100/S2/D100-10       2.61k ± 0%     2.48k ± 0%   -5.03%  (p=0.000 n=8+10)
Exp/P100/S2/D-100-10      2.47k ± 0%     2.35k ± 0%   -4.90%  (p=0.000 n=10+10)
GDA/multiply-10             931 ± 0%       889 ± 0%   -4.51%  (p=0.000 n=10+10)
GDA/exp-10                 911k ± 0%      895k ± 0%   -1.75%  (p=0.000 n=10+10)
GDA/base-10               2.11k ± 0%     2.11k ± 0%     ~     (all equal)
GDA/comparetotal-10         254 ± 0%       254 ± 0%     ~     (all equal)

cc. @mjibson

nvb added 4 commits January 1, 2022 09:45
They cause the benchmarks to run for a very long time.
See golang/go#27217.

Adjust the benchmarks to have an explicit setup phase
and run phase, separated by `b.ResetTimer`.
This reduces the size of the Decimal struct from 48 bytes to 40 bytes.
This commit introduces a performance optimization that embeds small
coefficient values directly in their `Decimal` struct, instead of
storing these values in a separate heap allocation.

Each `Decimal` maintains (through `big.Int`) an internal reference to a
variable-length coefficient, which is represented by a `[]big.Word`. The
`coeffInline` field and the `lazyInit` method combine to allow us to
inline this coefficient within the `Decimal` struct when its value is
sufficiently small. In `lazyInit`, we point the `Coeff` field's backing
slice at the `coeffInline` array. `big.Int` will avoid re-allocating
this array until it is provided with a value that exceeds the initial
capacity. We set this initial capacity to accommodate any coefficient
that would fit in a 64-bit integer (i.e. values up to 2^64).

This is an alternative to an optimization that many other arbitrary
precision decimal libraries have where small coefficient values are
stored as numeric fields in their data type's struct. Only when this
coefficient value gets large enough do these libraries fall back to a
variable-length coefficient with internal indirection. We can see the
optimization in practice in the `ericlagergren/decimal` library, where
each struct contains a `compact uint64` and an `unscaled big.Int`. Prior
concern from the authors of `cockroachdb/apd` regarding this form of
compact representation optimization was that it traded performance for
complexity. The optimization fractured control flow, leaking out across
the library and leading to more complex, error-prone code.

The approach taken in this commit does not have the same issue. All
arithmetic on the decimal's coefficient is still deferred to `bit.Int`.
In fact, the entire optimization is best-effort, and bugs that lead to
missed calls to `lazyInit` are merely missed opportunities to avoid a
heap allocation, and nothing more serious. The internal reference within
Decimal is vulnerable to memory aliasing bugs when a `Decimal` is copied
by value (and not through the `Decimal.Set` method), but this is not a
new concern.

Impact on benchmarks:

```
name                   old time/op    new time/op    delta
GDA/compare-10           61.5µs ±20%    54.5µs ±18%  -11.34%  (p=0.035 n=10+10)
GDA/reduce-10            18.8µs ± 7%    17.5µs ± 6%   -7.16%  (p=0.001 n=10+10)
GDA/remainder-10         67.0µs ±11%    62.2µs ± 6%   -7.09%  (p=0.023 n=10+10)
Exp/P5/S-4/D-2-10        2.56µs ± 4%    2.51µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-4/D2-10         2.65µs ± 4%    2.60µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D-2-10        6.44µs ± 5%    6.31µs ± 4%     ~     (p=0.060 n=10+10)
Exp/P5/S-1/D2-10         6.27µs ± 4%    6.15µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D-2-10         19.3µs ± 4%    19.1µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P5/S2/D2-10          19.1µs ± 4%    18.9µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-10-10      10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.063 n=10+10)
Exp/P10/S-4/D-2-10       5.59µs ± 4%    5.49µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D2-10        4.68µs ± 4%    4.62µs ± 5%     ~     (p=0.138 n=10+10)
Exp/P10/S-4/D10-10       10.5µs ± 4%    10.4µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-10-10      22.2µs ± 4%    22.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D-2-10       12.2µs ± 4%    12.1µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D2-10        12.2µs ± 4%    12.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P10/S-1/D10-10       21.8µs ± 4%    21.7µs ± 4%     ~     (p=0.247 n=10+10)
Exp/P10/S2/D-10-10       44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.165 n=10+10)
Exp/P10/S2/D-2-10        31.8µs ± 4%    31.7µs ± 4%     ~     (p=0.190 n=10+10)
Exp/P10/S2/D2-10         28.7µs ± 4%    28.6µs ± 4%     ~     (p=0.139 n=10+10)
Exp/P10/S2/D10-10        44.0µs ± 4%    44.0µs ± 4%     ~     (p=0.143 n=10+10)
Exp/P100/S-4/D-100-10     561µs ± 4%     565µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S-4/D-10-10      297µs ± 4%     299µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-4/D-2-10       286µs ± 4%     289µs ± 4%     ~     (p=0.315 n=10+10)
Exp/P100/S-4/D2-10        287µs ± 4%     289µs ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S-4/D10-10       292µs ± 4%     294µs ± 4%     ~     (p=0.529 n=10+10)
Exp/P100/S-4/D100-10      550µs ± 4%     554µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-100-10    1.05ms ± 4%    1.05ms ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D-10-10      637µs ± 4%     641µs ± 4%     ~     (p=0.739 n=10+10)
Exp/P100/S-1/D-2-10       636µs ± 4%     641µs ± 4%     ~     (p=0.436 n=10+10)
Exp/P100/S-1/D2-10        612µs ± 4%     616µs ± 4%     ~     (p=0.631 n=10+10)
Exp/P100/S-1/D10-10       621µs ± 4%     625µs ± 4%     ~     (p=0.912 n=10+10)
Exp/P100/S-1/D100-10     1.07ms ± 3%    1.08ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-100-10     1.54ms ± 3%    1.55ms ± 4%     ~     (p=0.684 n=10+10)
Exp/P100/S2/D-10-10      1.06ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D-2-10        918µs ± 4%     924µs ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D2-10        1.01ms ± 4%    1.02ms ± 4%     ~     (p=0.853 n=10+10)
Exp/P100/S2/D10-10       1.05ms ± 4%    1.06ms ± 4%     ~     (p=0.971 n=10+10)
Exp/P100/S2/D100-10      1.58ms ± 4%    1.59ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P2/S-100/D2-10        25.1µs ± 4%    25.0µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-10/D2-10         22.6µs ± 4%    22.6µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S-2/D2-10          24.9µs ± 4%    24.8µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S2/D2-10           28.1µs ± 4%    27.9µs ± 4%     ~     (p=0.139 n=10+10)
Ln/P2/S10/D2-10          27.0µs ± 4%    26.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P2/S100/D2-10         28.7µs ± 4%    28.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-100/D2-10       78.1µs ± 4%    78.1µs ± 4%     ~     (p=0.218 n=10+10)
Ln/P10/S-100/D10-10      74.3µs ± 4%    74.3µs ± 4%     ~     (p=0.190 n=10+10)
Ln/P10/S-10/D2-10        83.4µs ± 4%    83.4µs ± 4%     ~     (p=0.315 n=10+10)
Ln/P10/S-10/D10-10       80.9µs ± 4%    80.9µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S-2/D2-10         82.1µs ± 4%    82.2µs ± 5%     ~     (p=0.280 n=10+10)
Ln/P10/S-2/D10-10        90.6µs ± 4%    90.6µs ± 5%     ~     (p=0.190 n=10+10)
Ln/P10/S2/D2-10          80.6µs ± 4%    80.6µs ± 4%     ~     (p=0.184 n=10+10)
Ln/P10/S2/D10-10         79.7µs ± 4%    79.5µs ± 4%     ~     (p=0.143 n=10+10)
Ln/P10/S10/D2-10         85.2µs ± 4%    85.2µs ± 5%     ~     (p=0.218 n=10+10)
Ln/P10/S10/D10-10        74.8µs ± 4%    74.7µs ± 4%     ~     (p=0.247 n=10+10)
Ln/P10/S100/D2-10        83.8µs ± 4%    83.6µs ± 4%     ~     (p=0.165 n=10+10)
Ln/P10/S100/D10-10       66.1µs ± 4%    65.9µs ± 5%     ~     (p=0.143 n=10+10)
Ln/P100/S-100/D2-10      3.87ms ± 4%    3.90ms ± 5%     ~     (p=0.853 n=10+10)
Ln/P100/S-100/D10-10     3.34ms ± 4%    3.35ms ± 5%     ~     (p=0.529 n=10+10)
Ln/P100/S-100/D100-10    3.46ms ± 4%    3.48ms ± 5%     ~     (p=0.971 n=10+10)
Ln/P100/S-10/D2-10       3.63ms ± 4%    3.65ms ± 4%     ~     (p=0.739 n=10+10)
Ln/P100/S-10/D10-10      3.43ms ± 4%    3.44ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S-10/D100-10     3.42ms ± 4%    3.45ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D2-10        3.51ms ± 4%    3.53ms ± 4%     ~     (p=0.853 n=10+10)
Ln/P100/S-2/D10-10       3.21ms ± 4%    3.24ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S-2/D100-10      3.47ms ± 4%    3.50ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S2/D2-10         3.56ms ± 4%    3.59ms ± 4%     ~     (p=0.796 n=10+10)
Ln/P100/S2/D10-10        3.82ms ± 4%    3.84ms ± 4%     ~     (p=1.000 n=10+10)
Ln/P100/S2/D100-10       3.72ms ± 4%    3.75ms ± 5%     ~     (p=0.353 n=10+10)
Ln/P100/S10/D2-10        3.78ms ± 4%    3.81ms ± 4%     ~     (p=0.631 n=10+10)
Ln/P100/S10/D10-10       3.50ms ± 4%    3.52ms ± 4%     ~     (p=0.529 n=10+10)
Ln/P100/S10/D100-10      3.49ms ± 4%    3.52ms ± 4%     ~     (p=0.684 n=10+10)
Ln/P100/S100/D2-10       3.82ms ± 4%    3.85ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D10-10      3.72ms ± 4%    3.75ms ± 4%     ~     (p=0.393 n=10+10)
Ln/P100/S100/D100-10     3.65ms ± 4%    3.68ms ± 4%     ~     (p=0.315 n=10+10)
GDA/abs-10               10.4µs ± 6%    10.3µs ± 6%     ~     (p=0.564 n=10+10)
GDA/base-10               110µs ± 4%     111µs ± 5%     ~     (p=0.869 n=10+10)
GDA/comparetotal-10      30.2µs ± 5%    30.3µs ± 5%     ~     (p=0.631 n=10+10)
GDA/divide-10             391µs ± 7%     390µs ± 3%     ~     (p=0.853 n=10+10)
GDA/exp-10                127ms ± 4%     127ms ± 4%     ~     (p=0.796 n=10+10)
GDA/ln-10                83.6ms ± 3%    83.6ms ± 4%     ~     (p=0.218 n=10+10)
GDA/log10-10              106ms ± 4%     105ms ± 4%     ~     (p=0.218 n=10+10)
GDA/minus-10             11.7µs ± 4%    11.7µs ± 7%     ~     (p=0.579 n=10+10)
GDA/multiply-10          77.3µs ± 9%    80.8µs ± 9%     ~     (p=0.089 n=10+10)
GDA/plus-10              42.9µs ± 5%    44.2µs ± 4%     ~     (p=0.052 n=10+10)
GDA/power-10              218ms ± 3%     218ms ± 3%     ~     (p=0.631 n=10+10)
GDA/powersqrt-10          450ms ± 2%     448ms ± 2%     ~     (p=0.280 n=10+10)
GDA/quantize-10           143µs ±12%     128µs ± 8%     ~     (p=0.052 n=10+10)
GDA/randoms-10           3.05ms ± 5%    3.02ms ± 5%     ~     (p=0.143 n=10+10)
GDA/rounding-10           659µs ± 4%     658µs ± 4%     ~     (p=0.579 n=10+10)
GDA/squareroot-10        33.1ms ± 3%    32.7ms ± 3%     ~     (p=0.143 n=10+10)
GDA/subtract-10           172µs ±11%     170µs ± 8%     ~     (p=0.912 n=10+10)
GDA/tointegral-10        32.9µs ± 6%    32.2µs ± 5%     ~     (p=0.190 n=10+10)
GDA/tointegralx-10       33.6µs ± 6%    32.7µs ± 4%     ~     (p=0.190 n=10+10)
GDA/cuberoot-apd-10      2.08ms ± 3%    2.08ms ± 4%     ~     (p=0.165 n=10+10)
GDA/add-10                904µs ± 5%     919µs ± 5%   +1.65%  (p=0.043 n=10+10)
GDA/divideint-10         28.7µs ± 4%    29.7µs ± 1%   +3.51%  (p=0.026 n=9+6)

name                   old alloc/op   new alloc/op   delta
GDA/compare-10           10.1kB ± 0%     7.0kB ± 0%  -30.72%  (p=0.000 n=10+10)
GDA/quantize-10          42.3kB ± 0%    36.5kB ± 0%  -13.63%  (p=0.000 n=9+10)
GDA/divideint-10         6.51kB ± 0%    5.70kB ± 0%  -12.41%  (p=0.000 n=10+10)
GDA/remainder-10         22.4kB ± 0%    20.0kB ± 0%  -10.60%  (p=0.000 n=10+10)
GDA/tointegralx-10       19.2kB ± 0%    17.4kB ± 0%   -9.28%  (p=0.000 n=10+10)
GDA/tointegral-10        19.1kB ± 0%    17.4kB ± 0%   -8.83%  (p=0.000 n=10+10)
GDA/reduce-10            2.22kB ± 0%    2.02kB ± 0%   -8.66%  (p=0.000 n=10+10)
GDA/divide-10            74.3kB ± 0%    70.1kB ± 0%   -5.62%  (p=0.000 n=10+10)
GDA/squareroot-10        8.30MB ± 0%    7.96MB ± 0%   -4.08%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10        1.23kB ± 0%    1.18kB ± 0%   -3.92%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10       4.50kB ± 0%    4.33kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10        2.83kB ± 0%    2.72kB ± 0%   -3.75%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10         1.29kB ± 0%    1.24kB ± 0%   -3.72%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10        2.02kB ± 0%    1.94kB ± 0%   -3.71%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10        9.59kB ± 0%    9.24kB ± 0%   -3.66%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10         7.53kB ± 0%    7.26kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10          7.76kB ± 0%    7.48kB ± 0%   -3.65%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10         9.75kB ± 0%    9.40kB ± 0%   -3.60%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10       2.32kB ± 0%    2.24kB ± 0%   -3.58%  (p=0.000 n=10+10)
GDA/rounding-10           246kB ± 0%     237kB ± 0%   -3.56%  (p=0.000 n=10+10)
GDA/randoms-10           1.16MB ± 0%    1.12MB ± 0%   -3.55%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10       9.79kB ± 0%    9.44kB ± 0%   -3.54%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10        10.7kB ± 0%    10.3kB ± 0%   -3.53%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10         2.87kB ± 0%    2.77kB ± 0%   -3.52%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10      4.95kB ± 0%    4.78kB ± 0%   -3.47%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10        4.64kB ± 0%    4.49kB ± 0%   -3.40%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       261kB ± 0%     252kB ± 0%   -3.36%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10       5.17kB ± 0%    5.00kB ± 0%   -3.33%  (p=0.000 n=10+10)
GDA/powersqrt-10         77.6MB ± 0%    75.1MB ± 0%   -3.29%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10         17.1kB ± 0%    16.5kB ± 0%   -3.26%  (p=0.000 n=9+10)
Ln/P10/S-2/D10-10        18.6kB ± 0%    18.0kB ± 0%   -3.24%  (p=0.000 n=9+8)
Exp/P10/S-4/D-10-10      2.56kB ± 0%    2.48kB ± 0%   -3.24%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10          17.0kB ± 0%    16.5kB ± 0%   -3.23%  (p=0.000 n=7+10)
Ln/P10/S10/D10-10        15.8kB ± 0%    15.3kB ± 0%   -3.23%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10         18.0kB ± 0%    17.4kB ± 0%   -3.22%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10        17.0kB ± 0%    16.4kB ± 0%   -3.22%  (p=0.000 n=8+10)
Ln/P10/S2/D10-10         16.5kB ± 0%    16.0kB ± 0%   -3.21%  (p=0.000 n=10+10)
Ln/P10/S-10/D10-10       17.1kB ± 0%    16.5kB ± 0%   -3.18%  (p=0.000 n=10+7)
Exp/P10/S-4/D10-10       2.61kB ± 0%    2.52kB ± 0%   -3.18%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10          10.3kB ± 0%     9.9kB ± 0%   -3.15%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10       17.1kB ± 0%    16.6kB ± 0%   -3.11%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10        17.8kB ± 0%    17.2kB ± 0%   -3.11%  (p=0.000 n=8+10)
Ln/P2/S2/D2-10           10.3kB ± 0%     9.9kB ± 0%   -3.10%  (p=0.000 n=10+9)
Ln/P2/S-2/D2-10          9.62kB ± 0%    9.32kB ± 0%   -3.09%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10       15.0kB ± 0%    14.5kB ± 0%   -3.09%  (p=0.000 n=9+10)
Ln/P10/S-100/D10-10      16.3kB ± 0%    15.8kB ± 0%   -3.08%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10         8.91kB ± 0%    8.64kB ± 0%   -3.07%  (p=0.000 n=10+9)
GDA/subtract-10           100kB ± 0%      97kB ± 0%   -2.89%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10         10.9kB ± 0%    10.6kB ± 0%   -2.86%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10        9.81kB ± 0%    9.53kB ± 0%   -2.83%  (p=0.000 n=10+10)
GDA/ln-10                10.2MB ± 0%    10.0MB ± 0%   -2.09%  (p=0.000 n=10+10)
GDA/log10-10             12.5MB ± 0%    12.3MB ± 0%   -1.87%  (p=0.000 n=10+10)
GDA/power-10             27.0MB ± 0%    26.5MB ± 0%   -1.78%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10      31.9kB ± 0%    31.4kB ± 0%   -1.46%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10     33.5kB ± 0%    33.0kB ± 0%   -1.41%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10       31.8kB ± 0%    31.4kB ± 0%   -1.40%  (p=0.000 n=10+9)
Exp/P100/S-4/D-2-10      34.6kB ± 0%    34.1kB ± 0%   -1.36%  (p=0.000 n=10+9)
Exp/P100/S-1/D10-10      65.0kB ± 0%    64.2kB ± 0%   -1.27%  (p=0.000 n=8+10)
Exp/P100/S-1/D2-10       64.9kB ± 0%    64.1kB ± 0%   -1.24%  (p=0.000 n=10+10)
Exp/P100/S-1/D-10-10     70.1kB ± 0%    69.3kB ± 0%   -1.19%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10      72.6kB ± 0%    71.8kB ± 0%   -1.14%  (p=0.000 n=9+9)
Exp/P100/S2/D10-10        115kB ± 0%     114kB ± 0%   -1.05%  (p=0.000 n=10+10)
Exp/P100/S2/D2-10         113kB ± 0%     112kB ± 0%   -1.01%  (p=0.000 n=10+10)
Exp/P100/S2/D-2-10        107kB ± 0%     106kB ± 0%   -1.00%  (p=0.000 n=10+10)
Exp/P100/S2/D-10-10       117kB ± 0%     116kB ± 0%   -0.96%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       318kB ± 0%     315kB ± 0%   -0.83%  (p=0.000 n=10+8)
Ln/P100/S-100/D100-10     328kB ± 0%     326kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S-10/D2-10        336kB ± 0%     334kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         326kB ± 0%     324kB ± 0%   -0.79%  (p=0.000 n=10+10)
Ln/P100/S100/D2-10        353kB ± 0%     350kB ± 0%   -0.79%  (p=0.000 n=10+9)
Ln/P100/S10/D10-10        326kB ± 0%     324kB ± 0%   -0.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      340kB ± 0%     338kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         351kB ± 0%     348kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S-10/D100-10      323kB ± 0%     320kB ± 0%   -0.77%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        344kB ± 0%     341kB ± 0%   -0.77%  (p=0.000 n=9+6)
Ln/P100/S-100/D10-10      311kB ± 0%     309kB ± 0%   -0.76%  (p=0.000 n=10+7)
Ln/P100/S10/D100-10       323kB ± 0%     321kB ± 0%   -0.76%  (p=0.000 n=8+9)
Ln/P100/S-100/D2-10       362kB ± 0%     360kB ± 0%   -0.75%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       345kB ± 0%     342kB ± 0%   -0.74%  (p=0.000 n=9+8)
Exp/P100/S-1/D100-10     94.0kB ± 0%    93.3kB ± 0%   -0.74%  (p=0.000 n=8+10)
Exp/P100/S-4/D100-10     47.9kB ± 0%    47.5kB ± 0%   -0.74%  (p=0.000 n=6+9)
Ln/P100/S2/D2-10          333kB ± 0%     330kB ± 0%   -0.74%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10    97.5kB ± 0%    96.8kB ± 0%   -0.70%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10    51.5kB ± 0%    51.2kB ± 0%   -0.70%  (p=0.000 n=10+10)
Ln/P100/S10/D2-10         346kB ± 0%     343kB ± 0%   -0.69%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        306kB ± 0%     304kB ± 0%   -0.68%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       329kB ± 0%     326kB ± 0%   -0.67%  (p=0.000 n=10+8)
Exp/P100/S2/D100-10       150kB ± 0%     149kB ± 0%   -0.67%  (p=0.000 n=10+10)
Exp/P100/S2/D-100-10      148kB ± 0%     147kB ± 0%   -0.62%  (p=0.000 n=9+10)
GDA/exp-10               61.3MB ± 0%    61.1MB ± 0%   -0.18%  (p=0.000 n=10+10)
GDA/base-10              24.4kB ± 0%    24.4kB ± 0%     ~     (all equal)
GDA/comparetotal-10      7.11kB ± 0%    7.11kB ± 0%     ~     (all equal)
GDA/add-10                712kB ± 0%     720kB ± 0%   +1.18%  (p=0.000 n=10+8)
GDA/plus-10              46.6kB ± 0%    47.5kB ± 0%   +2.04%  (p=0.000 n=10+9)
GDA/multiply-10          35.5kB ± 0%    39.2kB ± 0%  +10.32%  (p=0.000 n=10+10)
GDA/minus-10             2.43kB ± 0%    2.82kB ± 0%  +16.12%  (p=0.000 n=10+10)
GDA/abs-10               2.33kB ± 0%    2.82kB ± 0%  +21.31%  (p=0.000 n=10+10)

name                   old allocs/op  new allocs/op  delta
GDA/reduce-10               187 ± 0%        43 ± 0%  -77.01%  (p=0.000 n=10+10)
GDA/compare-10              638 ± 0%       250 ± 0%  -60.82%  (p=0.000 n=10+10)
GDA/minus-10                164 ± 0%        73 ± 0%  -55.49%  (p=0.000 n=10+10)
GDA/abs-10                  151 ± 0%        73 ± 0%  -51.66%  (p=0.000 n=10+10)
GDA/quantize-10           1.70k ± 0%     0.98k ± 0%  -42.30%  (p=0.000 n=10+10)
GDA/tointegralx-10          677 ± 0%       454 ± 0%  -32.94%  (p=0.000 n=10+10)
GDA/remainder-10            923 ± 0%       626 ± 0%  -32.18%  (p=0.000 n=10+10)
GDA/tointegral-10           665 ± 0%       454 ± 0%  -31.73%  (p=0.000 n=10+10)
GDA/divideint-10            357 ± 0%       256 ± 0%  -28.29%  (p=0.000 n=10+10)
GDA/subtract-10           2.85k ± 0%     2.14k ± 0%  -24.89%  (p=0.000 n=10+10)
GDA/divide-10             2.51k ± 0%     1.91k ± 0%  -24.01%  (p=0.000 n=10+10)
Exp/P5/S-4/D-2-10          47.0 ± 0%      37.0 ± 0%  -21.28%  (p=0.000 n=10+10)
Exp/P5/S-4/D2-10           50.0 ± 0%      40.0 ± 0%  -20.00%  (p=0.000 n=10+10)
Exp/P10/S-4/D2-10          76.0 ± 0%      62.0 ± 0%  -18.42%  (p=0.000 n=10+10)
GDA/rounding-10           9.20k ± 0%     7.59k ± 0%  -17.52%  (p=0.000 n=10+10)
Exp/P10/S-4/D-2-10         87.0 ± 0%      72.0 ± 0%  -17.24%  (p=0.000 n=10+10)
Exp/P10/S-4/D10-10         95.0 ± 0%      79.0 ± 0%  -16.84%  (p=0.000 n=10+10)
GDA/randoms-10            40.4k ± 0%     33.6k ± 0%  -16.82%  (p=0.000 n=10+10)
Exp/P5/S-1/D-2-10           104 ± 0%        87 ± 0%  -16.35%  (p=0.000 n=10+10)
Exp/P5/S-1/D2-10            105 ± 0%        88 ± 0%  -16.19%  (p=0.000 n=10+10)
Exp/P10/S-4/D-10-10        93.0 ± 0%      78.0 ± 0%  -16.13%  (p=0.000 n=10+10)
GDA/squareroot-10          275k ± 0%      230k ± 0%  -16.08%  (p=0.000 n=10+10)
Exp/P10/S-1/D-10-10         174 ± 0%       147 ± 0%  -15.52%  (p=0.000 n=10+10)
GDA/plus-10                 587 ± 0%       496 ± 0%  -15.50%  (p=0.000 n=10+10)
Exp/P10/S-1/D-2-10          163 ± 0%       138 ± 0%  -15.34%  (p=0.000 n=10+10)
Exp/P10/S-1/D2-10           166 ± 0%       141 ± 0%  -15.06%  (p=0.000 n=10+10)
Exp/P10/S-1/D10-10          182 ± 0%       155 ± 0%  -14.84%  (p=0.000 n=10+10)
Exp/P10/S2/D-10-10          335 ± 0%       286 ± 0%  -14.63%  (p=0.000 n=10+10)
Ln/P2/S-2/D2-10             342 ± 0%       292 ± 0%  -14.62%  (p=0.000 n=10+10)
Ln/P2/S2/D2-10              363 ± 0%       310 ± 0%  -14.60%  (p=0.000 n=10+10)
Ln/P2/S10/D2-10             366 ± 0%       313 ± 0%  -14.48%  (p=0.000 n=10+10)
Exp/P10/S2/D-2-10           335 ± 0%       287 ± 0%  -14.33%  (p=0.000 n=10+10)
GDA/cuberoot-apd-10       7.67k ± 0%     6.58k ± 0%  -14.27%  (p=0.000 n=10+10)
Ln/P10/S-2/D2-10            582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S2/D2-10             582 ± 0%       499 ± 0%  -14.26%  (p=0.000 n=10+10)
Ln/P10/S10/D10-10           541 ± 0%       464 ± 0%  -14.23%  (p=0.000 n=10+10)
Ln/P10/S2/D10-10            563 ± 0%       483 ± 0%  -14.21%  (p=0.000 n=10+10)
Ln/P2/S-10/D2-10            317 ± 0%       272 ± 0%  -14.20%  (p=0.000 n=10+10)
GDA/powersqrt-10          2.63M ± 0%     2.26M ± 0%  -14.17%  (p=0.000 n=10+10)
Ln/P10/S-2/D10-10           637 ± 0%       547 ± 0%  -14.13%  (p=0.000 n=10+10)
Ln/P10/S-10/D2-10           581 ± 0%       499 ± 0%  -14.11%  (p=0.000 n=10+10)
Exp/P10/S2/D10-10           369 ± 0%       317 ± 0%  -14.09%  (p=0.000 n=10+10)
Exp/P10/S2/D2-10            342 ± 0%       294 ± 0%  -14.04%  (p=0.000 n=10+10)
Ln/P10/S10/D2-10            613 ± 0%       527 ± 0%  -14.03%  (p=0.000 n=9+10)
Ln/P10/S-10/D10-10          581 ± 0%       500 ± 0%  -13.99%  (p=0.000 n=10+10)
Exp/P5/S2/D-2-10            272 ± 0%       234 ± 0%  -13.97%  (p=0.000 n=10+10)
Ln/P10/S-100/D2-10          575 ± 0%       495 ± 0%  -13.91%  (p=0.000 n=10+10)
Ln/P10/S100/D2-10           597 ± 0%       514 ± 0%  -13.90%  (p=0.000 n=10+10)
Ln/P10/S100/D10-10          504 ± 0%       434 ± 0%  -13.89%  (p=0.000 n=10+10)
Exp/P5/S2/D2-10             281 ± 0%       242 ± 0%  -13.88%  (p=0.000 n=10+10)
Ln/P10/S-100/D10-10         549 ± 0%       473 ± 0%  -13.84%  (p=0.000 n=10+10)
Ln/P2/S-100/D2-10           336 ± 0%       290 ± 0%  -13.69%  (p=0.000 n=10+10)
Ln/P2/S100/D2-10            376 ± 0%       325 ± 0%  -13.56%  (p=0.000 n=10+10)
GDA/ln-10                  269k ± 0%      239k ± 0%  -10.88%  (p=0.000 n=10+10)
GDA/log10-10               326k ± 0%      292k ± 0%  -10.44%  (p=0.000 n=10+10)
GDA/power-10               699k ± 0%      627k ± 0%  -10.30%  (p=0.000 n=10+10)
GDA/add-10                13.2k ± 0%     11.9k ± 0%   -9.68%  (p=0.000 n=10+10)
Exp/P100/S-4/D10-10         701 ± 0%       638 ± 0%   -8.99%  (p=0.000 n=10+10)
Exp/P100/S-4/D-10-10        715 ± 0%       651 ± 0%   -8.95%  (p=0.000 n=10+10)
Exp/P100/S-4/D2-10          686 ± 0%       626 ± 0%   -8.75%  (p=0.000 n=10+10)
Exp/P100/S-4/D-2-10         723 ± 0%       661 ± 0%   -8.58%  (p=0.000 n=10+10)
Exp/P100/S-1/D10-10       1.36k ± 0%     1.26k ± 0%   -7.99%  (p=0.000 n=10+10)
Exp/P100/S-1/D2-10        1.35k ± 0%     1.24k ± 0%   -7.72%  (p=0.000 n=9+10)
Exp/P100/S-1/D-10-10      1.40k ± 0%     1.29k ± 0%   -7.70%  (p=0.000 n=10+10)
Exp/P100/S-1/D-2-10       1.43k ± 0%     1.32k ± 0%   -7.48%  (p=0.000 n=9+10)
Exp/P100/S2/D-2-10        2.03k ± 0%     1.89k ± 0%   -6.82%  (p=0.000 n=9+10)
Exp/P100/S2/D10-10        2.28k ± 0%     2.13k ± 0%   -6.81%  (p=0.000 n=10+9)
Exp/P100/S2/D-10-10       2.19k ± 0%     2.05k ± 0%   -6.66%  (p=0.001 n=8+9)
Exp/P100/S2/D2-10         2.21k ± 0%     2.06k ± 0%   -6.61%  (p=0.000 n=9+10)
Ln/P100/S-10/D2-10        6.00k ± 0%     5.65k ± 0%   -5.87%  (p=0.000 n=10+9)
Ln/P100/S100/D2-10        6.25k ± 0%     5.89k ± 0%   -5.80%  (p=0.000 n=10+9)
Ln/P100/S-2/D2-10         5.79k ± 0%     5.46k ± 0%   -5.78%  (p=0.000 n=10+10)
Ln/P100/S100/D100-10      6.04k ± 0%     5.69k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D10-10         6.21k ± 0%     5.86k ± 0%   -5.76%  (p=0.000 n=10+10)
Ln/P100/S2/D100-10        6.09k ± 0%     5.74k ± 0%   -5.73%  (p=0.000 n=10+8)
Exp/P100/S-4/D100-10        855 ± 0%       806 ± 0%   -5.73%  (p=0.000 n=10+10)
Ln/P100/S-10/D10-10       5.62k ± 0%     5.30k ± 0%   -5.72%  (p=0.000 n=10+8)
Ln/P100/S-100/D2-10       6.34k ± 0%     5.98k ± 0%   -5.68%  (p=0.000 n=10+10)
Ln/P100/S10/D100-10       5.73k ± 0%     5.41k ± 0%   -5.67%  (p=0.000 n=10+10)
Ln/P100/S100/D10-10       6.08k ± 0%     5.74k ± 0%   -5.66%  (p=0.000 n=9+8)
Ln/P100/S10/D2-10         6.13k ± 0%     5.78k ± 0%   -5.66%  (p=0.000 n=8+10)
Ln/P100/S-10/D100-10      5.69k ± 0%     5.37k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S10/D10-10        5.74k ± 0%     5.41k ± 0%   -5.64%  (p=0.000 n=10+10)
Ln/P100/S-100/D10-10      5.55k ± 0%     5.23k ± 0%   -5.63%  (p=0.000 n=10+7)
Ln/P100/S2/D2-10          5.86k ± 0%     5.53k ± 0%   -5.62%  (p=0.000 n=10+10)
Exp/P100/S-4/D-100-10       893 ± 0%       843 ± 0%   -5.60%  (p=0.000 n=10+10)
Ln/P100/S-100/D100-10     5.73k ± 0%     5.41k ± 0%   -5.54%  (p=0.000 n=10+10)
Exp/P100/S-1/D100-10      1.68k ± 0%     1.58k ± 0%   -5.52%  (p=0.000 n=8+10)
Ln/P100/S-2/D10-10        5.40k ± 0%     5.10k ± 0%   -5.49%  (p=0.000 n=10+10)
Ln/P100/S-2/D100-10       5.77k ± 0%     5.45k ± 0%   -5.49%  (p=0.000 n=10+10)
Exp/P100/S-1/D-100-10     1.68k ± 0%     1.59k ± 0%   -5.41%  (p=0.000 n=10+10)
Exp/P100/S2/D100-10       2.61k ± 0%     2.48k ± 0%   -5.03%  (p=0.000 n=8+10)
Exp/P100/S2/D-100-10      2.47k ± 0%     2.35k ± 0%   -4.90%  (p=0.000 n=10+10)
GDA/multiply-10             931 ± 0%       889 ± 0%   -4.51%  (p=0.000 n=10+10)
GDA/exp-10                 911k ± 0%      895k ± 0%   -1.75%  (p=0.000 n=10+10)
GDA/base-10               2.11k ± 0%     2.11k ± 0%     ~     (all equal)
GDA/comparetotal-10         254 ± 0%       254 ± 0%     ~     (all equal)
```
@nvb nvb requested review from jordanlewis and yuzefovich January 1, 2022 15:20
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Jan 1, 2022

I think we can simplify this even further by creating a wrapper around big.Int and pushing the inline value optimization into that wrapper. Something along the lines of:

const inlineWords = 1 // or maybe 2

type BigInt struct {
	inner  big.Int
	inline [inlineWords]big.Word
}

func (b *BigInt) lazyInit() {
	if b.inner.Bits() == nil {
		b.inline = [inlineWords]big.Word{}
		b.inner.SetBits(b.inline[:0])
	}
}

func (b *BigInt) Add(x, y *BigInt) *BigInt {
	b.lazyInit()
	b.inner.Add(&x.inner, &y.inner)
	return b
}

...

type Decimal struct {
	Form     Form
	Negative bool
	Exponent int32
	Coeff    BigInt
}

We could then re-implement the big.Int API on this wrapper and add calls to (*BigInt).lazyInit where appropriate. That would probably be more lines of code, but it would mostly be boilerplate and would keep the optimization confined to a small layer below Decimal and above big.Int.

This BigInt wrapper may then be useful in other contexts. For instance, it would provide the inline optimization to intermediate results in this library, which are currently allocation heavy (see benchmark results above). It may also be useful in CRDB code.

@nvb nvb changed the title apd: embed small coefficient values in Decimal struct [DNM] apd: embed small coefficient values in Decimal struct Jan 3, 2022
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Jan 3, 2022

I'm closing this in favor of #102. It turns out that the self-referential pointer in this commit was leading to all stack-allocated Decimals escaping to the heap. In #102, we address this while also speeding up most intermediate computation in the library.

@nvb nvb closed this Jan 3, 2022
@nvb nvb deleted the nvanbenschoten/inlineCoeff branch January 7, 2022 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant