[DNM] apd: embed small coefficient values in Decimal struct#101
Closed
nvb wants to merge 4 commits intocockroachdb:masterfrom
Closed
[DNM] apd: embed small coefficient values in Decimal struct#101nvb wants to merge 4 commits intocockroachdb:masterfrom
nvb wants to merge 4 commits intocockroachdb:masterfrom
Conversation
They cause the benchmarks to run for a very long time. See golang/go#27217. Adjust the benchmarks to have an explicit setup phase and run phase, separated by `b.ResetTimer`.
Cleans up a bit of code.
This reduces the size of the Decimal struct from 48 bytes to 40 bytes.
This commit introduces a performance optimization that embeds small coefficient values directly in their `Decimal` struct, instead of storing these values in a separate heap allocation. Each `Decimal` maintains (through `big.Int`) an internal reference to a variable-length coefficient, which is represented by a `[]big.Word`. The `coeffInline` field and the `lazyInit` method combine to allow us to inline this coefficient within the `Decimal` struct when its value is sufficiently small. In `lazyInit`, we point the `Coeff` field's backing slice at the `coeffInline` array. `big.Int` will avoid re-allocating this array until it is provided with a value that exceeds the initial capacity. We set this initial capacity to accommodate any coefficient that would fit in a 64-bit integer (i.e. values up to 2^64). This is an alternative to an optimization that many other arbitrary precision decimal libraries have where small coefficient values are stored as numeric fields in their data type's struct. Only when this coefficient value gets large enough do these libraries fall back to a variable-length coefficient with internal indirection. We can see the optimization in practice in the `ericlagergren/decimal` library, where each struct contains a `compact uint64` and an `unscaled big.Int`. Prior concern from the authors of `cockroachdb/apd` regarding this form of compact representation optimization was that it traded performance for complexity. The optimization fractured control flow, leaking out across the library and leading to more complex, error-prone code. The approach taken in this commit does not have the same issue. All arithmetic on the decimal's coefficient is still deferred to `bit.Int`. In fact, the entire optimization is best-effort, and bugs that lead to missed calls to `lazyInit` are merely missed opportunities to avoid a heap allocation, and nothing more serious. The internal reference within Decimal is vulnerable to memory aliasing bugs when a `Decimal` is copied by value (and not through the `Decimal.Set` method), but this is not a new concern. Impact on benchmarks: ``` name old time/op new time/op delta GDA/compare-10 61.5µs ±20% 54.5µs ±18% -11.34% (p=0.035 n=10+10) GDA/reduce-10 18.8µs ± 7% 17.5µs ± 6% -7.16% (p=0.001 n=10+10) GDA/remainder-10 67.0µs ±11% 62.2µs ± 6% -7.09% (p=0.023 n=10+10) Exp/P5/S-4/D-2-10 2.56µs ± 4% 2.51µs ± 4% ~ (p=0.060 n=10+10) Exp/P5/S-4/D2-10 2.65µs ± 4% 2.60µs ± 4% ~ (p=0.060 n=10+10) Exp/P5/S-1/D-2-10 6.44µs ± 5% 6.31µs ± 4% ~ (p=0.060 n=10+10) Exp/P5/S-1/D2-10 6.27µs ± 4% 6.15µs ± 4% ~ (p=0.063 n=10+10) Exp/P5/S2/D-2-10 19.3µs ± 4% 19.1µs ± 4% ~ (p=0.063 n=10+10) Exp/P5/S2/D2-10 19.1µs ± 4% 18.9µs ± 4% ~ (p=0.063 n=10+10) Exp/P10/S-4/D-10-10 10.5µs ± 4% 10.4µs ± 4% ~ (p=0.063 n=10+10) Exp/P10/S-4/D-2-10 5.59µs ± 4% 5.49µs ± 5% ~ (p=0.138 n=10+10) Exp/P10/S-4/D2-10 4.68µs ± 4% 4.62µs ± 5% ~ (p=0.138 n=10+10) Exp/P10/S-4/D10-10 10.5µs ± 4% 10.4µs ± 4% ~ (p=0.143 n=10+10) Exp/P10/S-1/D-10-10 22.2µs ± 4% 22.1µs ± 4% ~ (p=0.143 n=10+10) Exp/P10/S-1/D-2-10 12.2µs ± 4% 12.1µs ± 4% ~ (p=0.143 n=10+10) Exp/P10/S-1/D2-10 12.2µs ± 4% 12.0µs ± 4% ~ (p=0.143 n=10+10) Exp/P10/S-1/D10-10 21.8µs ± 4% 21.7µs ± 4% ~ (p=0.247 n=10+10) Exp/P10/S2/D-10-10 44.0µs ± 4% 44.0µs ± 4% ~ (p=0.165 n=10+10) Exp/P10/S2/D-2-10 31.8µs ± 4% 31.7µs ± 4% ~ (p=0.190 n=10+10) Exp/P10/S2/D2-10 28.7µs ± 4% 28.6µs ± 4% ~ (p=0.139 n=10+10) Exp/P10/S2/D10-10 44.0µs ± 4% 44.0µs ± 4% ~ (p=0.143 n=10+10) Exp/P100/S-4/D-100-10 561µs ± 4% 565µs ± 4% ~ (p=0.971 n=10+10) Exp/P100/S-4/D-10-10 297µs ± 4% 299µs ± 4% ~ (p=0.912 n=10+10) Exp/P100/S-4/D-2-10 286µs ± 4% 289µs ± 4% ~ (p=0.315 n=10+10) Exp/P100/S-4/D2-10 287µs ± 4% 289µs ± 4% ~ (p=0.684 n=10+10) Exp/P100/S-4/D10-10 292µs ± 4% 294µs ± 4% ~ (p=0.529 n=10+10) Exp/P100/S-4/D100-10 550µs ± 4% 554µs ± 4% ~ (p=0.739 n=10+10) Exp/P100/S-1/D-100-10 1.05ms ± 4% 1.05ms ± 4% ~ (p=0.631 n=10+10) Exp/P100/S-1/D-10-10 637µs ± 4% 641µs ± 4% ~ (p=0.739 n=10+10) Exp/P100/S-1/D-2-10 636µs ± 4% 641µs ± 4% ~ (p=0.436 n=10+10) Exp/P100/S-1/D2-10 612µs ± 4% 616µs ± 4% ~ (p=0.631 n=10+10) Exp/P100/S-1/D10-10 621µs ± 4% 625µs ± 4% ~ (p=0.912 n=10+10) Exp/P100/S-1/D100-10 1.07ms ± 3% 1.08ms ± 4% ~ (p=0.971 n=10+10) Exp/P100/S2/D-100-10 1.54ms ± 3% 1.55ms ± 4% ~ (p=0.684 n=10+10) Exp/P100/S2/D-10-10 1.06ms ± 4% 1.06ms ± 4% ~ (p=0.971 n=10+10) Exp/P100/S2/D-2-10 918µs ± 4% 924µs ± 4% ~ (p=0.971 n=10+10) Exp/P100/S2/D2-10 1.01ms ± 4% 1.02ms ± 4% ~ (p=0.853 n=10+10) Exp/P100/S2/D10-10 1.05ms ± 4% 1.06ms ± 4% ~ (p=0.971 n=10+10) Exp/P100/S2/D100-10 1.58ms ± 4% 1.59ms ± 4% ~ (p=0.853 n=10+10) Ln/P2/S-100/D2-10 25.1µs ± 4% 25.0µs ± 4% ~ (p=0.143 n=10+10) Ln/P2/S-10/D2-10 22.6µs ± 4% 22.6µs ± 4% ~ (p=0.143 n=10+10) Ln/P2/S-2/D2-10 24.9µs ± 4% 24.8µs ± 4% ~ (p=0.143 n=10+10) Ln/P2/S2/D2-10 28.1µs ± 4% 27.9µs ± 4% ~ (p=0.139 n=10+10) Ln/P2/S10/D2-10 27.0µs ± 4% 26.9µs ± 4% ~ (p=0.143 n=10+10) Ln/P2/S100/D2-10 28.7µs ± 4% 28.5µs ± 4% ~ (p=0.143 n=10+10) Ln/P10/S-100/D2-10 78.1µs ± 4% 78.1µs ± 4% ~ (p=0.218 n=10+10) Ln/P10/S-100/D10-10 74.3µs ± 4% 74.3µs ± 4% ~ (p=0.190 n=10+10) Ln/P10/S-10/D2-10 83.4µs ± 4% 83.4µs ± 4% ~ (p=0.315 n=10+10) Ln/P10/S-10/D10-10 80.9µs ± 4% 80.9µs ± 4% ~ (p=0.143 n=10+10) Ln/P10/S-2/D2-10 82.1µs ± 4% 82.2µs ± 5% ~ (p=0.280 n=10+10) Ln/P10/S-2/D10-10 90.6µs ± 4% 90.6µs ± 5% ~ (p=0.190 n=10+10) Ln/P10/S2/D2-10 80.6µs ± 4% 80.6µs ± 4% ~ (p=0.184 n=10+10) Ln/P10/S2/D10-10 79.7µs ± 4% 79.5µs ± 4% ~ (p=0.143 n=10+10) Ln/P10/S10/D2-10 85.2µs ± 4% 85.2µs ± 5% ~ (p=0.218 n=10+10) Ln/P10/S10/D10-10 74.8µs ± 4% 74.7µs ± 4% ~ (p=0.247 n=10+10) Ln/P10/S100/D2-10 83.8µs ± 4% 83.6µs ± 4% ~ (p=0.165 n=10+10) Ln/P10/S100/D10-10 66.1µs ± 4% 65.9µs ± 5% ~ (p=0.143 n=10+10) Ln/P100/S-100/D2-10 3.87ms ± 4% 3.90ms ± 5% ~ (p=0.853 n=10+10) Ln/P100/S-100/D10-10 3.34ms ± 4% 3.35ms ± 5% ~ (p=0.529 n=10+10) Ln/P100/S-100/D100-10 3.46ms ± 4% 3.48ms ± 5% ~ (p=0.971 n=10+10) Ln/P100/S-10/D2-10 3.63ms ± 4% 3.65ms ± 4% ~ (p=0.739 n=10+10) Ln/P100/S-10/D10-10 3.43ms ± 4% 3.44ms ± 4% ~ (p=0.796 n=10+10) Ln/P100/S-10/D100-10 3.42ms ± 4% 3.45ms ± 4% ~ (p=0.631 n=10+10) Ln/P100/S-2/D2-10 3.51ms ± 4% 3.53ms ± 4% ~ (p=0.853 n=10+10) Ln/P100/S-2/D10-10 3.21ms ± 4% 3.24ms ± 4% ~ (p=0.631 n=10+10) Ln/P100/S-2/D100-10 3.47ms ± 4% 3.50ms ± 4% ~ (p=0.631 n=10+10) Ln/P100/S2/D2-10 3.56ms ± 4% 3.59ms ± 4% ~ (p=0.796 n=10+10) Ln/P100/S2/D10-10 3.82ms ± 4% 3.84ms ± 4% ~ (p=1.000 n=10+10) Ln/P100/S2/D100-10 3.72ms ± 4% 3.75ms ± 5% ~ (p=0.353 n=10+10) Ln/P100/S10/D2-10 3.78ms ± 4% 3.81ms ± 4% ~ (p=0.631 n=10+10) Ln/P100/S10/D10-10 3.50ms ± 4% 3.52ms ± 4% ~ (p=0.529 n=10+10) Ln/P100/S10/D100-10 3.49ms ± 4% 3.52ms ± 4% ~ (p=0.684 n=10+10) Ln/P100/S100/D2-10 3.82ms ± 4% 3.85ms ± 4% ~ (p=0.393 n=10+10) Ln/P100/S100/D10-10 3.72ms ± 4% 3.75ms ± 4% ~ (p=0.393 n=10+10) Ln/P100/S100/D100-10 3.65ms ± 4% 3.68ms ± 4% ~ (p=0.315 n=10+10) GDA/abs-10 10.4µs ± 6% 10.3µs ± 6% ~ (p=0.564 n=10+10) GDA/base-10 110µs ± 4% 111µs ± 5% ~ (p=0.869 n=10+10) GDA/comparetotal-10 30.2µs ± 5% 30.3µs ± 5% ~ (p=0.631 n=10+10) GDA/divide-10 391µs ± 7% 390µs ± 3% ~ (p=0.853 n=10+10) GDA/exp-10 127ms ± 4% 127ms ± 4% ~ (p=0.796 n=10+10) GDA/ln-10 83.6ms ± 3% 83.6ms ± 4% ~ (p=0.218 n=10+10) GDA/log10-10 106ms ± 4% 105ms ± 4% ~ (p=0.218 n=10+10) GDA/minus-10 11.7µs ± 4% 11.7µs ± 7% ~ (p=0.579 n=10+10) GDA/multiply-10 77.3µs ± 9% 80.8µs ± 9% ~ (p=0.089 n=10+10) GDA/plus-10 42.9µs ± 5% 44.2µs ± 4% ~ (p=0.052 n=10+10) GDA/power-10 218ms ± 3% 218ms ± 3% ~ (p=0.631 n=10+10) GDA/powersqrt-10 450ms ± 2% 448ms ± 2% ~ (p=0.280 n=10+10) GDA/quantize-10 143µs ±12% 128µs ± 8% ~ (p=0.052 n=10+10) GDA/randoms-10 3.05ms ± 5% 3.02ms ± 5% ~ (p=0.143 n=10+10) GDA/rounding-10 659µs ± 4% 658µs ± 4% ~ (p=0.579 n=10+10) GDA/squareroot-10 33.1ms ± 3% 32.7ms ± 3% ~ (p=0.143 n=10+10) GDA/subtract-10 172µs ±11% 170µs ± 8% ~ (p=0.912 n=10+10) GDA/tointegral-10 32.9µs ± 6% 32.2µs ± 5% ~ (p=0.190 n=10+10) GDA/tointegralx-10 33.6µs ± 6% 32.7µs ± 4% ~ (p=0.190 n=10+10) GDA/cuberoot-apd-10 2.08ms ± 3% 2.08ms ± 4% ~ (p=0.165 n=10+10) GDA/add-10 904µs ± 5% 919µs ± 5% +1.65% (p=0.043 n=10+10) GDA/divideint-10 28.7µs ± 4% 29.7µs ± 1% +3.51% (p=0.026 n=9+6) name old alloc/op new alloc/op delta GDA/compare-10 10.1kB ± 0% 7.0kB ± 0% -30.72% (p=0.000 n=10+10) GDA/quantize-10 42.3kB ± 0% 36.5kB ± 0% -13.63% (p=0.000 n=9+10) GDA/divideint-10 6.51kB ± 0% 5.70kB ± 0% -12.41% (p=0.000 n=10+10) GDA/remainder-10 22.4kB ± 0% 20.0kB ± 0% -10.60% (p=0.000 n=10+10) GDA/tointegralx-10 19.2kB ± 0% 17.4kB ± 0% -9.28% (p=0.000 n=10+10) GDA/tointegral-10 19.1kB ± 0% 17.4kB ± 0% -8.83% (p=0.000 n=10+10) GDA/reduce-10 2.22kB ± 0% 2.02kB ± 0% -8.66% (p=0.000 n=10+10) GDA/divide-10 74.3kB ± 0% 70.1kB ± 0% -5.62% (p=0.000 n=10+10) GDA/squareroot-10 8.30MB ± 0% 7.96MB ± 0% -4.08% (p=0.000 n=10+10) Exp/P5/S-4/D-2-10 1.23kB ± 0% 1.18kB ± 0% -3.92% (p=0.000 n=10+10) Exp/P10/S-1/D-2-10 4.50kB ± 0% 4.33kB ± 0% -3.75% (p=0.000 n=10+10) Exp/P5/S-1/D-2-10 2.83kB ± 0% 2.72kB ± 0% -3.75% (p=0.000 n=10+10) Exp/P5/S-4/D2-10 1.29kB ± 0% 1.24kB ± 0% -3.72% (p=0.000 n=10+10) Exp/P10/S-4/D2-10 2.02kB ± 0% 1.94kB ± 0% -3.71% (p=0.000 n=10+10) Exp/P10/S2/D-2-10 9.59kB ± 0% 9.24kB ± 0% -3.66% (p=0.000 n=10+10) Exp/P5/S2/D-2-10 7.53kB ± 0% 7.26kB ± 0% -3.65% (p=0.000 n=10+10) Exp/P5/S2/D2-10 7.76kB ± 0% 7.48kB ± 0% -3.65% (p=0.000 n=10+10) Exp/P10/S2/D2-10 9.75kB ± 0% 9.40kB ± 0% -3.60% (p=0.000 n=10+10) Exp/P10/S-4/D-2-10 2.32kB ± 0% 2.24kB ± 0% -3.58% (p=0.000 n=10+10) GDA/rounding-10 246kB ± 0% 237kB ± 0% -3.56% (p=0.000 n=10+10) GDA/randoms-10 1.16MB ± 0% 1.12MB ± 0% -3.55% (p=0.000 n=10+10) Exp/P10/S2/D-10-10 9.79kB ± 0% 9.44kB ± 0% -3.54% (p=0.000 n=10+10) Exp/P10/S2/D10-10 10.7kB ± 0% 10.3kB ± 0% -3.53% (p=0.000 n=10+10) Exp/P5/S-1/D2-10 2.87kB ± 0% 2.77kB ± 0% -3.52% (p=0.000 n=10+10) Exp/P10/S-1/D-10-10 4.95kB ± 0% 4.78kB ± 0% -3.47% (p=0.000 n=10+10) Exp/P10/S-1/D2-10 4.64kB ± 0% 4.49kB ± 0% -3.40% (p=0.000 n=10+10) GDA/cuberoot-apd-10 261kB ± 0% 252kB ± 0% -3.36% (p=0.000 n=10+10) Exp/P10/S-1/D10-10 5.17kB ± 0% 5.00kB ± 0% -3.33% (p=0.000 n=10+10) GDA/powersqrt-10 77.6MB ± 0% 75.1MB ± 0% -3.29% (p=0.000 n=10+10) Ln/P10/S-2/D2-10 17.1kB ± 0% 16.5kB ± 0% -3.26% (p=0.000 n=9+10) Ln/P10/S-2/D10-10 18.6kB ± 0% 18.0kB ± 0% -3.24% (p=0.000 n=9+8) Exp/P10/S-4/D-10-10 2.56kB ± 0% 2.48kB ± 0% -3.24% (p=0.000 n=10+10) Ln/P10/S2/D2-10 17.0kB ± 0% 16.5kB ± 0% -3.23% (p=0.000 n=7+10) Ln/P10/S10/D10-10 15.8kB ± 0% 15.3kB ± 0% -3.23% (p=0.000 n=10+10) Ln/P10/S10/D2-10 18.0kB ± 0% 17.4kB ± 0% -3.22% (p=0.000 n=10+10) Ln/P10/S-10/D2-10 17.0kB ± 0% 16.4kB ± 0% -3.22% (p=0.000 n=8+10) Ln/P10/S2/D10-10 16.5kB ± 0% 16.0kB ± 0% -3.21% (p=0.000 n=10+10) Ln/P10/S-10/D10-10 17.1kB ± 0% 16.5kB ± 0% -3.18% (p=0.000 n=10+7) Exp/P10/S-4/D10-10 2.61kB ± 0% 2.52kB ± 0% -3.18% (p=0.000 n=10+10) Ln/P2/S10/D2-10 10.3kB ± 0% 9.9kB ± 0% -3.15% (p=0.000 n=10+10) Ln/P10/S-100/D2-10 17.1kB ± 0% 16.6kB ± 0% -3.11% (p=0.000 n=10+10) Ln/P10/S100/D2-10 17.8kB ± 0% 17.2kB ± 0% -3.11% (p=0.000 n=8+10) Ln/P2/S2/D2-10 10.3kB ± 0% 9.9kB ± 0% -3.10% (p=0.000 n=10+9) Ln/P2/S-2/D2-10 9.62kB ± 0% 9.32kB ± 0% -3.09% (p=0.000 n=10+10) Ln/P10/S100/D10-10 15.0kB ± 0% 14.5kB ± 0% -3.09% (p=0.000 n=9+10) Ln/P10/S-100/D10-10 16.3kB ± 0% 15.8kB ± 0% -3.08% (p=0.000 n=10+10) Ln/P2/S-10/D2-10 8.91kB ± 0% 8.64kB ± 0% -3.07% (p=0.000 n=10+9) GDA/subtract-10 100kB ± 0% 97kB ± 0% -2.89% (p=0.000 n=10+10) Ln/P2/S100/D2-10 10.9kB ± 0% 10.6kB ± 0% -2.86% (p=0.000 n=10+10) Ln/P2/S-100/D2-10 9.81kB ± 0% 9.53kB ± 0% -2.83% (p=0.000 n=10+10) GDA/ln-10 10.2MB ± 0% 10.0MB ± 0% -2.09% (p=0.000 n=10+10) GDA/log10-10 12.5MB ± 0% 12.3MB ± 0% -1.87% (p=0.000 n=10+10) GDA/power-10 27.0MB ± 0% 26.5MB ± 0% -1.78% (p=0.000 n=10+10) Exp/P100/S-4/D10-10 31.9kB ± 0% 31.4kB ± 0% -1.46% (p=0.000 n=10+10) Exp/P100/S-4/D-10-10 33.5kB ± 0% 33.0kB ± 0% -1.41% (p=0.000 n=10+10) Exp/P100/S-4/D2-10 31.8kB ± 0% 31.4kB ± 0% -1.40% (p=0.000 n=10+9) Exp/P100/S-4/D-2-10 34.6kB ± 0% 34.1kB ± 0% -1.36% (p=0.000 n=10+9) Exp/P100/S-1/D10-10 65.0kB ± 0% 64.2kB ± 0% -1.27% (p=0.000 n=8+10) Exp/P100/S-1/D2-10 64.9kB ± 0% 64.1kB ± 0% -1.24% (p=0.000 n=10+10) Exp/P100/S-1/D-10-10 70.1kB ± 0% 69.3kB ± 0% -1.19% (p=0.000 n=10+10) Exp/P100/S-1/D-2-10 72.6kB ± 0% 71.8kB ± 0% -1.14% (p=0.000 n=9+9) Exp/P100/S2/D10-10 115kB ± 0% 114kB ± 0% -1.05% (p=0.000 n=10+10) Exp/P100/S2/D2-10 113kB ± 0% 112kB ± 0% -1.01% (p=0.000 n=10+10) Exp/P100/S2/D-2-10 107kB ± 0% 106kB ± 0% -1.00% (p=0.000 n=10+10) Exp/P100/S2/D-10-10 117kB ± 0% 116kB ± 0% -0.96% (p=0.000 n=10+10) Ln/P100/S-10/D10-10 318kB ± 0% 315kB ± 0% -0.83% (p=0.000 n=10+8) Ln/P100/S-100/D100-10 328kB ± 0% 326kB ± 0% -0.79% (p=0.000 n=10+10) Ln/P100/S-10/D2-10 336kB ± 0% 334kB ± 0% -0.79% (p=0.000 n=10+9) Ln/P100/S-2/D2-10 326kB ± 0% 324kB ± 0% -0.79% (p=0.000 n=10+10) Ln/P100/S100/D2-10 353kB ± 0% 350kB ± 0% -0.79% (p=0.000 n=10+9) Ln/P100/S10/D10-10 326kB ± 0% 324kB ± 0% -0.78% (p=0.000 n=10+10) Ln/P100/S100/D100-10 340kB ± 0% 338kB ± 0% -0.77% (p=0.000 n=10+10) Ln/P100/S2/D10-10 351kB ± 0% 348kB ± 0% -0.77% (p=0.000 n=10+10) Ln/P100/S-10/D100-10 323kB ± 0% 320kB ± 0% -0.77% (p=0.000 n=10+10) Ln/P100/S2/D100-10 344kB ± 0% 341kB ± 0% -0.77% (p=0.000 n=9+6) Ln/P100/S-100/D10-10 311kB ± 0% 309kB ± 0% -0.76% (p=0.000 n=10+7) Ln/P100/S10/D100-10 323kB ± 0% 321kB ± 0% -0.76% (p=0.000 n=8+9) Ln/P100/S-100/D2-10 362kB ± 0% 360kB ± 0% -0.75% (p=0.000 n=10+10) Ln/P100/S100/D10-10 345kB ± 0% 342kB ± 0% -0.74% (p=0.000 n=9+8) Exp/P100/S-1/D100-10 94.0kB ± 0% 93.3kB ± 0% -0.74% (p=0.000 n=8+10) Exp/P100/S-4/D100-10 47.9kB ± 0% 47.5kB ± 0% -0.74% (p=0.000 n=6+9) Ln/P100/S2/D2-10 333kB ± 0% 330kB ± 0% -0.74% (p=0.000 n=10+10) Exp/P100/S-1/D-100-10 97.5kB ± 0% 96.8kB ± 0% -0.70% (p=0.000 n=10+10) Exp/P100/S-4/D-100-10 51.5kB ± 0% 51.2kB ± 0% -0.70% (p=0.000 n=10+10) Ln/P100/S10/D2-10 346kB ± 0% 343kB ± 0% -0.69% (p=0.000 n=8+10) Ln/P100/S-2/D10-10 306kB ± 0% 304kB ± 0% -0.68% (p=0.000 n=10+10) Ln/P100/S-2/D100-10 329kB ± 0% 326kB ± 0% -0.67% (p=0.000 n=10+8) Exp/P100/S2/D100-10 150kB ± 0% 149kB ± 0% -0.67% (p=0.000 n=10+10) Exp/P100/S2/D-100-10 148kB ± 0% 147kB ± 0% -0.62% (p=0.000 n=9+10) GDA/exp-10 61.3MB ± 0% 61.1MB ± 0% -0.18% (p=0.000 n=10+10) GDA/base-10 24.4kB ± 0% 24.4kB ± 0% ~ (all equal) GDA/comparetotal-10 7.11kB ± 0% 7.11kB ± 0% ~ (all equal) GDA/add-10 712kB ± 0% 720kB ± 0% +1.18% (p=0.000 n=10+8) GDA/plus-10 46.6kB ± 0% 47.5kB ± 0% +2.04% (p=0.000 n=10+9) GDA/multiply-10 35.5kB ± 0% 39.2kB ± 0% +10.32% (p=0.000 n=10+10) GDA/minus-10 2.43kB ± 0% 2.82kB ± 0% +16.12% (p=0.000 n=10+10) GDA/abs-10 2.33kB ± 0% 2.82kB ± 0% +21.31% (p=0.000 n=10+10) name old allocs/op new allocs/op delta GDA/reduce-10 187 ± 0% 43 ± 0% -77.01% (p=0.000 n=10+10) GDA/compare-10 638 ± 0% 250 ± 0% -60.82% (p=0.000 n=10+10) GDA/minus-10 164 ± 0% 73 ± 0% -55.49% (p=0.000 n=10+10) GDA/abs-10 151 ± 0% 73 ± 0% -51.66% (p=0.000 n=10+10) GDA/quantize-10 1.70k ± 0% 0.98k ± 0% -42.30% (p=0.000 n=10+10) GDA/tointegralx-10 677 ± 0% 454 ± 0% -32.94% (p=0.000 n=10+10) GDA/remainder-10 923 ± 0% 626 ± 0% -32.18% (p=0.000 n=10+10) GDA/tointegral-10 665 ± 0% 454 ± 0% -31.73% (p=0.000 n=10+10) GDA/divideint-10 357 ± 0% 256 ± 0% -28.29% (p=0.000 n=10+10) GDA/subtract-10 2.85k ± 0% 2.14k ± 0% -24.89% (p=0.000 n=10+10) GDA/divide-10 2.51k ± 0% 1.91k ± 0% -24.01% (p=0.000 n=10+10) Exp/P5/S-4/D-2-10 47.0 ± 0% 37.0 ± 0% -21.28% (p=0.000 n=10+10) Exp/P5/S-4/D2-10 50.0 ± 0% 40.0 ± 0% -20.00% (p=0.000 n=10+10) Exp/P10/S-4/D2-10 76.0 ± 0% 62.0 ± 0% -18.42% (p=0.000 n=10+10) GDA/rounding-10 9.20k ± 0% 7.59k ± 0% -17.52% (p=0.000 n=10+10) Exp/P10/S-4/D-2-10 87.0 ± 0% 72.0 ± 0% -17.24% (p=0.000 n=10+10) Exp/P10/S-4/D10-10 95.0 ± 0% 79.0 ± 0% -16.84% (p=0.000 n=10+10) GDA/randoms-10 40.4k ± 0% 33.6k ± 0% -16.82% (p=0.000 n=10+10) Exp/P5/S-1/D-2-10 104 ± 0% 87 ± 0% -16.35% (p=0.000 n=10+10) Exp/P5/S-1/D2-10 105 ± 0% 88 ± 0% -16.19% (p=0.000 n=10+10) Exp/P10/S-4/D-10-10 93.0 ± 0% 78.0 ± 0% -16.13% (p=0.000 n=10+10) GDA/squareroot-10 275k ± 0% 230k ± 0% -16.08% (p=0.000 n=10+10) Exp/P10/S-1/D-10-10 174 ± 0% 147 ± 0% -15.52% (p=0.000 n=10+10) GDA/plus-10 587 ± 0% 496 ± 0% -15.50% (p=0.000 n=10+10) Exp/P10/S-1/D-2-10 163 ± 0% 138 ± 0% -15.34% (p=0.000 n=10+10) Exp/P10/S-1/D2-10 166 ± 0% 141 ± 0% -15.06% (p=0.000 n=10+10) Exp/P10/S-1/D10-10 182 ± 0% 155 ± 0% -14.84% (p=0.000 n=10+10) Exp/P10/S2/D-10-10 335 ± 0% 286 ± 0% -14.63% (p=0.000 n=10+10) Ln/P2/S-2/D2-10 342 ± 0% 292 ± 0% -14.62% (p=0.000 n=10+10) Ln/P2/S2/D2-10 363 ± 0% 310 ± 0% -14.60% (p=0.000 n=10+10) Ln/P2/S10/D2-10 366 ± 0% 313 ± 0% -14.48% (p=0.000 n=10+10) Exp/P10/S2/D-2-10 335 ± 0% 287 ± 0% -14.33% (p=0.000 n=10+10) GDA/cuberoot-apd-10 7.67k ± 0% 6.58k ± 0% -14.27% (p=0.000 n=10+10) Ln/P10/S-2/D2-10 582 ± 0% 499 ± 0% -14.26% (p=0.000 n=10+10) Ln/P10/S2/D2-10 582 ± 0% 499 ± 0% -14.26% (p=0.000 n=10+10) Ln/P10/S10/D10-10 541 ± 0% 464 ± 0% -14.23% (p=0.000 n=10+10) Ln/P10/S2/D10-10 563 ± 0% 483 ± 0% -14.21% (p=0.000 n=10+10) Ln/P2/S-10/D2-10 317 ± 0% 272 ± 0% -14.20% (p=0.000 n=10+10) GDA/powersqrt-10 2.63M ± 0% 2.26M ± 0% -14.17% (p=0.000 n=10+10) Ln/P10/S-2/D10-10 637 ± 0% 547 ± 0% -14.13% (p=0.000 n=10+10) Ln/P10/S-10/D2-10 581 ± 0% 499 ± 0% -14.11% (p=0.000 n=10+10) Exp/P10/S2/D10-10 369 ± 0% 317 ± 0% -14.09% (p=0.000 n=10+10) Exp/P10/S2/D2-10 342 ± 0% 294 ± 0% -14.04% (p=0.000 n=10+10) Ln/P10/S10/D2-10 613 ± 0% 527 ± 0% -14.03% (p=0.000 n=9+10) Ln/P10/S-10/D10-10 581 ± 0% 500 ± 0% -13.99% (p=0.000 n=10+10) Exp/P5/S2/D-2-10 272 ± 0% 234 ± 0% -13.97% (p=0.000 n=10+10) Ln/P10/S-100/D2-10 575 ± 0% 495 ± 0% -13.91% (p=0.000 n=10+10) Ln/P10/S100/D2-10 597 ± 0% 514 ± 0% -13.90% (p=0.000 n=10+10) Ln/P10/S100/D10-10 504 ± 0% 434 ± 0% -13.89% (p=0.000 n=10+10) Exp/P5/S2/D2-10 281 ± 0% 242 ± 0% -13.88% (p=0.000 n=10+10) Ln/P10/S-100/D10-10 549 ± 0% 473 ± 0% -13.84% (p=0.000 n=10+10) Ln/P2/S-100/D2-10 336 ± 0% 290 ± 0% -13.69% (p=0.000 n=10+10) Ln/P2/S100/D2-10 376 ± 0% 325 ± 0% -13.56% (p=0.000 n=10+10) GDA/ln-10 269k ± 0% 239k ± 0% -10.88% (p=0.000 n=10+10) GDA/log10-10 326k ± 0% 292k ± 0% -10.44% (p=0.000 n=10+10) GDA/power-10 699k ± 0% 627k ± 0% -10.30% (p=0.000 n=10+10) GDA/add-10 13.2k ± 0% 11.9k ± 0% -9.68% (p=0.000 n=10+10) Exp/P100/S-4/D10-10 701 ± 0% 638 ± 0% -8.99% (p=0.000 n=10+10) Exp/P100/S-4/D-10-10 715 ± 0% 651 ± 0% -8.95% (p=0.000 n=10+10) Exp/P100/S-4/D2-10 686 ± 0% 626 ± 0% -8.75% (p=0.000 n=10+10) Exp/P100/S-4/D-2-10 723 ± 0% 661 ± 0% -8.58% (p=0.000 n=10+10) Exp/P100/S-1/D10-10 1.36k ± 0% 1.26k ± 0% -7.99% (p=0.000 n=10+10) Exp/P100/S-1/D2-10 1.35k ± 0% 1.24k ± 0% -7.72% (p=0.000 n=9+10) Exp/P100/S-1/D-10-10 1.40k ± 0% 1.29k ± 0% -7.70% (p=0.000 n=10+10) Exp/P100/S-1/D-2-10 1.43k ± 0% 1.32k ± 0% -7.48% (p=0.000 n=9+10) Exp/P100/S2/D-2-10 2.03k ± 0% 1.89k ± 0% -6.82% (p=0.000 n=9+10) Exp/P100/S2/D10-10 2.28k ± 0% 2.13k ± 0% -6.81% (p=0.000 n=10+9) Exp/P100/S2/D-10-10 2.19k ± 0% 2.05k ± 0% -6.66% (p=0.001 n=8+9) Exp/P100/S2/D2-10 2.21k ± 0% 2.06k ± 0% -6.61% (p=0.000 n=9+10) Ln/P100/S-10/D2-10 6.00k ± 0% 5.65k ± 0% -5.87% (p=0.000 n=10+9) Ln/P100/S100/D2-10 6.25k ± 0% 5.89k ± 0% -5.80% (p=0.000 n=10+9) Ln/P100/S-2/D2-10 5.79k ± 0% 5.46k ± 0% -5.78% (p=0.000 n=10+10) Ln/P100/S100/D100-10 6.04k ± 0% 5.69k ± 0% -5.76% (p=0.000 n=10+10) Ln/P100/S2/D10-10 6.21k ± 0% 5.86k ± 0% -5.76% (p=0.000 n=10+10) Ln/P100/S2/D100-10 6.09k ± 0% 5.74k ± 0% -5.73% (p=0.000 n=10+8) Exp/P100/S-4/D100-10 855 ± 0% 806 ± 0% -5.73% (p=0.000 n=10+10) Ln/P100/S-10/D10-10 5.62k ± 0% 5.30k ± 0% -5.72% (p=0.000 n=10+8) Ln/P100/S-100/D2-10 6.34k ± 0% 5.98k ± 0% -5.68% (p=0.000 n=10+10) Ln/P100/S10/D100-10 5.73k ± 0% 5.41k ± 0% -5.67% (p=0.000 n=10+10) Ln/P100/S100/D10-10 6.08k ± 0% 5.74k ± 0% -5.66% (p=0.000 n=9+8) Ln/P100/S10/D2-10 6.13k ± 0% 5.78k ± 0% -5.66% (p=0.000 n=8+10) Ln/P100/S-10/D100-10 5.69k ± 0% 5.37k ± 0% -5.64% (p=0.000 n=10+10) Ln/P100/S10/D10-10 5.74k ± 0% 5.41k ± 0% -5.64% (p=0.000 n=10+10) Ln/P100/S-100/D10-10 5.55k ± 0% 5.23k ± 0% -5.63% (p=0.000 n=10+7) Ln/P100/S2/D2-10 5.86k ± 0% 5.53k ± 0% -5.62% (p=0.000 n=10+10) Exp/P100/S-4/D-100-10 893 ± 0% 843 ± 0% -5.60% (p=0.000 n=10+10) Ln/P100/S-100/D100-10 5.73k ± 0% 5.41k ± 0% -5.54% (p=0.000 n=10+10) Exp/P100/S-1/D100-10 1.68k ± 0% 1.58k ± 0% -5.52% (p=0.000 n=8+10) Ln/P100/S-2/D10-10 5.40k ± 0% 5.10k ± 0% -5.49% (p=0.000 n=10+10) Ln/P100/S-2/D100-10 5.77k ± 0% 5.45k ± 0% -5.49% (p=0.000 n=10+10) Exp/P100/S-1/D-100-10 1.68k ± 0% 1.59k ± 0% -5.41% (p=0.000 n=10+10) Exp/P100/S2/D100-10 2.61k ± 0% 2.48k ± 0% -5.03% (p=0.000 n=8+10) Exp/P100/S2/D-100-10 2.47k ± 0% 2.35k ± 0% -4.90% (p=0.000 n=10+10) GDA/multiply-10 931 ± 0% 889 ± 0% -4.51% (p=0.000 n=10+10) GDA/exp-10 911k ± 0% 895k ± 0% -1.75% (p=0.000 n=10+10) GDA/base-10 2.11k ± 0% 2.11k ± 0% ~ (all equal) GDA/comparetotal-10 254 ± 0% 254 ± 0% ~ (all equal) ```
Contributor
Author
|
I think we can simplify this even further by creating a wrapper around const inlineWords = 1 // or maybe 2
type BigInt struct {
inner big.Int
inline [inlineWords]big.Word
}
func (b *BigInt) lazyInit() {
if b.inner.Bits() == nil {
b.inline = [inlineWords]big.Word{}
b.inner.SetBits(b.inline[:0])
}
}
func (b *BigInt) Add(x, y *BigInt) *BigInt {
b.lazyInit()
b.inner.Add(&x.inner, &y.inner)
return b
}
...
type Decimal struct {
Form Form
Negative bool
Exponent int32
Coeff BigInt
}We could then re-implement the This |
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replaces cockroachdb/cockroach#74369.
This commit introduces a performance optimization that embeds small coefficient values directly in their
Decimalstruct, instead of storing these values in a separate heap allocation.Each
Decimalmaintains (throughbig.Int) an internal reference to a variable-length coefficient, which is represented by a[]big.Word. ThecoeffInlinefield and thelazyInitmethod combine to allow us to inline this coefficient within theDecimalstruct when its value is sufficiently small. InlazyInit, we point theCoefffield's backing slice at thecoeffInlinearray.big.Intwill avoid re-allocating this array until it is provided with a value that exceeds the initial capacity. We set this initial capacity to accommodate any coefficient that would fit in a 64-bit integer (i.e. values up to 2^64).This is an alternative to an optimization that many other arbitrary precision decimal libraries have where small coefficient values are stored as numeric fields in their data type's struct. Only when this coefficient value gets large enough do these libraries fall back to a variable-length coefficient with internal indirection. We can see the optimization in practice in the
ericlagergren/decimallibrary, where each struct contains acompact uint64and anunscaled big.Int. Prior concern from the authors ofcockroachdb/apdregarding this form of compact representation optimization was that it traded performance for complexity. The optimization fractured control flow, leaking out across the library and leading to more complex, error-prone code.The approach taken in this commit does not have the same issue. All arithmetic on the decimal's coefficient is still deferred to
bit.Int. In fact, the entire optimization is best-effort, and bugs that lead to missed calls tolazyInitare merely missed opportunities to avoid a heap allocation, and nothing more serious. The internal reference within Decimal is vulnerable to memory aliasing bugs when aDecimalis copied by value (and not through theDecimal.Setmethod), but this is not a new concern.Impact on benchmarks:
cc. @mjibson