-
Notifications
You must be signed in to change notification settings - Fork 38.7k
optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimization: Speed up Base58 encoding/decoding by 400%/200% via preliminary byte packing #29473
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29473. ReviewsSee the guideline for information on the review process. ConflictsNo conflicts as of last run. |
afbc980 to
321456b
Compare
|
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
397ae64 to
e168e62
Compare
cab8a6e to
b930cd5
Compare
ba23707 to
c9b351d
Compare
|
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
c9b351d to
83caaf7
Compare
83caaf7 to
73845ec
Compare
864e479 to
2dd3818
Compare
|
I presume, if you wanted to speed up |
|
I was following the benchmarks to decide what to work on, but since you've mentioned this is an important usecase, I measured it as well. |
|
I don't think I said that "this is an important usecase". I simply said that the reason for the improvement in #7656 (comment) was However, this is just background information and a guess. I don't know if it is true, how easy it is to implement, whether it is worth it to review, or worth it at all. |
|
@luke-jr, would this optimization be more welcome in https://github.com/bitcoin/libbase58 instead? |
2dd3818 to
75a86de
Compare
The algorithm used in libbase58 is different, so not sure this even applies. Kinda doubt it would be worth the time to port/review either, unless a library consumer cares about performance or the other criteria explained already here. If you're just looking for things to do, extending libbase58 to Bech32 might be worth doing. See bitcoin/libbase58#6 |
75a86de to
7feb85b
Compare
Note that the constants for size allocations have slightly changed after recalculating them
Instead of collecting the values to and empty b58, we're cloning the input without leading zeros and dividing it in-place, while doing the quadratic division from the most significant, non-zero digit, forward. make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 59.34 | 16,851,250.88 | 0.2% | 1.10 | `Base58Encode` | 59.20 | 16,892,479.61 | 0.2% | 1.10 | `Base58Encode` | 58.97 | 16,958,226.76 | 0.2% | 1.10 | `Base58Encode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 40.80 | 24,508,402.06 | 0.1% | 1.10 | `Base58Encode` | 40.83 | 24,489,762.49 | 0.2% | 1.10 | `Base58Encode` | 39.71 | 25,182,310.62 | 0.2% | 1.10 | `Base58Encode` ``` and make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 79.93 | 12,511,576.75 | 0.4% | 1.10 | `Base58CheckEncode` | 79.49 | 12,580,136.21 | 0.1% | 1.10 | `Base58CheckEncode` | 79.65 | 12,554,785.16 | 0.1% | 1.10 | `Base58CheckEncode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 55.27 | 18,094,561.35 | 0.1% | 1.10 | `Base58CheckEncode` | 55.41 | 18,046,778.32 | 0.1% | 1.10 | `Base58CheckEncode` | 55.32 | 18,075,763.16 | 0.1% | 1.10 | `Base58CheckEncode` ``` Co-authored-by: optout21 <13562139+optout21@users.noreply.github.com>
To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7, fitting them into 64-bit long integers. This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead. The benchmarks indicate a 4x speedup for `Base58CheckEncode` and `Base58Encode`: make && ./src/bench/bench_bitcoin --filter=Base58Encode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 14.32 | 69,831,031.44 | 0.1% | 1.10 | `Base58Encode` | 14.32 | 69,811,995.18 | 0.1% | 1.10 | `Base58Encode` | 14.35 | 69,662,527.88 | 0.5% | 1.10 | `Base58Encode` ``` make && ./src/bench/bench_bitcoin --filter=Base58CheckEncode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 19.15 | 52,231,968.11 | 0.1% | 1.06 | `Base58CheckEncode` | 19.13 | 52,269,345.54 | 0.4% | 1.05 | `Base58CheckEncode` | 19.14 | 52,244,117.61 | 0.3% | 1.06 | `Base58CheckEncode` ```
> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000 Before: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 17.50 | 57,141,622.65 | 0.1% | 1.10 | `Base58Decode` | 17.42 | 57,392,223.96 | 0.0% | 1.10 | `Base58Decode` | 17.43 | 57,358,655.44 | 0.2% | 1.10 | `Base58Decode` ``` After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 16.37 | 61,093,842.90 | 0.2% | 1.10 | `Base58Decode` | 16.37 | 61,100,514.64 | 0.1% | 1.10 | `Base58Decode` | 16.38 | 61,046,275.93 | 0.1% | 1.10 | `Base58Decode` ```
> make && ./src/bench/bench_bitcoin --filter=Base58Decode --min-time=1000 After: ``` | ns/byte | byte/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 8.79 | 113,767,685.06 | 0.1% | 1.10 | `Base58Decode` | 8.78 | 113,831,528.33 | 0.0% | 1.10 | `Base58Decode` | 8.80 | 113,607,470.35 | 0.2% | 1.10 | `Base58Decode` ```
7feb85b to
c1291fd
Compare
|
As this is basically a complete rewrite of the base58 encoding and decoding code, I don't think the review effort required to review this is worth it relative to the unimportance of these functions. |
|
Thanks for checking it out @achow101. |
To mitigate the quadratic complexity inherent in the sequential byte division of the Base58 encoding process, this update aggregates characters into groups of 7 and 9, fitting them into 64-bit long integers.
This refinement utilizes the inherent efficiency of native division and modulo operations on 64-bit architectures, significantly optimizing computational overhead.
The benchmarks indicate a ~4.4x speedup for
Base58Encode:before:
Base58Encodeafter:
Base58EncodeThe same was applied to decoding, resulting in a ~2x speedup:
before:
Base58Decodeafter:
Base58DecodeThis also speed up listunspent by >50%.
See #29473 (comment) for additional benchmarks and #30035 for additional tests.
I've also tested with with the following temp test (comparing the exact previous impl with the new one for random inputs: https://gist.github.com/paplorinc/96d8e355f6fef10ac79f62d89a6d9f19#file-base58_tests-cpp-L211-L249)