-
Notifications
You must be signed in to change notification settings - Fork 38.7k
util: Pass size to ParseHex to assist preallocation #18061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
strlenstrlen twice
|
ut: i think the compiler would optimize strlen being called twice into once |
|
@sanjaykdragon good question. I'll check godbolt later. But even if you're right it's A. compiler dependent. B. still one time more than this. EDIT: even with simplification, and force inlining everything it still calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what about the check done in ValidAsCString?
I think you need to show improvements otherwise it doesn't pay off. FWIW having ValidAsCString avoids .reserve().
src/util/strencodings.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3fad849f2f9d4c250e44fb7c329e1f93c14e48c7
NACK this change. Instead you can drop ParseHex(const char*) and move the code here with the proper changes - I've done it and make check ran fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we have places where it's called with const char*. probably by dropping that it just calls the std::string constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion: can't we remove the std::vector<unsigned char> ParseHex(const char* psz); variant completely?
Does this cause any actual issues if we did?
I'd like to move away from C-strings where possible and it seems more of a simplification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only "problem" is a degregation in speed for places where we call this with C strings which I think is mostly(if not only?) tests(their raw strings usage)
It's probably worth the simplicity though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is copying a string actually relevant performance wise here? Even if it was, parsing hex should only happen on user or rpc inputs, so I don't think we need to over-optimize this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoFalke in my benchmarks if you have char* and you convert it into std::string it spends almost the same time as the code before this PR (X2 slower). I think it's not about copying, it's about heap allocation.
But if you prefer I'll be fine with removing the raw char* support because we probably only ever benefit from it in tests.
I agree that there's no need to over-optimize this but this is basically free optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Echoing the request of simply dropping ParseHex(const char* psz).
Generally I think we should try move away from C-strings where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elichai Is there a single call site that needs the raw string interface? If not, and you have performance concerns, you may explicitly delete the interface, and thus force all callers to explicitly call ParseHex(std::string{psz}) and be aware of the performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked now, it seems like ParseHex(const char* psz) is only used in tests except for 2 places:
https://github.com/bitcoin/bitcoin/blob/master/src/chainparams.cpp#L55
https://github.com/bitcoin/bitcoin/blob/master/src/net_processing.cpp#L2101
Both should almost never really run, so I'll just drop it
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
3fad849 to
349816a
Compare
|
So added some benchmarks, and it seems to be within the margin of noise except for After: |
349816a to
42af035
Compare
strlen twice
utACK: if these numbers are real and true, this is a nice improvement |
42af035 to
40aa036
Compare
|
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
1 similar comment
There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
|
Added "up for grabs", see also https://github.com/bitcoin/bitcoin/pull/23595/commits |
|
I've split up the common changes (base refactoring work) into #24751 |
|
Closing this in favour of #24764 for now. |
|
I don't think this was fixed by #24764 |
I closed it because the PR has been inactive for a long time, the up for grabs label had already been applied, and it was just generating noise by conflicting with on-going changes. If someone wants to continue these changes they could rebase, and open a new PR, with the changes that are still relevant. |
Hi,
This is kinda chasing Concept ACK if it's worth the review effort.* (noticed it while reviewing a PR)Right now both DecodeBase32 and DecodeBase64 are only ever being called withstd::stringbut still process viachar*and so they runstrlen(O(n)) twice, once to validate it doesn't contain random zeros(ValidAsCString) another to get the length.Instead we can iterate it as a std::string and then if we see0in the middle of the string we can fail the process.* if it is, there's also DecodeBase58 .And as a by-product saw 2 other places where we can pre-allocate memory.
EDIT: So the by-product turned out to be the only interesting part of this PR :) (see #18061 (comment))
According to the benchmarks this can cut the time it takes to parse a hex by half.