-
Notifications
You must be signed in to change notification settings - Fork 38.7k
rpc: fetch multiple headers in getblockheader() #25261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits |
|
I think it makes sense to do count = 0 as a single entity, and >= 1 as an array, or otherwise make an additional as_array bool param to control the output type. Otherwise the "type instability" is difficult to deal with for downstream consumers of the API. E.g., consider the code for https://docs.rs/bitcoincore-rpc/latest/src/bitcoincore_rpc/client.rs.html#337-341 . |
Thanks for the reply. Note that there is no regression impact because all preexisting functionality is intact. Ultimately, it is an optional parameter which can be ignored. To me, From the old PR, it would appear that we are tending away from the unformatted (concatenated) output in new implementation. If this is indeed the case, then such an additional parameter does not make seem useful. @MarcoFalke Do you have any input on this? And are any further updates to the PR needed at this point? All CI checks have passed. Thanks. |
|
What about:
|
@JeremyRubin's suggestion has been implemented. I think I misunderstood the difference between As you've denoted, this is now resolved such that: Let me know if any other changes are still needed. Thanks. |
|
Yeah that makes sense. Null = flat works too. Idea isn't to concatenate anything, but rather to provide support for legacy behavior and something that matches the REST API. Since 0 is invalid in rest, all valid queries in JSON will match all valid rest queries, and legacy count=0 queries will return flat. If we were to do a breaking API change, i'd say always return array :) |
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
|
We will be reviewing this PR this Wednesday https://bitcoincore.reviews/25261 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept and approach ACK, my suggestions are mostly formatting, except the comment about locking needs careful consideration.
This doc helps with formatting. But we don't accept everything it suggests, especially in RPC code where we use some non-standard formatting conventions.
This PR could also benefit from some tests. (But I understand the current lack of tests may be intentional because you want to get at least concept or approach ACKs before going to the trouble of writing tests.)
src/rpc/blockchain.cpp
Outdated
| else | ||
| headers.push_back(blockheaderToJSON(tip, pblockindex)); | ||
|
|
||
| pblockindex = chainman.ActiveChain().Next(pblockindex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure calling Next() requires the thread to hold cs_main. But I don't think you want to grab and release the lock on each iteration of this loop; it's probably better to take the lock close to where you currently do, and hold it throughout this loop.
But it's not ideal to hold cs_main during the conversions to JSON or (for non-verbose) encodings to hex. So you may want to consider creating a vector of CBlockindex pointers, have each iteration of this loop push_back() to this vector, drop cs_main, then run a separate loop to format the results. (This is okay because once you have a CBlockIndex pointer, both the pointer and the object it points to remain valid without having to hold cs_main.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears to be what caused CI to fail with -Wthread-safety-analysis enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented @LarryRuane's suggestion to store the CBlockindex pointers and then iterate through them. However, the CI failure is still occurring. From what I recall, this wasn't happening before. Any insight into why this is now the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked at the CI logs? You can view them by clicking "details" next to the CI failure message here on GitHub. The error message is quite clear in this case:
rpc/blockchain.cpp:591:32: error: calling function 'ActiveChain' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
pblockindex = chainman.ActiveChain().Next(pblockindex);
You can't call ActiveChain() without holding the cs_main mutex, as is done e.g. a few blocks of code above this line.
|
@natanleung please remove the second commit, "Merge branch 'bitcoin:master' into rpc_getblockheader_count" PRs should not include such merge commits. Actually, it looks like this is a rebase, not a merge commit (the merge commit is empty). But you should not rebase onto latest master unless there is a conflict, see rebasing changes. @DrahtBot will usually let you know (although there can be hidden conflicts that DrahtBot doesn't detect). |
|
You don't want to merge master into this commit with a merge commit. Rather you want to rebase your commit on top of master. You could do that with something like (double check this first!): # reset to your first commit
git reset --hard 053ccf0468e477283e80f78cc095ffb83bff9b95
# rebase onto master if you have remote configured as "upstream"
git fetch --all && git rebase upstream/masteredit: @LarryRuane beat me to it :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Agreed with @LarryRuane that functional tests need updating to cover this.
src/rpc/blockchain.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would store request.params[2].isNull() into a bool return_array and avoid relying on a magic count==0 value. That makes the code more readable imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change has been implemented as follows:
bool returnArray = !request.params[2].isNull();
The negation operator ! is used because if count != null, then an array is returned.
|
Not sure why I didn't earlier, but... Concept NACK. Why shouldn't this be accomplished through JSON-RPC batching? |
My initial intuitive reasoning was that this approach would be significantly faster than batching because of the overhead of each request as well as requiring the blockhash for each request. After doing some performance testing, that doesn't seem to be true. In which case, I think I'm leaning towards Concept NACK too. Best case scenario (on my setup-up): to fetch On signet, I've timed how long it takes to fetch performance benchmarking resultsperformance testing code (`-signet -rpcuser user -rpcpassword pass`)import json
import time
import requests
def get_n_blockhashes(n: int, start_height: int = 0):
data = [{
"method": "getblockhash",
"params": [height]
} for height in range(start_height, start_height + n)]
hashes = [item["result"] for item in make_request(data)]
return hashes
def get_n_headers_batch(hashes):
"""Call getblockheader one time with an array of data items, one for each blockhash"""
data = []
for blockhash in hashes:
data.append({"method": "getblockheader", "params": [blockhash]})
headers = [item["result"] for item in make_request(data)]
return headers
def get_n_headers_loop(n: int):
"""Call getblockheader n times to get one blockheader each time"""
headers = []
blockhash = get_n_blockhashes(1)[0]
while len(headers) < n:
data = {"method": "getblockheader", "params": [blockhash]}
header = make_request(data)["result"]
blockhash = header["nextblockhash"]
headers.append(header)
return headers
def get_n_headers_count(n: int):
"""Call getblockheader with the `count` parameter until n headers are fetched"""
max_per_request = 2000
remaining = n
headers = []
blockhash = get_n_blockhashes(1)[0]
while remaining > 0:
count = min(remaining, max_per_request)
data = {
"method": "getblockheader",
"params": [blockhash, True, count]
}
headers += make_request(data)["result"]
remaining -= count
blockhash = headers[-1]["nextblockhash"]
return headers
def make_request(data):
url = "http://user:pass@127.0.0.1:38332/"
r = requests.post(url, data=json.dumps(data))
assert r.status_code == 200
return r.json()
def time_function(fn, *args, last_hash_check: str, **kwargs) -> float:
"""Return average fn time execution and check that the last obtained blockheader hash matches last_hash_check """
iters = 10
start = time.perf_counter()
for i in range(iters):
result = fn(*args, **kwargs)
assert result[-1]["hash"] == last_hash_check
avg_duration = (time.perf_counter() - start) / iters
return avg_duration
if __name__ == '__main__':
for n in [50, 500, 2000, 20000, 100000]:
print(f"For {n} blockheaders:")
hash_start = time.perf_counter()
hashes = get_n_blockhashes(n)
hash_time = time.perf_counter() - hash_start
if n <= 500:
print(f"loop: {time_function(get_n_headers_loop, n, last_hash_check=hashes[-1]):.4f}s")
print(f"batch: {time_function(get_n_headers_batch, hashes, last_hash_check=hashes[-1]):.4f}s (and {hash_time:.4f}s to fetch blockhashes)")
print(f"count: {time_function(get_n_headers_count, n, last_hash_check=hashes[-1]):.4f}s")
print("-----") |
|
I tested your code @stickies-v and, despite my timings coming out significantly slower than yours (curious about that...), found the same trends, so would tend to agree that this can be done using batching: edit: |
getblockheader() has the count parameter to fetch more than one header in a single GET via REST. This patch extends this functionality to the CLI and properly formats output arrays based on the verbose parameter.
|
According to #23330 (comment) from the previous PR, the original author considered RPC batching. I took on this PR as an up-for-grabs ticket so I thought this was desired functionality. If the changes look good, can I fix the CI failure, write tests, and close? Thanks. |
First of all, really appreciate you working on this and contributing - and I hope you'll continue to regardless of the outcome of this PR! Unfortunately, in the decentralized development model of Core, we don't really have a well-defined concept of "desired functionality" - we don't have anyone deciding on features/roadmaps/... It all comes down to emergent consensus among developers. I think in this case, the concern is that even though the PR does seem like an improvement in some areas (e.g. performance, UX) it is also a deterioration in other areas (maintainability, unnecessarily complex method parameters). Some reviewers including myself seem to think that this feature is not desired as we think the cons outweigh the pros. If you look at the original issue #23330, you'll see that there was no clear consensus yet on whether or not batching was a feasible alternative, which could be a hint that desirability is unclear. The comments in #23330 mention lack of atomicity (i.e. reorgs could happen during the batch request), but I'm not sure that's an important concern. It's quite trivial for the client to inspect that the headers are indeed part of the same chain (each header links to the previous header), and since reorgs rarely happen to more than a few blocks this would never be any kind of performance concern. Moreover, the approach in this PR also doesn't have that kind of atomicity (although it could be introduced with more aggressive locking of |
|
🐙 This pull request conflicts with the target branch and needs rebase. |
|
With seeming lack of agreement around the concept, and no further follow-up from the author, I think this can be closed? |
|
Going to close for now. Looks like the author has also disappeared from GitHub. |
|
Hey, thanks for keeping up with this PR. I am definitely still interested in working on this, but due to the lack of clarity, I don't know how to proceed. The last few posts suggest that this isn't really wanted, so what would be the best way to modify this into something that is? If there is something that would be delivered out of this, then I would be more than happy to work on implementing that. Let me know what you all think. Thanks! |
Motivation:
getblockheader()has thecountparameter to fetch more than one header in a single GET via REST.This patch extends this functionality to
bitcoin-cliand properly formats output arrays based onthe
verboseparameter. This is a follow-up to an up-for-grabs issue #23330.Implementation:
I took into account the review comments and previous implementation. The CLI method is used as follows:
countdefaults to 0.countis excluded orcount = 0, the preexisting functionality is used to return the single block header of block in either hex or JSON format based onverbose.0 < count <= 2000, an array of block headers starting with block is returned in either hex or JSON format based onverbose.count < 0orcount > 2000, return anINVALID_PARAMETERerror.countis not an integer, report a generic JSON parsing error.Changes:
rpc/client.cpp.getblockheader()help message has been updated accordingly.getblockheader()functionality has been expanded, as described above.