
Conversation

@natanleung commented Jun 1, 2022

Motivation: via REST, getblockheader() accepts a count parameter to fetch more than one header in a single GET.

This patch extends that functionality to bitcoin-cli and formats the output array according to the verbose parameter. It is a follow-up to up-for-grabs issue #23330.

Implementation:
I took into account the review comments and previous implementation. The CLI method is used as follows:

bitcoin-cli getblockheader <block_hash> <verbose=true> <count=0>
  • count defaults to 0.

The previous implementation defaulted to 0, but in the REST syntax, this is actually an invalid count value. I believe that it makes sense to return a single block header by default.

  • If count is excluded or count = 0, the preexisting functionality is used to return the single block header of <block_hash> in either hex or JSON format, depending on verbose.

No regression impact.

  • If 0 < count <= 2000, an array of block headers starting with <block_hash> is returned, in either hex or JSON format depending on verbose.

The previous implementation traversed the ActiveChain upwards (towards the genesis block), but again, this differs from the REST functionality, which traverses the ActiveChain downwards (away from the genesis block); I replicated the latter instead. If count exceeds the number of blocks between the requested block and the active tip, then the final block header in the array is simply that of the tip of the ActiveChain.

  • If count < 0 or count > 2000, an INVALID_PARAMETER error is returned.
  • If count is not an integer, a generic JSON parsing error is reported (a rough sketch of this dispatch follows below).
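
For illustration only, a minimal sketch of the count handling described above, in the style of rpc/blockchain.cpp. The variable names (request, fVerbose, strHex, tip, pblockindex) and the UniValue accessor spelling are assumptions, not the PR's actual diff:

// Hypothetical sketch of the dispatch described above (not the PR's code).
int count{0};
if (!request.params[2].isNull()) {
    count = request.params[2].get_int(); // non-integer values already fail JSON parsing on the client side
}
if (count < 0 || count > 2000) {
    throw JSONRPCError(RPC_INVALID_PARAMETER, "count must be between 0 and 2000");
}
if (count == 0) {
    // Preexisting behaviour: a single header for <block_hash>, hex or JSON per verbose.
    if (!fVerbose) return strHex;
    return blockheaderToJSON(tip, pblockindex);
}
// 0 < count <= 2000: walk the active chain toward the tip, collecting headers into an array.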

Changes:

  • The argument has been added to rpc/client.cpp.
  • The getblockheader() help message has been updated accordingly.
  • The getblockheader() functionality has been expanded, as described above.

@maflcko (Member) commented Jun 1, 2022

Please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits

@JeremyRubin (Contributor)

I think it makes sense to do count = 0 as a single entity, and >= 1 as an array, or otherwise make an additional as_array bool param to control the output type.

Otherwise the "type instability" is difficult to deal with for downstream consumers of the API. E.g., consider the code for https://docs.rs/bitcoincore-rpc/latest/src/bitcoincore_rpc/client.rs.html#337-341 .

@natanleung (Author)

> I think it makes sense to do count = 0 as a single entity, and >= 1 as an array, or otherwise make an additional as_array bool param to control the output type.
>
> Otherwise the "type instability" is difficult to deal with for downstream consumers of the API. E.g., consider the code for https://docs.rs/bitcoincore-rpc/latest/src/bitcoincore_rpc/client.rs.html#337-341 .

Thanks for the reply. Note that there is no regression impact because all preexisting functionality is intact. Ultimately, it is an optional parameter which can be ignored.

To me, count = 0 is a bit counterintuitive. And as mentioned above, the REST call also interprets count = 0 as invalid, assuming we are aiming to emulate an improved version of the REST functionality in RPC.

From the old PR, it would appear that we are tending away from unformatted (concatenated) output in the new implementation. If this is indeed the case, then such an additional parameter does not seem useful.

@MarcoFalke Do you have any input on this? And are any further updates to the PR needed at this point? All CI checks have passed. Thanks.

@maflcko (Member) commented Jun 1, 2022

What about:

  • null -> "flat" reply
  • int -> array reply

@natanleung (Author)

> What about:
>
>   • null -> "flat" reply
>   • int -> array reply

@JeremyRubin's suggestion has been implemented. I think I misunderstood the difference between count = 0 and count = 1 in terms of the "type instability". It makes sense to me now.

As you've denoted, this is now resolved such that:
count = 0 -> "flat" reply
count > 0 -> array reply

Let me know if any other changes are still needed. Thanks.

@JeremyRubin (Contributor)

Yeah, that makes sense. Null = flat works too. The idea isn't to concatenate anything, but rather to support the legacy behavior plus something that matches the REST API. Since 0 is invalid in REST, all valid queries over JSON-RPC will match all valid REST queries, and legacy count=0 queries will return the flat form.

If we were to do a breaking API change, I'd say always return an array :)

@natanleung requested a review from luke-jr June 19, 2022 06:52
@DrahtBot (Contributor) commented Jun 21, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept NACK luke-jr, stickies-v
Concept ACK LarryRuane

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@achow101 (Member)

cc @jonatack @luke-jr @LarryRuane

@LarryRuane (Contributor)

We will be reviewing this PR this Wednesday https://bitcoincore.reviews/25261
@natanleung please join us if you can! (But certainly no pressure or obligation!)

@LarryRuane (Contributor) left a comment


Concept and approach ACK. My suggestions are mostly about formatting, except the comment about locking, which needs careful consideration.

This doc helps with formatting. But we don't accept everything it suggests, especially in RPC code where we use some non-standard formatting conventions.

This PR could also benefit from some tests. (But I understand the current lack of tests may be intentional because you want to get at least concept or approach ACKs before going to the trouble of writing tests.)

    else
        headers.push_back(blockheaderToJSON(tip, pblockindex));

    pblockindex = chainman.ActiveChain().Next(pblockindex);
Contributor

I'm pretty sure calling Next() requires the thread to hold cs_main. But I don't think you want to grab and release the lock on each iteration of this loop; it's probably better to take the lock close to where you currently do, and hold it throughout this loop.

But it's not ideal to hold cs_main during the conversions to JSON or (for non-verbose) the encodings to hex. So you may want to consider creating a vector of CBlockIndex pointers, having each iteration of this loop push_back() onto that vector, dropping cs_main, and then running a separate loop to format the results. (This is okay because once you have a CBlockIndex pointer, both the pointer and the object it points to remain valid without having to hold cs_main.)
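
A minimal sketch of this two-phase pattern (illustrative only; the loop bound and the surrounding names count, pblockindex, tip and chainman are assumed from rpc/blockchain.cpp, not taken from the PR's diff):

std::vector<const CBlockIndex*> indexes;
{
    LOCK(cs_main);
    const CBlockIndex* pindex{pblockindex};
    for (int i = 0; i < count && pindex; ++i) {
        indexes.push_back(pindex);
        pindex = chainman.ActiveChain().Next(pindex);
    }
}
// cs_main is released here; the CBlockIndex pointers and the objects they point to remain valid without it.
UniValue headers(UniValue::VARR);
for (const CBlockIndex* pindex : indexes) {
    headers.push_back(blockheaderToJSON(tip, pindex));
}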

Member

This appears to be what caused CI to fail with -Wthread-safety-analysis enabled.

Author

I have implemented @LarryRuane's suggestion to store the CBlockIndex pointers and then iterate through them. However, the CI failure is still occurring. From what I recall, this wasn't happening before. Any insight into why this is now the case?

Contributor

Have you looked at the CI logs? You can view them by clicking "details" next to the CI failure message here on GitHub. The error message is quite clear in this case:

rpc/blockchain.cpp:591:32: error: calling function 'ActiveChain' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
        pblockindex = chainman.ActiveChain().Next(pblockindex);

You can't call ActiveChain() without holding the cs_main mutex, as is done e.g. a few blocks of code above this line.
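
For reference, the narrowest change that would satisfy the analysis is to take cs_main around the call (illustrative only; as the earlier review comment notes, taking the lock once around the whole collection loop is preferable to locking on each iteration):

{
    LOCK(cs_main);
    pblockindex = chainman.ActiveChain().Next(pblockindex);
}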

@LarryRuane (Contributor)

@natanleung please remove the second commit, "Merge branch 'bitcoin:master' into rpc_getblockheader_count"

PRs should not include such merge commits. Actually, it looks like this is a rebase, not a merge commit (the merge commit is empty). But you should not rebase onto latest master unless there is a conflict, see rebasing changes. @DrahtBot will usually let you know (although there can be hidden conflicts that DrahtBot doesn't detect).

@willcl-ark (Member) commented Nov 16, 2022

You don't want to merge master into this branch with a merge commit. Rather, you want to rebase your commit on top of master. You could do that with something like (double check this first!):

# reset to your first commit
git reset --hard 053ccf0468e477283e80f78cc095ffb83bff9b95
# rebase onto master if you have remote configured as "upstream"
git fetch --all && git rebase upstream/master

edit: @LarryRuane beat me to it :)

@stickies-v (Contributor) left a comment

Concept ACK

Agreed with @LarryRuane that functional tests need updating to cover this.

Edit: leaning towards Concept NACK now

Contributor

I would store request.params[2].isNull() into a bool return_array and avoid relying on a magic count==0 value. That makes the code more readable imo.

Author

This change has been implemented as follows:

bool returnArray = !request.params[2].isNull();

The negation operator ! is used because if count != null, then an array is returned.
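
For context, a rough sketch of how such a flag might drive the return shape (variable names follow the style of rpc/blockchain.cpp but are assumptions, not the PR's exact code):

const bool return_array{!request.params[2].isNull()};
if (!return_array) {
    // Legacy shape: a single header (hex string or JSON object, per verbose).
    return blockheaderToJSON(tip, pblockindex);
}
// New shape: an array with one entry per header, walking toward the tip.
UniValue headers(UniValue::VARR);
// ... append one entry per header, as in the collection loop above ...
return headers;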

@luke-jr (Member) commented Nov 16, 2022

Not sure why I didn't earlier, but...

Concept NACK. Why shouldn't this be accomplished through JSON-RPC batching?

@stickies-v (Contributor)

> Why shouldn't this be accomplished through JSON-RPC batching?

My initial intuitive reasoning was that this approach would be significantly faster than batching because of the overhead of each request as well as requiring the blockhash for each request. After doing some performance testing, that doesn't seem to be true. In which case, I think I'm leaning towards Concept NACK too.

Best-case scenario (on my setup): fetching n blockheaders via getblockheader with the count parameter was ~2x as fast as fetching them through a batch request. Even though that is a significant speed-up, I don't think it's significant enough to warrant more code complexity, even if it is quite modest as in this PR. And even though there are some UX benefits on the CLI from not having to query the blockhashes separately etc., I think this should be solved by tooling and not within Core, to keep complexity minimal.


On signet, I've timed how long it takes to fetch n blockheaders using 3 different approaches:

  • loop: (not really discussed here, just included for FYI/illustrative purposes) iteratively send getblockheader requests for a single blockhash until n headers are fetched, using the blockhash from the result to query the next one. Only run for n <= 500 because it started failing for more requests due to timeouts etc.
  • batch: first send a single batch getblockhash request to query all the blockhashes we need, then send a single batch getblockheader request for all of those hashes.
  • count: iteratively send getblockheader requests with count=2000 (or less), using the blockhash from the last header in the result to query the next array, until n headers are fetched.

Performance benchmarking results:
Time required to fetch n blockheaders:
-----
For 50 blockheaders:
loop: 0.0480s
batch: 0.0066s (and 0.0090s to fetch blockhashes)
count: 0.0038s
-----
For 500 blockheaders:
loop: 0.4594s
batch: 0.0170s (and 0.0063s to fetch blockhashes)
count: 0.0102s
-----
For 2000 blockheaders:
batch: 0.0476s (and 0.0105s to fetch blockhashes)
count: 0.0264s
-----
For 20000 blockheaders:
batch: 0.4847s (and 0.0865s to fetch blockhashes)
count: 0.2556s
-----
For 100000 blockheaders:
batch: 2.5618s (and 0.4156s to fetch blockhashes)
count: 1.2770s
Performance testing code (run with `-signet -rpcuser user -rpcpassword pass`):
import json
import time

import requests


def get_n_blockhashes(n: int, start_height: int = 0):
    data = [{
        "method": "getblockhash",
        "params": [height]
    } for height in range(start_height, start_height + n)]
    hashes = [item["result"] for item in make_request(data)]
    return hashes


def get_n_headers_batch(hashes):
    """Call getblockheader one time with an array of data items, one for each blockhash"""
    data = []
    for blockhash in hashes:
        data.append({"method": "getblockheader", "params": [blockhash]})
    headers = [item["result"] for item in make_request(data)]
    return headers


def get_n_headers_loop(n: int):
    """Call getblockheader n times to get one blockheader each time"""
    headers = []
    blockhash = get_n_blockhashes(1)[0]
    while len(headers) < n:
        data = {"method": "getblockheader", "params": [blockhash]}
        header = make_request(data)["result"]
        blockhash = header["nextblockhash"]
        headers.append(header)

    return headers


def get_n_headers_count(n: int):
    """Call getblockheader with the `count` parameter until n headers are fetched"""
    max_per_request = 2000
    remaining = n
    headers = []
    blockhash = get_n_blockhashes(1)[0]
    while remaining > 0:
        count = min(remaining, max_per_request)
        data = {
            "method": "getblockheader",
            "params": [blockhash, True, count]
        }
        headers += make_request(data)["result"]
        remaining -= count
        blockhash = headers[-1]["nextblockhash"]
    return headers


def make_request(data):
    url = "http://user:pass@127.0.0.1:38332/"
    r = requests.post(url, data=json.dumps(data))
    assert r.status_code == 200
    return r.json()


def time_function(fn, *args, last_hash_check: str, **kwargs) -> float:
    """Return average fn time execution and check that the last obtained blockheader hash matches last_hash_check """
    iters = 10
    start = time.perf_counter()
    for i in range(iters):
        result = fn(*args, **kwargs)
        assert result[-1]["hash"] == last_hash_check
    avg_duration = (time.perf_counter() - start) / iters
    return avg_duration


if __name__ == '__main__':
    for n in [50, 500, 2000, 20000, 100000]:
        print(f"For {n} blockheaders:")

        hash_start = time.perf_counter()
        hashes = get_n_blockhashes(n)
        hash_time = time.perf_counter() - hash_start

        if n <= 500:
            print(f"loop: {time_function(get_n_headers_loop, n, last_hash_check=hashes[-1]):.4f}s")
        print(f"batch: {time_function(get_n_headers_batch, hashes, last_hash_check=hashes[-1]):.4f}s (and {hash_time:.4f}s to fetch blockhashes)")
        print(f"count: {time_function(get_n_headers_count, n, last_hash_check=hashes[-1]):.4f}s")
        print("-----")

@willcl-ark (Member) commented Nov 17, 2022

I tested your code @stickies-v and, despite my timings coming out significantly slower than yours (curious about that...), found the same trends, so would tend to agree that this can be done using batching:

❯ ./benchmark_headers.py
For 50 blockheaders:
loop: 0.1021s
batch: 0.0109s (and 0.0049s to fetch blockhashes)
count: 0.0060s
-----
For 500 blockheaders:
loop: 1.0044s
batch: 0.0782s (and 0.0215s to fetch blockhashes)
count: 0.0254s
-----
For 2000 blockheaders:
batch: 0.3283s (and 0.0768s to fetch blockhashes)
count: 0.1245s
-----
For 20000 blockheaders:
batch: 3.0241s (and 0.7651s to fetch blockhashes)
count: 1.1144s
-----
For 100000 blockheaders:
batch: 15.4626s (and 3.8391s to fetch blockhashes)
count: 5.6350s
-----

will@ubuntu in bitcoin on  rpc_getblockheader_count [?] via 🐍 v3.6.15 took 4m34s

edit: debug=1 in config was slowing it down quite a bit

getblockheader() has the count parameter to fetch more than one header in a single GET via REST.

This patch extends this functionality to the CLI and properly formats output arrays based on the verbose parameter.
@natanleung (Author)

According to #23330 (comment) from the previous PR, the original author considered RPC batching.

I took on this PR as an up-for-grabs ticket so I thought this was desired functionality. If the changes look good, can I fix the CI failure, write tests, and close? Thanks.

@natanleung requested review from luke-jr and stickies-v and removed request for luke-jr, maflcko and stickies-v November 21, 2022 01:38
@stickies-v (Contributor)

> I took on this PR as an up-for-grabs ticket so I thought this was desired functionality.

First of all, really appreciate you working on this and contributing - and I hope you'll continue to regardless of the outcome of this PR! Unfortunately, in the decentralized development model of Core, we don't really have a well-defined concept of "desired functionality" - we don't have anyone deciding on features/roadmaps/... It all comes down to emergent consensus among developers. I think in this case, the concern is that even though the PR does seem like an improvement in some areas (e.g. performance, UX) it is also a deterioration in other areas (maintainability, unnecessarily complex method parameters). Some reviewers including myself seem to think that this feature is not desired as we think the cons outweigh the pros. If you look at the original issue #23330, you'll see that there was no clear consensus yet on whether or not batching was a feasible alternative, which could be a hint that desirability is unclear.

The comments in #23330 mention lack of atomicity (i.e. reorgs could happen during the batch request), but I'm not sure that's an important concern. It's quite trivial for the client to inspect that the headers are indeed part of the same chain (each header links to the previous header), and since reorgs rarely happen to more than a few blocks, this would never be any kind of performance concern. Moreover, the approach in this PR also doesn't have that kind of atomicity (although it could be introduced with more aggressive locking of cs_main).
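
To illustrate that linkage check, a hedged sketch using Core's CBlockHeader type (a real RPC client would do the equivalent on the JSON fields hash and previousblockhash):

// Assumes <primitives/block.h> (CBlockHeader) and <vector>.
bool HeadersFormChain(const std::vector<CBlockHeader>& headers)
{
    for (size_t i = 1; i < headers.size(); ++i) {
        // Each header must commit to the hash of the one before it.
        if (headers[i].hashPrevBlock != headers[i - 1].GetHash()) return false;
    }
    return true;
}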

@DrahtBot (Contributor)

🐙 This pull request conflicts with the target branch and needs rebase.

@stickies-v (Contributor)

With a seeming lack of agreement around the concept, and no further follow-up from the author, I think this can be closed?

@fanquake (Member)

Going to close for now. Looks like the author has also disappeared from GitHub.

@fanquake closed this Jan 26, 2023
@natanleung (Author)

Hey, thanks for keeping up with this PR. I am definitely still interested in working on this, but due to the lack of clarity, I don't know how to proceed. The last few posts suggest that this isn't really wanted, so what would be the best way to modify this into something that is?

If there is something that would be delivered out of this, then I would be more than happy to work on implementing that. Let me know what you all think. Thanks!

@bitcoin locked and limited conversation to collaborators Jan 26, 2024