
Conversation

@natanleung commented Jun 1, 2022

Motivation: via REST, getblockheader() accepts a count parameter to fetch more than one header in a single GET.

This patch extends that functionality to bitcoin-cli and formats the output array according to the verbose parameter. It is a follow-up to up-for-grabs issue #23330.

Implementation:
I took into account the review comments and previous implementation. The CLI method is used as follows:

bitcoin-cli getblockheader <block_hash> <verbose=true> <count=0>
  • count defaults to 0.

The previous implementation defaulted to 0, but in the REST syntax, this is actually an invalid count value. I believe that it makes sense to return a single block header by default.

  • If count is excluded or count = 0, the preexisting functionality is used to return the single block header of <block_hash> in either hex or JSON format, depending on verbose.

No regression impact.

  • If 0 < count <= 2000, an array of block headers starting with <block_hash> is returned, in either hex or JSON format depending on verbose.

The previous implementation traversed the ActiveChain upwards (towards the genesis block), but again, this differs from the REST functionality, which traverses the ActiveChain downwards (away from the genesis block); I replicated the latter instead. If count exceeds the number of blocks between the requested block and the active tip, then the final block header in the array is simply that of the tip of the ActiveChain.

  • If count < 0 or count > 2000, an INVALID_PARAMETER error is returned.
  • If count is not an integer, a generic JSON parsing error is reported (a rough sketch of this dispatch follows below).
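
For illustration only, a minimal sketch of the count handling described above, in the style of rpc/blockchain.cpp. The variable names (request, fVerbose, strHex, tip, pblockindex) and the UniValue accessor spelling are assumptions, not the PR's actual diff:

// Hypothetical sketch of the dispatch described above (not the PR's code).
int count{0};
if (!request.params[2].isNull()) {
    count = request.params[2].get_int(); // non-integer values already fail JSON parsing on the client side
}
if (count < 0 || count > 2000) {
    throw JSONRPCError(RPC_INVALID_PARAMETER, "count must be between 0 and 2000");
}
if (count == 0) {
    // Preexisting behaviour: a single header for <block_hash>, hex or JSON per verbose.
    if (!fVerbose) return strHex;
    return blockheaderToJSON(tip, pblockindex);
}
// 0 < count <= 2000: walk the active chain toward the tip, collecting headers into an array.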

Changes:

  • The argument has been added to rpc/client.cpp.
  • The getblockheader() help message has been updated accordingly.
  • The getblockheader() functionality has been expanded, as described above.

@maflcko (Member) commented Jun 1, 2022

Please squash your commits according to https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#squashing-commits

@JeremyRubin (Contributor)

I think it makes sense to do count = 0 as a single entity, and >= 1 as an array, or otherwise make an additional as_array bool param to control the output type.

Otherwise the "type instability" is difficult to deal with for downstream consumers of the API. E.g., consider the code for https://docs.rs/bitcoincore-rpc/latest/src/bitcoincore_rpc/client.rs.html#337-341 .

@natanleung (Author)

> I think it makes sense to do count = 0 as a single entity, and >= 1 as an array, or otherwise make an additional as_array bool param to control the output type.
>
> Otherwise the "type instability" is difficult to deal with for downstream consumers of the API. E.g., consider the code for https://docs.rs/bitcoincore-rpc/latest/src/bitcoincore_rpc/client.rs.html#337-341 .

Thanks for the reply. Note that there is no regression impact because all preexisting functionality is intact. Ultimately, it is an optional parameter which can be ignored.

To me, count = 0 is a bit counterintuitive. And as mentioned above, the REST call also interprets count = 0 as invalid, assuming we are aiming to emulate an improved version of the REST functionality in RPC.

From the old PR, it would appear that we are tending away from unformatted (concatenated) output in the new implementation. If this is indeed the case, then such an additional parameter does not seem useful.

@MarcoFalke Do you have any input on this? And are any further updates to the PR needed at this point? All CI checks have passed. Thanks.

@maflcko (Member) commented Jun 1, 2022

What about:

  • null -> "flat" reply
  • int -> array reply

@natanleung (Author)

> What about:
>
>   • null -> "flat" reply
>   • int -> array reply

@JeremyRubin's suggestion has been implemented. I think I misunderstood the difference between count = 0 and count = 1 in terms of the "type instability". It makes sense to me now.

As you've denoted, this is now resolved such that:
count = 0 -> "flat" reply
count > 0 -> array reply

Let me know if any other changes are still needed. Thanks.

@JeremyRubin (Contributor)

Yeah, that makes sense. Null = flat works too. The idea isn't to concatenate anything, but rather to support the legacy behavior plus something that matches the REST API. Since 0 is invalid in REST, all valid queries over JSON-RPC will match all valid REST queries, and legacy count=0 queries will return the flat form.

If we were to do a breaking API change, I'd say always return an array :)

@natanleung requested a review from luke-jr June 19, 2022 06:52
@DrahtBot (Contributor) commented Jun 21, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept NACK luke-jr, stickies-v
Concept ACK LarryRuane

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@achow101 (Member)

cc @jonatack @luke-jr @LarryRuane

@LarryRuane (Contributor)

We will be reviewing this PR this Wednesday https://bitcoincore.reviews/25261
@natanleung please join us if you can! (But certainly no pressure or obligation!)

@LarryRuane (Contributor) left a comment


Concept and approach ACK. My suggestions are mostly about formatting, except the comment about locking, which needs careful consideration.

This doc helps with formatting. But we don't accept everything it suggests, especially in RPC code where we use some non-standard formatting conventions.

This PR could also benefit from some tests. (But I understand the current lack of tests may be intentional because you want to get at least concept or approach ACKs before going to the trouble of writing tests.)

    else
        headers.push_back(blockheaderToJSON(tip, pblockindex));

    pblockindex = chainman.ActiveChain().Next(pblockindex);
Contributor

I'm pretty sure calling Next() requires the thread to hold cs_main. But I don't think you want to grab and release the lock on each iteration of this loop; it's probably better to take the lock close to where you currently do, and hold it throughout this loop.

But it's not ideal to hold cs_main during the conversions to JSON or (for non-verbose) the encodings to hex. So you may want to consider creating a vector of CBlockIndex pointers, having each iteration of this loop push_back() onto that vector, dropping cs_main, and then running a separate loop to format the results. (This is okay because once you have a CBlockIndex pointer, both the pointer and the object it points to remain valid without having to hold cs_main.)
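
A minimal sketch of this two-phase pattern (illustrative only; the loop bound and the surrounding names count, pblockindex, tip and chainman are assumed from rpc/blockchain.cpp, not taken from the PR's diff):

std::vector<const CBlockIndex*> indexes;
{
    LOCK(cs_main);
    const CBlockIndex* pindex{pblockindex};
    for (int i = 0; i < count && pindex; ++i) {
        indexes.push_back(pindex);
        pindex = chainman.ActiveChain().Next(pindex);
    }
}
// cs_main is released here; the CBlockIndex pointers and the objects they point to remain valid without it.
UniValue headers(UniValue::VARR);
for (const CBlockIndex* pindex : indexes) {
    headers.push_back(blockheaderToJSON(tip, pindex));
}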

Member

This appears to be what caused CI to fail with -Wthread-safety-analysis enabled.

Author

I have implemented @LarryRuane's suggestion to store the CBlockIndex pointers and then iterate through them. However, the CI failure is still occurring. From what I recall, this wasn't happening before. Any insight into why this is now the case?

Contributor

Have you looked at the CI logs? You can view them by clicking "details" next to the CI failure message here on GitHub. The error message is quite clear in this case:

rpc/blockchain.cpp:591:32: error: calling function 'ActiveChain' requires holding mutex 'cs_main' exclusively [-Werror,-Wthread-safety-analysis]
        pblockindex = chainman.ActiveChain().Next(pblockindex);

You can't call ActiveChain() without holding the cs_main mutex, as is done e.g. a few blocks of code above this line.
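
For reference, the narrowest change that would satisfy the analysis is to take cs_main around the call (illustrative only; as the earlier review comment notes, taking the lock once around the whole collection loop is preferable to locking on each iteration):

{
    LOCK(cs_main);
    pblockindex = chainman.ActiveChain().Next(pblockindex);
}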

@LarryRuane (Contributor)

@natanleung please remove the second commit, "Merge branch 'bitcoin:master' into rpc_getblockheader_count"

PRs should not include such merge commits. Actually, it looks like this is a rebase, not a merge commit (the merge commit is empty). But you should not rebase onto latest master unless there is a conflict, see rebasing changes. @DrahtBot will usually let you know (although there can be hidden conflicts that DrahtBot doesn't detect).

@willcl-ark (Member) commented Nov 16, 2022

You don't want to merge master into this branch with a merge commit. Rather, you want to rebase your commit on top of master. You could do that with something like (double check this first!):

# reset to your first commit
git reset --hard 053ccf0468e477283e80f78cc095ffb83bff9b95
# rebase onto master if you have remote configured as "upstream"
git fetch --all && git rebase upstream/master

edit: @LarryRuane beat me to it :)

@stickies-v (Contributor) left a comment

Concept ACK

Agreed with @LarryRuane that functional tests need updating to cover this.

Edit: leaning towards Concept NACK now

Contributor

I would store request.params[2].isNull() into a bool return_array and avoid relying on a magic count==0 value. That makes the code more readable imo.

Author

This change has been implemented as follows:

bool returnArray = !request.params[2].isNull();

The negation operator ! is used because if count != null, then an array is returned.
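
For context, a rough sketch of how such a flag might drive the return shape (variable names follow the style of rpc/blockchain.cpp but are assumptions, not the PR's exact code):

const bool return_array{!request.params[2].isNull()};
if (!return_array) {
    // Legacy shape: a single header (hex string or JSON object, per verbose).
    return blockheaderToJSON(tip, pblockindex);
}
// New shape: an array with one entry per header, walking toward the tip.
UniValue headers(UniValue::VARR);
// ... append one entry per header, as in the collection loop above ...
return headers;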

@luke-jr (Member) commented Nov 16, 2022

Not sure why I didn't earlier, but...

Concept NACK. Why shouldn't this be accomplished through JSON-RPC batching?

@stickies-v (Contributor)

> Why shouldn't this be accomplished through JSON-RPC batching?

My initial intuitive reasoning was that this approach would be significantly faster than batching because of the overhead of each request as well as requiring the blockhash for each request. After doing some performance testing, that doesn't seem to be true. In which case, I think I'm leaning towards Concept NACK too.

Best-case scenario (on my setup): fetching n blockheaders via getblockheader with the count parameter was ~2x as fast as fetching them through a batch request. Even though that is a significant speed-up, I don't think it's significant enough to warrant more code complexity, even if it is quite modest as in this PR. And even though there are some UX benefits on the CLI from not having to query the blockhashes separately etc., I think this should be solved by tooling and not within Core, to keep complexity minimal.


On signet, I've timed how long it takes to fetch n blockheaders using 3 different approaches:

  • loop: (not really discussed here, just included for FYI/illustrative purposes) iteratively send getblockheader requests for a single blockhash until n headers are fetched, using the blockhash from the result to query the next one. Only run for n <= 500 because it started failing for more requests due to timeouts etc.
  • batch: first send a single batch getblockhash request to query all the blockhashes we need, then send a single batch getblockheader request for all of those hashes.
  • count: iteratively send getblockheader requests with count=2000 (or less), using the blockhash from the last header in the result to query the next array, until n headers are fetched.

Performance benchmarking results:
Time required to fetch n blockheaders:
-----
For 50 blockheaders:
loop: 0.0480s
batch: 0.0066s (and 0.0090s to fetch blockhashes)
count: 0.0038s
-----
For 500 blockheaders:
loop: 0.4594s
batch: 0.0170s (and 0.0063s to fetch blockhashes)
count: 0.0102s
-----
For 2000 blockheaders:
batch: 0.0476s (and 0.0105s to fetch blockhashes)
count: 0.0264s
-----
For 20000 blockheaders:
batch: 0.4847s (and 0.0865s to fetch blockhashes)
count: 0.2556s
-----
For 100000 blockheaders:
batch: 2.5618s (and 0.4156s to fetch blockhashes)
count: 1.2770s
Performance testing code (run with `-signet -rpcuser user -rpcpassword pass`):
import json
import time

import requests


def get_n_blockhashes(n: int, start_height: int = 0):
    data = [{
        "method": "getblockhash",
        "params": [height]
    } for height in range(start_height, start_height + n)]
    hashes = [item["result"] for item in make_request(data)]
    return hashes


def get_n_headers_batch(hashes):
    """Call getblockheader one time with an array of data items, one for each blockhash"""
    data = []
    for blockhash in hashes:
        data.append({"method": "getblockheader", "params": [blockhash]})
    headers = [item["result"] for item in make_request(data)]
    return headers


def get_n_headers_loop(n: int):
    """Call getblockheader n times to get one blockheader each time"""
    headers = []
    blockhash = get_n_blockhashes(1)[0]
    while len(headers) < n:
        data = {"method": "getblockheader", "params": [blockhash]}
        header = make_request(data)["result"]
        blockhash = header["nextblockhash"]
        headers.append(header)

    return headers


def get_n_headers_count(n: int):
    """Call getblockheader with the `count` parameter until n headers are fetched"""
    max_per_request = 2000
    remaining = n
    headers = []
    blockhash = get_n_blockhashes(1)[0]
    while remaining > 0:
        count = min(remaining, max_per_request)
        data = {
            "method": "getblockheader",
            "params": [blockhash, True, count]
        }
        headers += make_request(data)["result"]
        remaining -= count
        blockhash = headers[-1]["nextblockhash"]
    return headers


def make_request(data):
    url = "http://user:pass@127.0.0.1:38332/"
    r = requests.post(url, data=json.dumps(data))
    assert r.status_code == 200
    return r.json()


def time_function(fn, *args, last_hash_check: str, **kwargs) -> float:
    """Return average fn time execution and check that the last obtained blockheader hash matches last_hash_check """
    iters = 10
    start = time.perf_counter()
    for i in range(iters):
        result = fn(*args, **kwargs)
        assert result[-1]["hash"] == last_hash_check
    avg_duration = (time.perf_counter() - start) / iters
    return avg_duration


if __name__ == '__main__':
    for n in [50, 500, 2000, 20000, 100000]:
        print(f"For {n} blockheaders:")

        hash_start = time.perf_counter()
        hashes = get_n_blockhashes(n)
        hash_time = time.perf_counter() - hash_start

        if n <= 500:
            print(f"loop: {time_function(get_n_headers_loop, n, last_hash_check=hashes[-1]):.4f}s")
        print(f"batch: {time_function(get_n_headers_batch, hashes, last_hash_check=hashes[-1]):.4f}s (and {hash_time:.4f}s to fetch blockhashes)")
        print(f"count: {time_function(get_n_headers_count, n, last_hash_check=hashes[-1]):.4f}s")
        print("-----")

@willcl-ark (Member) commented Nov 17, 2022

I tested your code @stickies-v and, despite my timings coming out significantly slower than yours (curious about that...), found the same trends, so would tend to agree that this can be done using batching:

❯ ./benchmark_headers.py
For 50 blockheaders:
loop: 0.1021s
batch: 0.0109s (and 0.0049s to fetch blockhashes)
count: 0.0060s
-----
For 500 blockheaders:
loop: 1.0044s
batch: 0.0782s (and 0.0215s to fetch blockhashes)
count: 0.0254s
-----
For 2000 blockheaders:
batch: 0.3283s (and 0.0768s to fetch blockhashes)
count: 0.1245s
-----
For 20000 blockheaders:
batch: 3.0241s (and 0.7651s to fetch blockhashes)
count: 1.1144s
-----
For 100000 blockheaders:
batch: 15.4626s (and 3.8391s to fetch blockhashes)
count: 5.6350s
-----

will@ubuntu in bitcoin on  rpc_getblockheader_count [?] via 🐍 v3.6.15 took 4m34s

edit: debug=1 in config was slowing it down quite a bit

getblockheader() has the count parameter to fetch more than one header in a single GET via REST.

This patch extends this functionality to the CLI and properly formats output arrays based on the verbose parameter.
@natanleung (Author)

According to #23330 (comment) from the previous PR, the original author considered RPC batching.

I took on this PR as an up-for-grabs ticket so I thought this was desired functionality. If the changes look good, can I fix the CI failure, write tests, and close? Thanks.

@natanleung requested review from luke-jr and stickies-v and removed request for luke-jr, maflcko and stickies-v November 21, 2022 01:38
@stickies-v (Contributor)

> I took on this PR as an up-for-grabs ticket so I thought this was desired functionality.

First of all, really appreciate you working on this and contributing - and I hope you'll continue to regardless of the outcome of this PR! Unfortunately, in the decentralized development model of Core, we don't really have a well-defined concept of "desired functionality" - we don't have anyone deciding on features/roadmaps/... It all comes down to emergent consensus among developers. I think in this case, the concern is that even though the PR does seem like an improvement in some areas (e.g. performance, UX) it is also a deterioration in other areas (maintainability, unnecessarily complex method parameters). Some reviewers including myself seem to think that this feature is not desired as we think the cons outweigh the pros. If you look at the original issue #23330, you'll see that there was no clear consensus yet on whether or not batching was a feasible alternative, which could be a hint that desirability is unclear.

The comments in #23330 mention lack of atomicity (i.e. reorgs could happen during the batch request), but I'm not sure that's an important concern. It's quite trivial for the client to inspect that the headers are indeed part of the same chain (each header links to the previous header), and since reorgs rarely happen to more than a few blocks, this would never be any kind of performance concern. Moreover, the approach in this PR also doesn't have that kind of atomicity (although it could be introduced with more aggressive locking of cs_main).
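
To illustrate that linkage check, a hedged sketch using Core's CBlockHeader type (a real RPC client would do the equivalent on the JSON fields hash and previousblockhash):

// Assumes <primitives/block.h> (CBlockHeader) and <vector>.
bool HeadersFormChain(const std::vector<CBlockHeader>& headers)
{
    for (size_t i = 1; i < headers.size(); ++i) {
        // Each header must commit to the hash of the one before it.
        if (headers[i].hashPrevBlock != headers[i - 1].GetHash()) return false;
    }
    return true;
}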

@DrahtBot (Contributor)

🐙 This pull request conflicts with the target branch and needs rebase.

@stickies-v (Contributor)

With a seeming lack of agreement around the concept, and no further follow-up from the author, I think this can be closed?

@fanquake (Member)

Going to close for now. Looks like the author has also disappeared from GitHub.

@fanquake closed this Jan 26, 2023
@natanleung (Author)

Hey, thanks for keeping up with this PR. I am definitely still interested in working on this, but due to the lack of clarity, I don't know how to proceed. The last few posts suggest that this isn't really wanted, so what would be the best way to modify this into something that is?

If there is something that would be delivered out of this, then I would be more than happy to work on implementing that. Let me know what you all think. Thanks!

@bitcoin locked and limited conversation to collaborators Jan 26, 2024