-
Notifications
You must be signed in to change notification settings - Fork 38.7k
index: store per-block transaction locations for efficient lookups #32541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32541. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
|
Concept ACK Can you add the schema of the index and the expected arguments for the REST API to the pull request description? I was a bit confused at first if this now exposes the file position, but if I read it correctly now, this just allows querying a transaction by its index in the block. |
Thanks!
Sure - updated in #32541 (comment). |
|
Fixed a few issues (following SonarQube run). |
|
How does this compare to |
|
I have used fetching using the new indexfetching using
|
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
|
Rebased to fix #32541 (comment). |
As far as I understand the index makes this operation faster by not requiring to read the entire block and then iterating through the transactions to find the match, which I am guessing is what the last benchmark is showing. romanz, would this new endpoint be used while creating the entire index initially, or to serve certain requests? It is not quite clear to me when an indexing client wouldn't want to read through the entire block and instead only get its transactions selectively. |
Correct - the proposed index improves the performance of fetching a single transaction (similar to
I would like the new index to be used to serve history-related queries. You are right that during the history indexing process, the client doesn't need the proposed index, since it needs to read both the entire block (and undo) data in order to map from BTW, I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :) |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. I wonder if there should be a fallback?
|
Rebased over |
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). No logging takes place in case of an invalid offset/size (to avoid spamming the log), by using a new `ReadRawError::BadPartRange` error variant. Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). No logging takes place in case of an invalid offset/size (to avoid spamming the log), by using a new `ReadRawError::BadPartRange` error variant. Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
0713529 rest: allow reading partial block data from storage (Roman Zeyde) 4e2af1c blockstorage: allow reading partial block data from storage (Roman Zeyde) f2fd1aa blockstorage: return an error code from `ReadRawBlock()` (Roman Zeyde) Pull request description: It allows fetching specific transactions using an external index, following #32541 (comment). Currently, electrs and other indexers map between an address/scripthash to the list of the relevant transactions. However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction[^1]. Other indexers use a `txindex` to fetch a transaction using its txid [^2][^3][^4]. The above approach has significant storage and CPU overhead, since the `txid` is a pseudo-random 32-byte value. Also, mainnet `txindex` takes ~60GB today. This PR is adding support for using the transaction's position within its block to be able to fetch it directly using [REST API](https://github.com/bitcoin/bitcoin/blob/master/doc/REST-interface.md), using the following HTTP request: ``` GET /rest/blockpart/BLOCKHASH.bin?offset=OFFSET&size=SIZE ``` - The offsets' index can be encoded much more efficiently ([~1.3GB today](romanz/bindex-rs#66 (comment))). - Address history query performance can be tested on mainnet using [1BitcoinEaterAddressDontSendf59kuE](https://mempool.space/address/1BitcoinEaterAddressDontSendf59kuE) - assuming warm OS block cache, [it takes <1s to fetch 5200 txs, i.e. <0.2ms per tx](romanz/bindex-rs#66 (comment)) with [bindex](https://github.com/romanz/bindex-rs). - Only binary and hex response formats are supported. [^1]: https://github.com/romanz/electrs/blob/master/doc/schema.md [^2]: https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore [^3]: https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites [^4]: https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements ACKs for top commit: maflcko: review ACK 0713529 🏪 l0rinc: ACK 0713529 hodlinator: re-ACK 0713529 Tree-SHA512: bcce7bf4b9a3e5e920ab5a83e656f50d5d7840cdde6b7147d329cf578f8a2db555fc1aa5334e8ee64d5630d25839ece77a2cf421c6c3ac1fa379bb453163bd4f
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). No logging takes place in case of an invalid offset/size (to avoid spamming the log), by using a new `ReadRawError::BadPartRange` error variant. Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). No logging takes place in case of an invalid offset/size (to avoid spamming the log), by using a new `ReadRawError::BadPartRange` error variant. Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
It will allow fetching specific transactions using an external index, following bitcoin#32541 (comment). Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com> Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
Currently, electrs and other indexers map between an address/scripthash to the list of the relevant transactions.
However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction1. Other indexers use a
txindexto fetch a transaction using its txid 234.The above approach has significant storage and CPU overhead, since the
txidis a pseudo-random 32-byte value.This PR is adding support for using the transaction's position within its block to be able to fetch it directly using REST API, using the following HTTP request (to fetch the
N-th transaction fromBLOCKHASH):If binary response format is used, the transaction data will be read directly from the storage and sent back to the client, without any deserialization overhead.
The resulting index is much smaller (allowing it to be cached in RAM):
The new index is using the following DB schema:
For example, when fetching the 5000th transaction of block #90005 using
ab -k -c 1 -n 100000, enabledlocationsindeximproves the performance ~19x (2.574ms → 0.136ms).I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)
Footnotes
https://github.com/romanz/electrs/blob/master/doc/schema.md ↩
https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore ↩
https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites ↩
https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements ↩