@romanz
Contributor

@romanz romanz commented May 17, 2025

Currently, electrs and other indexers map an address/scripthash to the list of relevant transactions.

However, in order to fetch those transactions from bitcoind, electrs reads the whole block and post-filters it for a specific transaction [1]. Other indexers use a txindex to fetch a transaction by its txid [2][3][4].

The above approach has significant storage and CPU overhead, since the txid is a pseudo-random 32-byte value.

This PR adds support for fetching a transaction directly by its position within its block via the REST API, using the following HTTP request (to fetch the N-th transaction from BLOCKHASH):

GET /rest/txfromblock/BLOCKHASH-N.bin

If the binary response format is used, the transaction data is read directly from storage and sent back to the client, without any deserialization overhead.
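As a minimal client-side sketch, the request path above can be assembled as follows. `TxFromBlockPath` is a hypothetical helper name and not part of Bitcoin Core; whether N is counted from 0 or 1 is not stated in the description, so the helper passes the index through unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds the request path for the N-th transaction of a block, following the
// "GET /rest/txfromblock/BLOCKHASH-N.bin" form from the description.
// TxFromBlockPath is a hypothetical name used only for this illustration.
std::string TxFromBlockPath(const std::string& blockhash_hex, uint32_t tx_index)
{
    // A block hash is 32 bytes, i.e. 64 hex characters.
    assert(blockhash_hex.size() == 64);
    return "/rest/txfromblock/" + blockhash_hex + "-" + std::to_string(tx_index) + ".bin";
}
```

The resulting string is only the path; host, port and the HTTP client are left to the caller.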

The resulting index is much smaller (allowing it to be cached in RAM):

$ du -sh indexes/locations/ indexes/txindex/
2.5G	indexes/locations/
57G	indexes/txindex/

The new index uses the following DB schema:

struct DBKey {
    uint256 hash;   // blockhash
    uint32_t row;   // allow splitting one block's transactions into multiple DB rows
};

struct DBValue {
    FlatFilePos block_pos;          // file id + offset of the block
    std::vector<uint32_t> offsets;  // a list of transaction offsets within the block
};
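To illustrate how such a schema could be queried, here is a rough sketch under stated assumptions. The row capacity `OFFSETS_PER_ROW`, the helper names, and the convention that a row's `offsets` ends with one extra entry marking the end of its last transaction are all hypothetical; the PR description does not specify them.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical row capacity; the value used by the PR is not given in the description.
constexpr uint32_t OFFSETS_PER_ROW = 1000;

// Position of a transaction's offset inside the index: which DB row to load
// (DBKey::row) and which slot of DBValue::offsets to read.
struct RowSlot {
    uint32_t row;
    uint32_t slot;
};

RowSlot LocateOffset(uint32_t tx_index)
{
    return {tx_index / OFFSETS_PER_ROW, tx_index % OFFSETS_PER_ROW};
}

// Byte range [first, second) of the transaction at `slot`, assuming (for this
// sketch only) that `offsets` holds each transaction's start offset plus one
// final entry marking the end of the last transaction in the row.
std::optional<std::pair<uint32_t, uint32_t>> TxRange(const std::vector<uint32_t>& offsets, uint32_t slot)
{
    if (offsets.size() < 2 || slot >= offsets.size() - 1) return std::nullopt;
    return std::make_pair(offsets[slot], offsets[slot + 1]);
}
```

Splitting a block into rows keeps each DB value small, so a lookup only loads the row that contains the requested transaction.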

For example, when fetching the 5000th transaction of block #90005 using ab -k -c 1 -n 100000, enabling locationsindex improves performance ~19x (2.574ms → 0.136ms).

I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)

Footnotes

  1. https://github.com/romanz/electrs/blob/master/doc/schema.md

  2. https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore

  3. https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites

  4. https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements

@DrahtBot
Contributor

DrahtBot commented May 17, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32541.

Reviews

See the guideline for information on the review process.

| Type | Reviewers |
| --- | --- |
| Concept ACK | TheCharlatan, hodlinator |

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30469 (index: Fix coinstats overflow by fjahr)
  • #26966 (index: initial sync speedup, parallelize process by furszy)
  • #17783 (common: Disallow calling IsArgSet() on ALLOW_LIST options by ryanofsky)
  • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
  • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
  • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Contributor

🚧 At least one of the CI tasks failed.
Task lint: https://github.com/bitcoin/bitcoin/runs/42402043332
LLM reason (✨ experimental): The CI failure is due to missing include guards in src/index/locationsindex.h.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@romanz romanz force-pushed the locations-index branch from 40ee9a9 to 1f62974 Compare May 17, 2025 07:45
@sedited
Contributor

sedited commented May 17, 2025

Concept ACK

Can you add the schema of the index and the expected arguments for the REST API to the pull request description? I was a bit confused at first if this now exposes the file position, but if I read it correctly now, this just allows querying a transaction by its index in the block.

@romanz
Contributor Author

romanz commented May 17, 2025

> Concept ACK

Thanks!

> Can you add the schema of the index and the expected arguments for the REST API to the pull request description?

Sure - updated in #32541 (comment).

@romanz romanz force-pushed the locations-index branch from 1f62974 to c074ad2 Compare May 17, 2025 11:55
@romanz
Contributor Author

romanz commented May 17, 2025

Fixed a few issues (following SonarQube run).

@luke-jr
Member

luke-jr commented May 20, 2025

How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?

@romanz
Contributor Author

romanz commented May 21, 2025

I used ApacheBench 2.3 to benchmark the REST API, and a Rust client to benchmark the getrawtransaction RPC:

fetching using the new index

$ ab -k -c 1 -n 100000 http://localhost:8332/rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin

Document Path:          /rest/txfromblock/0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe-42.bin
Document Length:        301 bytes

Concurrency Level:      1
Time taken for tests:   13.760 seconds
Complete requests:      100000
Failed requests:        0
Keep-Alive requests:    100000
Total transferred:      40500000 bytes
HTML transferred:       30100000 bytes
Requests per second:    7267.65 [#/sec] (mean)
Time per request:       0.138 [ms] (mean)
Time per request:       0.138 [ms] (mean, across all concurrent requests)
Transfer rate:          2874.41 [Kbytes/sec] received

fetching using txindex

$ ab -k -c 1 -n 100000 http://localhost:8332/rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin

Document Path:          /rest/tx/4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a.bin
Document Length:        301 bytes

Concurrency Level:      1
Time taken for tests:   14.075 seconds
Complete requests:      100000
Failed requests:        0
Keep-Alive requests:    100000
Total transferred:      40500000 bytes
HTML transferred:       30100000 bytes
Requests per second:    7104.78 [#/sec] (mean)
Time per request:       0.141 [ms] (mean)
Time per request:       0.141 [ms] (mean, across all concurrent requests)
Transfer rate:          2810.00 [Kbytes/sec] received

fetching without txindex

time cargo run --release -- 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe
    Finished `release` profile [optimized] target(s) in 0.02s
     Running `target/release/bench-getrawtx 4137d0dbad434d68a4f52b7bebcba91ddac3f7f5c92b84130432bd6b5e2ea57a 0000000000000000000083a0cff38278aae196d6d923a7e8ee7e5a0e371226fe`
iterations = 1000
average RPC duration = 8.563491ms

real	0m8.628s
user	0m0.070s
sys	0m0.052s

Conclusions

  • The new LocationsIndex is only a few percent faster than the old TxIndex, but the on-disk footprint is ~22x smaller.

  • getrawtransaction, which is used in the last benchmark, has an average RPC duration of ~8.6ms, vs. ~0.14ms per request for the two REST benchmarks above.
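The ratios behind these conclusions can be checked with a few lines of arithmetic. The figures are copied from the outputs earlier in this thread; the constant and function names are made up for the illustration. Note that the REST and RPC measurements used different clients and interfaces, so the last ratio is only a rough comparison.

```cpp
#include <cassert>
#include <cmath>

// How many times larger `baseline` is than `candidate`.
constexpr double Ratio(double baseline, double candidate) { return baseline / candidate; }

// Figures copied from the thread.
constexpr double TXINDEX_GB = 57.0;      // du -sh indexes/txindex/
constexpr double LOCATIONS_GB = 2.5;     // du -sh indexes/locations/
constexpr double LOCATIONS_MS = 0.138;   // ab "Time per request", new index
constexpr double TXINDEX_MS = 0.141;     // ab "Time per request", txindex
constexpr double GETRAWTX_MS = 8.563491; // average RPC duration, no txindex

constexpr double DISK_RATIO = Ratio(TXINDEX_GB, LOCATIONS_GB);         // 22.8
constexpr double INDEX_SPEEDUP = Ratio(TXINDEX_MS, LOCATIONS_MS);      // about 1.02
constexpr double NO_INDEX_SLOWDOWN = Ratio(GETRAWTX_MS, LOCATIONS_MS); // about 62
```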

@DrahtBot
Contributor

🚧 At least one of the CI tasks failed.
Task previous releases, depends DEBUG: https://github.com/bitcoin/bitcoin/runs/42406243587
LLM reason (✨ experimental): The CI failure is caused by a missing header file test/util/index.h during compilation.

@romanz romanz force-pushed the locations-index branch from c074ad2 to d962c9a Compare June 13, 2025 18:35
@romanz
Contributor Author

romanz commented Jun 13, 2025

Rebased to fix #32541 (comment).

@romanz romanz marked this pull request as draft June 14, 2025 08:58
@romanz romanz marked this pull request as ready for review June 14, 2025 12:29
@sedited
Contributor

sedited commented Jun 15, 2025

> How does this compare to getrawtransaction <txid> 0 <blockhash> without a txindex?

As far as I understand, the index makes this operation faster by not having to read the entire block and then iterate through the transactions to find the match, which I am guessing is what the last benchmark is showing. romanz, would this new endpoint be used while creating the entire index initially, or to serve certain requests? It is not quite clear to me when an indexing client wouldn't want to read through the entire block and would instead only get its transactions selectively.

@romanz
Contributor Author

romanz commented Jun 15, 2025

> As far as I understand, the index makes this operation faster by not having to read the entire block and then iterate through the transactions to find the match

Correct - the proposed index improves the performance of fetching a single transaction (similar to txindex), while requiring significantly less storage.

> would this new endpoint be used while creating the entire index initially, or to serve certain requests?

I would like the new index to be used to serve history-related queries.
For example, https://electrum-protocol.readthedocs.io/en/latest/protocol-methods.html#blockchain-scripthash-get-history.

You are right that during the history indexing process, the client doesn't need the proposed index, since it has to read the entire block and its undo data anyway in order to map each ScriptPubKey to a list of { block hash + transaction index within the block }.
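The mapping described above, and how it could later serve a history query, might look roughly like this. It is a simplified in-memory sketch: `HistoryIndex`, `TxLocation` and the hex-string keys are hypothetical names and choices, and a real indexer would persist the data and populate it from block and undo data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One history entry: the block containing the transaction and the
// transaction's position within that block.
struct TxLocation {
    std::string blockhash_hex;
    uint32_t tx_index;
};

// Hypothetical in-memory history index keyed by a hex-encoded scripthash.
class HistoryIndex
{
public:
    // Called while indexing a block, once per relevant transaction.
    void Add(const std::string& scripthash_hex, TxLocation loc)
    {
        m_history[scripthash_hex].push_back(std::move(loc));
    }

    // REST paths a client could request to serve a history query.
    std::vector<std::string> HistoryPaths(const std::string& scripthash_hex) const
    {
        std::vector<std::string> paths;
        const auto it = m_history.find(scripthash_hex);
        if (it == m_history.end()) return paths;
        for (const TxLocation& loc : it->second) {
            paths.push_back("/rest/txfromblock/" + loc.blockhash_hex + "-" +
                            std::to_string(loc.tx_index) + ".bin");
        }
        return paths;
    }

private:
    std::map<std::string, std::vector<TxLocation>> m_history;
};
```

Each history entry needs only a block reference and a small integer instead of a 32-byte txid, which is what allows the history index to stay compact.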

BTW, I am working on a proof-of-concept indexer (https://github.com/romanz/bindex-rs) which is using #32540 & #32541 - please let me know if there are any questions/comments/concerns :)

Member

@maflcko maflcko left a comment

lgtm. I wonder if there should be a fallback?

@romanz romanz force-pushed the locations-index branch from d962c9a to 8d446fc Compare June 21, 2025 09:57
@romanz
Contributor Author

romanz commented Jun 21, 2025

Rebased over master to use std::vector<std::byte> (following #32743).

@romanz romanz marked this pull request as draft June 21, 2025 14:29
romanz added a commit to romanz/bitcoin that referenced this pull request Dec 11, 2025
It will allow fetching specific transactions using an external index,
following bitcoin#32541 (comment).

No logging takes place in case of an invalid offset/size (to avoid spamming the log),
by using a new `ReadRawError::BadPartRange` error variant.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
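The range check implied by this commit message could be sketched as below. Only the `ReadRawError::BadPartRange` name comes from the commit message; the `IO` variant, the `CheckPartRange` helper, its signature, and the choice to reject an empty range are assumptions made for illustration and may differ from the merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Only BadPartRange is named in the commit message; the IO variant is a
// placeholder for this sketch.
enum class ReadRawError {
    IO,
    BadPartRange,
};

// Hypothetical check: returns BadPartRange when [offset, offset + size) does
// not fit inside a block of `block_size` bytes, and std::nullopt otherwise.
// Nothing is logged here; the caller decides how to report the error.
std::optional<ReadRawError> CheckPartRange(size_t block_size, size_t offset, size_t size)
{
    // Written without computing offset + size, which could overflow.
    // Rejecting size == 0 is a choice made for this sketch.
    if (size == 0 || offset >= block_size || size > block_size - offset) {
        return ReadRawError::BadPartRange;
    }
    return std::nullopt;
}
```

Returning an error value instead of logging lets a REST handler answer a bad request cheaply, which matches the stated goal of not spamming the log.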
romanz added a commit to romanz/bitcoin that referenced this pull request Dec 11, 2025
It will allow fetching specific transactions using an external index,
following bitcoin#32541 (comment).

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
fanquake added a commit that referenced this pull request Dec 12, 2025
0713529 rest: allow reading partial block data from storage (Roman Zeyde)
4e2af1c blockstorage: allow reading partial block data from storage (Roman Zeyde)
f2fd1aa blockstorage: return an error code from `ReadRawBlock()` (Roman Zeyde)

Pull request description:

  It allows fetching specific transactions using an external index, following #32541 (comment).

  Currently, electrs and other indexers map between an address/scripthash to the list of the relevant transactions.

  However, in order to fetch those transactions from bitcoind, electrs relies on reading the whole block and post-filtering for a specific transaction[^1]. Other indexers use a `txindex` to fetch a transaction using its txid [^2][^3][^4].

  The above approach has significant storage and CPU overhead, since the `txid` is a pseudo-random 32-byte value. Also, mainnet `txindex` takes ~60GB today.

  This PR is adding support for using the transaction's position within its block to be able to fetch it directly using [REST API](https://github.com/bitcoin/bitcoin/blob/master/doc/REST-interface.md), using the following HTTP request:

  ```
  GET /rest/blockpart/BLOCKHASH.bin?offset=OFFSET&size=SIZE
  ```

  - The offsets' index can be encoded much more efficiently ([~1.3GB today](romanz/bindex-rs#66 (comment))).

  - Address history query performance can be tested on mainnet using [1BitcoinEaterAddressDontSendf59kuE](https://mempool.space/address/1BitcoinEaterAddressDontSendf59kuE) - assuming warm OS block cache, [it takes <1s to fetch 5200 txs, i.e. <0.2ms per tx](romanz/bindex-rs#66 (comment)) with [bindex](https://github.com/romanz/bindex-rs).

  - Only binary and hex response formats are supported.

  [^1]: https://github.com/romanz/electrs/blob/master/doc/schema.md
  [^2]: https://github.com/Blockstream/electrs/blob/new-index/doc/schema.md#txstore
  [^3]: https://github.com/spesmilo/electrumx/blob/master/docs/HOWTO.rst#prerequisites
  [^4]: https://github.com/cculianu/Fulcrum/blob/master/README.md#requirements

ACKs for top commit:
  maflcko:
    review ACK 0713529 🏪
  l0rinc:
    ACK 0713529
  hodlinator:
    re-ACK 0713529

Tree-SHA512: bcce7bf4b9a3e5e920ab5a83e656f50d5d7840cdde6b7147d329cf578f8a2db555fc1aa5334e8ee64d5630d25839ece77a2cf421c6c3ac1fa379bb453163bd4f
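For the `blockpart` request in the merged description above, an external index that stores a transaction's byte range could build the path roughly as follows. `BlockPartPath` and `TxPartPath` are hypothetical helper names, and the sketch assumes the index stores a half-open `[begin, end)` range in the same coordinates the endpoint expects, which the description does not spell out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds "/rest/blockpart/BLOCKHASH.bin?offset=OFFSET&size=SIZE" as given in
// the merged pull request description. The helper name is hypothetical.
std::string BlockPartPath(const std::string& blockhash_hex, uint32_t offset, uint32_t size)
{
    return "/rest/blockpart/" + blockhash_hex + ".bin?offset=" + std::to_string(offset) +
           "&size=" + std::to_string(size);
}

// Converts a transaction's [begin, end) byte range, as an external index
// might store it (an assumption of this sketch), into a request path.
std::string TxPartPath(const std::string& blockhash_hex, uint32_t begin, uint32_t end)
{
    assert(begin < end);
    return BlockPartPath(blockhash_hex, begin, end - begin);
}
```

With this split, bitcoind only serves raw byte ranges, while knowledge of where each transaction starts and ends lives entirely in the external index.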
davidgumberg pushed a commit to davidgumberg/bitcoin that referenced this pull request Dec 18, 2025
It will allow fetching specific transactions using an external index,
following bitcoin#32541 (comment).

No logging takes place in case of an invalid offset/size (to avoid spamming the log),
by using a new `ReadRawError::BadPartRange` error variant.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
davidgumberg pushed a commit to davidgumberg/bitcoin that referenced this pull request Dec 18, 2025
It will allow fetching specific transactions using an external index,
following bitcoin#32541 (comment).

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
0xB10C pushed a commit to 0xB10C/bitcoin that referenced this pull request Dec 28, 2025
It will allow fetching specific transactions using an external index,
following bitcoin#32541 (comment).

No logging takes place in case of an invalid offset/size (to avoid spamming the log),
by using a new `ReadRawError::BadPartRange` error variant.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
0xB10C pushed a commit to 0xB10C/bitcoin that referenced this pull request Dec 28, 2025
It will allow fetching specific transactions using an external index,
following bitcoin#32541 (comment).

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>