Add address-based index (attempt 4?)#14053
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
2157893 to
196ff3d
Compare
54a8e72 to
4865275
Compare
ryanofsky
left a comment
There was a problem hiding this comment.
Reviewed most of the code, but just skimmed tests. It looks to me like this PR could be merged basically in its current form, so I'm curious if you're intending to make the improvements cited in the PR description above here or in a separate PRs.
I left a few minor suggestions about the code, which you should feel free to ignore. The only changes I would definitely like to see here are:
- adding some python code in
test/functional/to call the new rpc method - adding a blurb in
doc/release notes.mdto describe the feature and maybe mention some use-cases
| std::unique_ptr<AddrIndex> g_addrindex; | ||
|
|
||
| /** | ||
| * Access to the addrindex database (indexes/addrindex/) |
There was a problem hiding this comment.
In commit "Introduce address index" (3c7cc3c)
Note: new index/addrindex.cpp, index/addrindex.h, and test/addrindex_tests.cpp files in this commit mirror existing index/txindex.cpp and index/txindex.h, test/txindex_tests.cpp files and have some code and comments in common. It can help to diff the addr files against the tx files when reviewing this PR.
| "\nArguments:\n" | ||
| "1. \"address\" (string, required) The address to search for\n" | ||
| "2. \"verbose\" (bool, optional, default = false) If set to false, only returns data for hex-encoded `txid`s. \n" | ||
| "3. \"skip\" (numeric, optional, default = 0) If set, the result skips this number of initial values. \n" |
There was a problem hiding this comment.
skip and count probably make sense on an options object instead.
There was a problem hiding this comment.
Nice work! I'm glad someone's working on this. Concept ACK.
- The AddrIndex should return information about the outpoints and differentiate between outputs and spends, not just return the raw transactions. In fact, the AddrIndex could just return outpoints, then the client code could use the TxIndex could to fetch the tx bodies. It'd involve a separate lookup though.
- I don't think it's necessary to delete keys from the database when a block is disconnected. There's no harm in leaving it. The higher level methods to search the index can then filter for results that are on the main chain if that's what the client wants. It'd have to do this anyway to avoid races with reorgs and such.
- I'm worried about collisions on address IDs because they are only 64 bits. I can think of three options, 1) use a 32 byte cryptographic hash, 2) use a 20 byte cryptographic hash of the script plus some randomly generated, database-global salt, 3) use a 32- or 64-bit non-cryptographic hash (might as well use Murmur3 or SipHash, not truncated SHA256), then store the full script as the database value to double check against. Option 3 feels best to me.
- What's the purpose of having the first byte of the block hash as the value? It doesn't seem robust nor particularly useful.
|
Thanks for all the reviews. To answer some of @jimpo's questions: I think that returning just the outpoint is a better idea than the current choice so I'll switch to that and try to incorporate all the other feedback here. |
Adds index that relates scriptPubKeys to location of transactions in the filesystem, along with the hash of the block they are found in, and the outpoint information of the txout with the related script.
Setup address index in initialization process. Add initialization warning and wallet feature request warning as suggested by ryanofsky.
Adds searchrawtransactions RPC that uses the address index to lookup transactions and outpoints by script and address. Adds basic functional tests for searchrawtransactions.
|
Can someone plz approve this? |
|
Concept ACK |
|
@marcinja will this allow other wallets like Electrum to utilize this feature (RPC credentials provided, of course) and avoid having to run ElecturmX (https://github.com/spesmilo/electrumx), EPS (https://github.com/chris-belcher/electrum-personal-server) or BWT (https://github.com/shesek/bwt) intermediary software? @ecdsa @SomberNight @chris-belcher @shesek could you provide your input on how this feature might be useful (or not)? |
|
The main issue AFAIU is that Electrum is using |
|
@Talkless note that while some Electrum users run their own bitcoind, many do not. Electrum wants to support both use cases, and in fact the suspicion is that most users just use a public server. When using a public server, the client cannot use bitcoind RPC, hence in that case I don't see how the middleware (e.g. ElectrumX) could be avoided. For the own bitcoind use case, maybe the client could have another optional mode of operation where it uses bitcoind RPC directly, which is I guess what you are asking about. For that, an address-index in bitcoind is the main thing missing indeed, however not the only one. For one, the electrum protocol (the client<->"middleware server" connection) has address subscriptions - the client gets a notification when a history of one its addresses changes. We are also planning on soon adding another method into the protocol that allows IMHO there are multiple upsides for having this middleware setup for Electrum:
Nevertheless if someone steps up and contributes patches, this kind of thing could be added.
That sounds much larger than expected.
Instead of having both Also, I expect most users of this index would also want txindex enabled. You might want to consider making address index dependent on txindex. Have you investigated how much space that would save? There would be no need to store Another trick that ElectrumX uses is that only the best chain is indexed. We have a |
|
Concept ACK (might have already done that) |
|
I'm concept -0 on this. My primary objection is that I think it's a bad idea for any infrastructure to be built all that relies on having fully-indexed blockchain data available (this also applies to txindex, but we can't just remove support...). However, it seems many people want something like this, and are going to use it anyway. The question is then whether it belongs in the bitcoin-core codebase. Alternative, and more performant presumably, like electrs exist already too, so it isn't exactly impossible to do this elsewhere. Still, given that we now have the indexes infrastructure, it means that things like this are easy to add in a fairly modular way without invading consensus code. So if people really want this, fine. Overall approach comment: I don't think MurmurHash should be used for anything new; there are strictly better hash functions available. I'd suggest SipHash if that's fast enough. |
|
I'm also a little more negative on having this in Core than I previously was. After working in a few industrial contexts on wallet stuff, it's clear to me that an address index is really only required if you want to implement a block explorer or do chain analysis. For both of these applications, using something like electrs seems sufficient. For personal wallet management, a full address index is not required. I think the origin of some confusion is that things like the Electrum Personal Server have become synonymous with this kind of usage, but in reality a full index is overkill when descriptor-specific rescans can be done once for a historical backfill and then per-block scanning can be done from there on out. I want to point out that this is a nice implementation and good work by @marcinja, but I'm leaning slightly against the inclusion of such an index in Core at this point. |
I agree on this. IMO the only use cases to ever use a full address index are:
1 (instant backup recovery) could be solved with either 2 (a backend for thousands of wallets): out of scope for this project. 3 (explore purposes): I think this is a valid use case. Though adding this PR to Bitcoin Core will lead to many many projects using it in production increasing the traffic in this project and eventually steal time from existing contributors (rebase, maintenance, drag-along) My main fear is that people are going to use this index (a full address index) to use it as an electrum(ish) backend for a handful of wallets. With multiwallet, watch-only-wallets, PSBT, we have all tools to server multiple wallets in a scalable way for external applications. I also think merging this as it is, would be in contradiction to the process- and repository-separation effort. Therefore I'm ~0 (slightly towards NACK) to add this. |
|
Hi all, thanks for the feedback and review. This was an enjoyable PR to work on and I learned a lot from all your comments. I'm closing this PR because its size probably requires stronger support from contributors to get in. It also seems more clear now that all of the practical use-cases are covered by existing features and some lightweight alternatives (#20664) . I also agree that it would be bad to incentivize using an address index to support an external electrum wallet, when it's not the intended use-case and would cause unnecessary burden on contributors and maintainers in this project, e.g. from users of those wallets wanting new features or updates. |
|
@marcinja thank you and I hope to see more contributions of this quality from you. |
Adds index to transactions by scriptPubKey. Based off of #2802. Stores hashes of scripts (hashed using Murmurhash3, with a hash seed that is stored in the index database), along with the
COutPointfor the output which was spent/created, and theCDiskTxPosfor the transaction in which this happened.