Skip to content

tx indexer: use different field separator for keys#5865

Merged
cmwaters merged 46 commits intomasterfrom
callum/tx-indexer-key
Jan 12, 2021
Merged

tx indexer: use different field separator for keys#5865
cmwaters merged 46 commits intomasterfrom
callum/tx-indexer-key

Conversation

@cmwaters
Copy link
Contributor

@cmwaters cmwaters commented Jan 5, 2021

Description

Implemented the orderedcode library for consistency and lexicographic ordering of keys.
This also tackles the below issue of having a better escape mechanism using 0x00ff instead of /

Closes: #2907

Base automatically changed from callum/key-encoding to master January 5, 2021 15:53
@cmwaters cmwaters marked this pull request as ready for review January 6, 2021 14:37
@codecov
Copy link

codecov bot commented Jan 6, 2021

Codecov Report

Merging #5865 (9fba99d) into master (03a6fb2) will decrease coverage by 0.20%.
The diff coverage is 70.76%.

@@            Coverage Diff             @@
##           master    #5865      +/-   ##
==========================================
- Coverage   60.08%   59.88%   -0.21%     
==========================================
  Files         269      269              
  Lines       24499    24527      +28     
==========================================
- Hits        14721    14687      -34     
- Misses       8193     8237      +44     
- Partials     1585     1603      +18     
Impacted Files Coverage Δ
state/txindex/kv/kv.go 61.03% <70.76%> (+1.03%) ⬆️
crypto/sr25519/pubkey.go 43.47% <0.00%> (-8.70%) ⬇️
privval/signer_listener_endpoint.go 80.00% <0.00%> (-7.06%) ⬇️
privval/socket_listeners.go 78.72% <0.00%> (-4.26%) ⬇️
privval/secret_connection.go 72.68% <0.00%> (-3.61%) ⬇️
mempool/reactor.go 80.83% <0.00%> (-3.34%) ⬇️
blockchain/v2/reactor.go 32.73% <0.00%> (-2.88%) ⬇️
consensus/reactor.go 74.33% <0.00%> (-2.27%) ⬇️
p2p/transport_mconn.go 42.01% <0.00%> (-1.96%) ⬇️
statesync/snapshots.go 91.59% <0.00%> (-1.69%) ⬇️
... and 12 more

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has some issues that I think follows from using a suboptimal key structure. If I'm understanding this correctly, there's nothing separating event keys from height keys, such that they're mixed together and we have to do weird things like skip erroring entries while we're iterating over them if they don't have the expected format.

I think it might make more sense to have unique prefixes for the two different key types such that we'll never encounter a different key type while iterating over one type. Then we can simply error on any unexpected key formats. Or is there an actual reason for mixing them together, like we want to query multiple key types during a single iterator pass?

@cmwaters
Copy link
Contributor Author

cmwaters commented Jan 7, 2021

The height key is, in a way, a type of event key with a defined composite prefix of "tx.height" and a value of it's height (hence why we have height repeated twice - once as a string and another as an int64). All other event keys are chosen by the application and could have for example a composite prefix of "account.name" and a value of "Callum". I was thinking that because tendermint already reserves the composite prefix "tx.height" that we want to check that the application doesn't also use it as an event.

Other than that, the only place where keys could get shuffled together is with the hash key which has no prefix to it. For example if the hash of a tx just so happened to start with "tx.height" it would get inserted somewhere amongst the tx.height section. I was thinking we could add a "tx.hash" prefix to the hash key and reserve this composite prefix as well.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jan 7, 2021

The height key is, in a way, a type of event key with a defined composite prefix of "tx.height" and a value of it's height (hence why we have height repeated twice - once as a string and another as an int64).

Got it. Any reason for handling it as if it was some other key then? Can't we just have one key type?

Also, storing it as a string prevents us from doing range scans on it. Not sure if that's relevant, just saying (e.g. if someone queries tx.height < 10 we'll have to scan all heights rather than just the ones below 10).

the only place where keys could get shuffled together is with the hash key which has no prefix to it. For example if the hash of a tx just so happened to start with "tx.height" it would get inserted somewhere amongst the tx.height section. I was thinking we could add a "tx.hash" prefix to the hash key and reserve this composite prefix as well.

That might make sense. In that case, shouldn't this be reserved, specified, and handled in the same place as e.g. tx.height is? Which I'm presuming is outside of the kv package?

@cmwaters
Copy link
Contributor Author

cmwaters commented Jan 7, 2021

Can't we just have one key type?

I've sort of already made it one key. There is a function called key() that makes keys of the following format string/string/int64/uint32. Both the keyForEvent and keyForHeight are utility wrapper functions that generate keys of this format.

I guess the problem with combining height with the event key and forcing it to have the same format is, as you say, at the loss of more performant range operations. Note that this is just for queries where tx.height is the only condition. For example say the query was account.name = callum AND tx.height > 10 then we would still perform a range query with the prefix as "account.name" the value as "callum" and then the height which comes next in the key (at least that's what I think we would do).

That might make sense. In that case, shouldn't this be reserved, specified, and handled in the same place as e.g. tx.height is? Which I'm presuming is outside of the kv package?

Yup turns out we already have tx.hash prefix but it isn't reserved (neither is tx.height for that matter), i.e. I could create an event called tx.height set a different value and the tx would also be saved there. Also the tx.hash key isn't actually used in the kvstore only as part of querying. I will change this.

I guess in general there is a lot of thought that can be spent on this and probably lots of fully featured libraries that do a good job of parsing queries and implementing secondary indexes. It's quite interesting to me but I think I should save properly delving into it for a later time so for now I would ideally like to focus on just tweaking a small part of the keys

@erikgrinaker
Copy link
Contributor

I've sort of already made it one key. There is a function called key() that makes keys of the following format string/string/int64/uint32. Both the keyForEvent and keyForHeight are utility wrapper functions that generate keys of this format.

Right, ok -- in that case we shouldn't really need all of the checks that skip erroring/invalid keys, since they should all be in a correct format. Makes sense to unify the tx.hash stuff then I think.

I guess in general there is a lot of thought that can be spent on this and probably lots of fully featured libraries that do a good job of parsing queries and implementing secondary indexes.

Yeah. Honestly, I don't really see any point in implementing this ourselves -- I'd just use SQLite and call it a day.

Copy link
Contributor

@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@cmwaters cmwaters merged commit 5b698ed into master Jan 12, 2021
@cmwaters cmwaters deleted the callum/tx-indexer-key branch January 12, 2021 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change txindex internal field separator to NUL

3 participants