tx indexer: use different field separator for keys by cmwaters · Pull Request #5865 · tendermint/tendermint

cmwaters · 2021-01-05T15:50:40Z

Description

Implemented the orderedcode library for consistency and lexicographic ordering of keys.
This also tackles the below issue of having a better escape mechanism using 0x00ff instead of /

Closes: #2907

state/txindex/kv/kv.go

codecov · 2021-01-06T14:44:27Z

Codecov Report

Merging #5865 (9fba99d) into master (03a6fb2) will decrease coverage by 0.20%.
The diff coverage is 70.76%.

@@            Coverage Diff             @@
##           master    #5865      +/-   ##
==========================================
- Coverage   60.08%   59.88%   -0.21%     
==========================================
  Files         269      269              
  Lines       24499    24527      +28     
==========================================
- Hits        14721    14687      -34     
- Misses       8193     8237      +44     
- Partials     1585     1603      +18

Impacted Files	Coverage Δ
state/txindex/kv/kv.go	`61.03% <70.76%> (+1.03%)`	⬆️
crypto/sr25519/pubkey.go	`43.47% <0.00%> (-8.70%)`	⬇️
privval/signer_listener_endpoint.go	`80.00% <0.00%> (-7.06%)`	⬇️
privval/socket_listeners.go	`78.72% <0.00%> (-4.26%)`	⬇️
privval/secret_connection.go	`72.68% <0.00%> (-3.61%)`	⬇️
mempool/reactor.go	`80.83% <0.00%> (-3.34%)`	⬇️
blockchain/v2/reactor.go	`32.73% <0.00%> (-2.88%)`	⬇️
consensus/reactor.go	`74.33% <0.00%> (-2.27%)`	⬇️
p2p/transport_mconn.go	`42.01% <0.00%> (-1.96%)`	⬇️
statesync/snapshots.go	`91.59% <0.00%> (-1.69%)`	⬇️
... and 12 more

erikgrinaker

This code has some issues that I think follows from using a suboptimal key structure. If I'm understanding this correctly, there's nothing separating event keys from height keys, such that they're mixed together and we have to do weird things like skip erroring entries while we're iterating over them if they don't have the expected format.

I think it might make more sense to have unique prefixes for the two different key types such that we'll never encounter a different key type while iterating over one type. Then we can simply error on any unexpected key formats. Or is there an actual reason for mixing them together, like we want to query multiple key types during a single iterator pass?

state/txindex/kv/kv.go

cmwaters · 2021-01-07T11:43:31Z

The height key is, in a way, a type of event key with a defined composite prefix of "tx.height" and a value of it's height (hence why we have height repeated twice - once as a string and another as an int64). All other event keys are chosen by the application and could have for example a composite prefix of "account.name" and a value of "Callum". I was thinking that because tendermint already reserves the composite prefix "tx.height" that we want to check that the application doesn't also use it as an event.

Other than that, the only place where keys could get shuffled together is with the hash key which has no prefix to it. For example if the hash of a tx just so happened to start with "tx.height" it would get inserted somewhere amongst the tx.height section. I was thinking we could add a "tx.hash" prefix to the hash key and reserve this composite prefix as well.

erikgrinaker · 2021-01-07T13:07:57Z

The height key is, in a way, a type of event key with a defined composite prefix of "tx.height" and a value of it's height (hence why we have height repeated twice - once as a string and another as an int64).

Got it. Any reason for handling it as if it was some other key then? Can't we just have one key type?

Also, storing it as a string prevents us from doing range scans on it. Not sure if that's relevant, just saying (e.g. if someone queries tx.height < 10 we'll have to scan all heights rather than just the ones below 10).

the only place where keys could get shuffled together is with the hash key which has no prefix to it. For example if the hash of a tx just so happened to start with "tx.height" it would get inserted somewhere amongst the tx.height section. I was thinking we could add a "tx.hash" prefix to the hash key and reserve this composite prefix as well.

That might make sense. In that case, shouldn't this be reserved, specified, and handled in the same place as e.g. tx.height is? Which I'm presuming is outside of the kv package?

cmwaters · 2021-01-07T15:52:16Z

Can't we just have one key type?

I've sort of already made it one key. There is a function called key() that makes keys of the following format string/string/int64/uint32. Both the keyForEvent and keyForHeight are utility wrapper functions that generate keys of this format.

I guess the problem with combining height with the event key and forcing it to have the same format is, as you say, at the loss of more performant range operations. Note that this is just for queries where tx.height is the only condition. For example say the query was account.name = callum AND tx.height > 10 then we would still perform a range query with the prefix as "account.name" the value as "callum" and then the height which comes next in the key (at least that's what I think we would do).

That might make sense. In that case, shouldn't this be reserved, specified, and handled in the same place as e.g. tx.height is? Which I'm presuming is outside of the kv package?

Yup turns out we already have tx.hash prefix but it isn't reserved (neither is tx.height for that matter), i.e. I could create an event called tx.height set a different value and the tx would also be saved there. Also the tx.hash key isn't actually used in the kvstore only as part of querying. I will change this.

I guess in general there is a lot of thought that can be spent on this and probably lots of fully featured libraries that do a good job of parsing queries and implementing secondary indexes. It's quite interesting to me but I think I should save properly delving into it for a later time so for now I would ideally like to focus on just tweaking a small part of the keys

…int into callum/tx-indexer-key

erikgrinaker · 2021-01-07T17:01:39Z

I've sort of already made it one key. There is a function called key() that makes keys of the following format string/string/int64/uint32. Both the keyForEvent and keyForHeight are utility wrapper functions that generate keys of this format.

Right, ok -- in that case we shouldn't really need all of the checks that skip erroring/invalid keys, since they should all be in a correct format. Makes sense to unify the tx.hash stuff then I think.

I guess in general there is a lot of thought that can be spent on this and probably lots of fully featured libraries that do a good job of parsing queries and implementing secondary indexes.

Yeah. Honestly, I don't really see any point in implementing this ourselves -- I'd just use SQLite and call it a day.

melekes

👍

state/txindex/kv/kv.go

cmwaters added 30 commits December 8, 2020 15:31

use big endian keys and iterator for pruning blocks

751715d

use big endian keys for state store and use an iterator for pruning

7ad07b2

fmt

276d898

wip

c422616

add encoding and pruning for state store

59b9129

add more tests to state store

af1ece8

remove test panic

8289072

restore test

a80b564

fix issue with iterator and batch writing

a0001c8

remove recursion

b5b2da6

add a few more tests

a34a17c

implement key encoding with block store and rewrite iterator

27d00dc

remove unnecessary exported test functions

7bdcdb6

remove block state store in place of iterator

f378973

fix block store tests

e8019f4

clean up formatting and errors

5292c0b

switch evidence db encoding

f8d77c8

change light client key encoding (and ensure tests pass)

6bc04ae

clean up

751f83a

Merge branch 'master' into callum/key-encoding

c78186d

go fmt

eef7f31

fix panicking test and separate key prefix bytes

5486a06

Merge branch 'master' into callum/key-encoding

dff9024

Merge branch 'master' into callum/key-encoding

4f093e9

split out iterator changes

56b6ba1

clean up broken code and return errors when parsing keys

89d823d

use different prefix for size in light db

681bb0f

Merge branch 'master' into callum/key-encoding

eb09ae3

remove unused decode function

b844bdb

use int64 for db prefixes

81d4568

cmwaters added 3 commits January 5, 2021 13:39

update changelog

41004d5

modify changelog

e4c9e70

use different field separator for tx indexer keys

10e02c2

cmwaters commented Jan 5, 2021

View reviewed changes

state/txindex/kv/kv.go Show resolved Hide resolved

cmwaters commented Jan 5, 2021

View reviewed changes

state/txindex/kv/kv.go Outdated Show resolved Hide resolved

Base automatically changed from callum/key-encoding to master January 5, 2021 15:53

cmwaters added 2 commits January 6, 2021 15:25

clean up and fix bug with key height

66f9c07

add a few more test cases

ecff549

cmwaters commented Jan 6, 2021

View reviewed changes

state/txindex/kv/kv.go Outdated Show resolved Hide resolved

Merge branch 'master' into callum/tx-indexer-key

26dec5d

cmwaters marked this pull request as ready for review January 6, 2021 14:37

cmwaters requested review from ebuchman, melekes and tessr as code owners January 6, 2021 14:37

erikgrinaker reviewed Jan 6, 2021

View reviewed changes

state/txindex/kv/kv.go Show resolved Hide resolved

state/txindex/kv/kv.go Outdated Show resolved Hide resolved

cmwaters added 4 commits January 7, 2021 17:18

reserve tx hash key

5ae9d7f

Merge branch 'callum/tx-indexer-key' of github.com:tendermint/tenderm…

ba9dcdd

…int into callum/tx-indexer-key

Merge branch 'master' into callum/tx-indexer-key

0bc4adc

handle errors

7e2951e

melekes reviewed Jan 11, 2021

View reviewed changes

state/txindex/kv/kv.go Outdated Show resolved Hide resolved

state/txindex/kv/kv.go Outdated Show resolved Hide resolved

state/txindex/kv/kv.go Outdated Show resolved Hide resolved

return error when clashing with reserved key

8d6f126

melekes approved these changes Jan 11, 2021

View reviewed changes

Merge branch 'master' into callum/tx-indexer-key

9fba99d

cmwaters merged commit 5b698ed into master Jan 12, 2021

cmwaters deleted the callum/tx-indexer-key branch January 12, 2021 09:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tx indexer: use different field separator for keys#5865

tx indexer: use different field separator for keys#5865
cmwaters merged 46 commits intomasterfrom
callum/tx-indexer-key

cmwaters commented Jan 5, 2021 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Jan 6, 2021 •

edited

Loading

Uh oh!

erikgrinaker left a comment

Uh oh!

Uh oh!

Uh oh!

cmwaters commented Jan 7, 2021

Uh oh!

erikgrinaker commented Jan 7, 2021 •

edited

Loading

Uh oh!

cmwaters commented Jan 7, 2021

Uh oh!

erikgrinaker commented Jan 7, 2021

Uh oh!

melekes left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cmwaters commented Jan 5, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Jan 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

cmwaters commented Jan 7, 2021

Uh oh!

erikgrinaker commented Jan 7, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmwaters commented Jan 7, 2021

Uh oh!

erikgrinaker commented Jan 7, 2021

Uh oh!

melekes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cmwaters commented Jan 5, 2021 •

edited

Loading

codecov bot commented Jan 6, 2021 •

edited

Loading

erikgrinaker commented Jan 7, 2021 •

edited

Loading