Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           develop    #2314    +/-   ##
===========================================
+ Coverage    60.94%   60.98%   +0.03%
===========================================
  Files          197      197
  Lines        16278    16278
===========================================
+ Hits          9921     9927       +6
+ Misses        5492     5486       -6
  Partials       865      865
```
> Instead of returning a list of tags, return a list of events, where
> each event is a list of tags. This way we naturally capture the concept of
> multiple events happening during a single ABCI message.
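The proposal quoted above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Tendermint types; the names `Tag`, `Event`, and `flatTags` are made up for this sketch:

```go
package main

import "fmt"

// Tag is one key-value property of an event.
type Tag struct {
	Key, Value string
}

// Event groups the tags that describe a single occurrence, so two
// events in one ABCI message stay distinct.
type Event struct {
	Tags []Tag
}

// flatTags flattens events back into the current single-list form,
// losing the event boundaries in the process.
func flatTags(events []Event) []Tag {
	var out []Tag
	for _, e := range events {
		out = append(out, e.Tags...)
	}
	return out
}

func main() {
	// Two slashing events in one BeginBlock, kept distinct.
	events := []Event{
		{Tags: []Tag{{"validator", "valA"}, {"reason", "downtime"}}},
		{Tags: []Tag{{"validator", "valB"}, {"reason", "double-sign"}}},
	}
	fmt.Println(len(events), "events,", len(flatTags(events)), "tags total")
}
```

Flattening shows what the current single-list format loses: once the two events are merged, there is no way to tell which `reason` belongs to which `validator`.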
If I understand correctly, this change would mean that the only way a user can interact with tags (efficiently) is to search by event; you can't search by tag. (Searching by tag would cost (number of tags) * log(N) hashes for all the Merkle proofs.) At that point, tags don't even feel useful; just search everything by event. You couldn't do proofs of exclusion either, unless we had a separate store per tag.
I don't follow, can you further expand on your complexity calculation?
As I understand this, a multi-Msg transaction with three events would be indexed almost the same as three separate transactions with one event each (except for the three separate transaction hashes in the latter case), and the search complexity would be equivalent.
Sorry I made a typo in my previous equation (I meant number of events).
Suppose there are 10 events which have the slashing tag, and I want all slashing events. In the proposed scheme I have 10 Merkle proofs (one for each event) = 10 * log(number of events) hashes, plus the 10 events themselves. (There's no guarantee that the slashing events will be nearby in the tree.)
If instead we kept the Merkle tree keyed by tags, it would be one Merkle proof per tag, which seems a lot more scalable to me. (We retain proofs of exclusion as well.) As an additional optimization, we could even have the list of events merkleized under each tag. (This is tricky, though: you lose proofs of exclusion if you want to sub-search on two different properties.) You would then only return the events that met all queries. The complexity would then be:
(number of tags searched for) * log(number of distinct tags) + log(number of events per tag) hashes, plus only the events that match all the queries' values.
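The two proof schemes being compared can be put into a back-of-the-envelope calculation. All numbers and function names here are illustrative; tree depth is approximated as log2 of the leaf count:

```go
package main

import (
	"fmt"
	"math"
)

func log2(n float64) float64 { return math.Log2(n) }

// perEventHashes: scheme 1 above, one Merkle proof per matching event
// in a tree of all events.
func perEventHashes(matching, totalEvents int) float64 {
	return float64(matching) * log2(float64(totalEvents))
}

// perTagHashes: scheme 2 above, a tree keyed by tag with the events
// merkleized under each tag.
func perTagHashes(tagsSearched, distinctTags, eventsPerTag int) float64 {
	return float64(tagsSearched)*log2(float64(distinctTags)) +
		log2(float64(eventsPerTag))
}

func main() {
	// 10 slashing events out of 1024 total events:
	fmt.Printf("per-event: %.0f hashes\n", perEventHashes(10, 1024)) // 10 * 10 = 100
	// one tag searched, 256 distinct tags, 10 events under the tag:
	fmt.Printf("per-tag:   %.1f hashes\n", perTagHashes(1, 256, 10)) // 8 + ~3.3
}
```

Under these assumed sizes the per-tag layout needs roughly an order of magnitude fewer hashes, which is the scalability point being made.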
Actually, just realized: having a list of events makes us lose proofs of exclusion as well. (As you can't efficiently prove that no other event has a given tag.)
I wonder what the expected query workload is like here - maybe it is as you suggest, querying many events with the same tag, but it also could be more random-access, or searching by other tags like account address. I would also expect a lot of combined queries - "slashing events for myValidator".
Why does multiple-events-per-transaction change whether or not you get proofs of exclusion? Seems like that should be possible in either case, but only with one query key/value - prove no events under the key/value path queried, vs. no transaction hashes.
I don't think we need to optimize for the case where people just care about themselves. Though what I outlined here: #2314 (comment) has low bandwidth in this case anyway. (One merkle proof + a standard range proof OR one merkle proof plus entire validator=validatorA subtree)
It feels more natural to me to optimize for the case where the optimization can cause a large performance difference. (e.g. all events tagged governance)
> Why does multiple-events-per-transaction change whether or not you get proofs of exclusion?
I'm confused by this sentence. If this is in regards to "Actually just realized, having a list of events makes us lose proofs of exclusion as well. (As you can't prove that no other event has a given tag, efficiently)", what I mean by that is: if I search "give me everything with the tag slashing", I don't have evidence that you gave me everything, since there isn't a single value that contains all slashing elements, or a contiguous sequence of values in the tree that are all slashing, to do a proof of exclusion on.
> where each tag refers to a different property of the event, like the sending and receiving account addresses.
>
> Since there is only one list of tags, recording data for multiple such events in
> a single Check/DeliverTx/Begin/EndBlock must be done using prefixes in the key
Maybe I'm missing something fundamental, but why is there a need for prefixing? Shouldn't searching for things by tag work like this: a Merkle tree of tags, key: tag bytes, value: list of events; then you get a Merkle proof along with the entire tag set for that block?
i.e. suppose my key is "slashing"; then the value for that tag could be "[Validator 0 slashed for downtime, other relevant details, Validator 2 slashed for not voting on proposal X]"
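The layout suggested in this comment can be sketched as a plain map from tag key to the events recorded under it (illustrative only; `eventsFor` is a made-up helper, and in the real design the map would be a Merkle tree so each key gets a proof):

```go
package main

import "fmt"

// eventsFor looks up all events recorded under one tag key.
func eventsFor(index map[string][]string, tag string) []string {
	return index[tag]
}

func main() {
	// One entry per tag; the value is the whole list of events for
	// that tag in the block.
	index := map[string][]string{
		"slashing": {
			"Validator 0 slashed for downtime, other relevant details",
			"Validator 2 slashed for not voting on proposal X",
		},
	}
	// A query for the "slashing" tag returns the full value, and (in a
	// merkleized version) a single Merkle proof for this one key.
	for _, ev := range eventsFor(index, "slashing") {
		fmt.Println(ev)
	}
}
```

The reply below points out the weakness of this shape: the event strings are unstructured, so you cannot filter on properties inside them (e.g. reason or validator) without client-side parsing.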
If you do it that way, you can't structure the value data. I might want to search for "all events where action = slashing and reason = downtime and validator = myValidator", for example, which isn't possible with this format (as I understand TM's current indexing model).
Why can't you just query for all txs where validator = myValidator and slashing, filtering anything else client-side?
Because multiple validators can be slashed per abci.RequestBeginBlock - so you need to put them under different keys (or put them as a list under one key and somehow associate other parameters such as slash amount and reason).
Is it possible to define multiple tags with different values: ['validator=myValidator','validator=yourValidator']?
@cwgoes I thought slashing was an extra event, not a label on another tx. I misunderstood that.
Slashing doesn't happen during txs but does happen during BeginBlock, before the txs. BeginBlock returns tags so that clients can be notified of what "events" happened during the BeginBlock. If multiple validators get slashed during a single BeginBlock, we need a way to disentangle those multiple events (ie. each one has a validator address, an amount slashed, perhaps a reason for slashing, etc.)
Hmm, and what if we create a slashing tx for each slashing event, which is then incorporated in a block? That way we don't introduce another concept and can leave the labeling logic as is. This is probably not possible; just brainstorming here.
In theory that's possible, but it would require significant changes to the state machine, and it's contrary to how we designed Tendermint (i.e. to auto-detect and include evidence in blocks as a special type of data, distinct from txs).
> They are key-value pairs that can be used to index transactions.
>
> Currently, ABCI messages return a list of tags to describe an
> "event" that took place during the Check/DeliverTx/Begin/EndBlock,
how do we currently support a sequence of events?
prefixing (`1_address`, `2_address`)
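The prefixing convention mentioned here can be sketched as encoding the event's index into the tag key, so two events in one block don't collide in the flat tag list (illustrative sketch; `prefixed` is a made-up helper, not an actual SDK function):

```go
package main

import "fmt"

// prefixed builds a tag key of the form "<eventIdx>_<key>", the
// convention used today to disambiguate multiple events sharing one
// flat tag list.
func prefixed(eventIdx int, key string) string {
	return fmt.Sprintf("%d_%s", eventIdx, key)
}

func main() {
	// Two slashing events in one BeginBlock, disambiguated by index:
	fmt.Println(prefixed(1, "address")) // "1_address"
	fmt.Println(prefixed(2, "address")) // "2_address"
}
```

The downside discussed below is that this is pure convention: every indexer and client has to know to strip the prefix, rather than the event grouping being native to the data model.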
> where each tag refers to a different property of the event, like the sending and receiving account addresses.
>
> Since there is only one list of tags, recording data for multiple such events in
> a single Check/DeliverTx/Begin/EndBlock must be done using prefixes in the key
What's bad about prefixing?
It requires extra convention in the keys. Not necessarily a bad thing, but it could be nicer to have native support for the concept.
If we can't manage with a list of key: value strings, we should consider bringing in complex types (there was a discussion about this some time ago with Jae, where we decided to stick with strings as raw bytes).
If we do so, we can add a JSON type (e.g. key: "Joth", value_type: JSON, value: {"address": "X", "denom": "B"}). Then it would be super easy to support multiple events per DeliverTx.
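The typed-value idea could look roughly like this (a hypothetical shape, not an actual ABCI type; `TypedTag` and `jsonTag` are invented for the sketch):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TypedTag is a tag whose value carries a type discriminator, so a
// value can be structured JSON instead of an opaque string.
type TypedTag struct {
	Key       string
	ValueType string // e.g. "json"
	Value     []byte
}

// jsonTag marshals an arbitrary value and wraps it as a JSON-typed tag.
func jsonTag(key string, v interface{}) (TypedTag, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return TypedTag{}, err
	}
	return TypedTag{Key: key, ValueType: "json", Value: raw}, nil
}

func main() {
	t, err := jsonTag("Joth", map[string]string{"address": "X", "denom": "B"})
	if err != nil {
		panic(err)
	}
	fmt.Println(t.Key, t.ValueType, string(t.Value))
}
```

As the reply notes, structured values push work onto the indexer: to answer queries like "validator X slashed", the JSON would have to be unmarshaled and its inner keys indexed too.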
> there was a discussion about this some time ago with Jae where we decided to stick with strings as raw bytes
Do you know if the reasoning for this was documented anywhere? Didn't we explicitly get rid of types here at one point?
> we can add JSON type
This is cool, but I think it means we'd need to unmarshal the JSON and index the keys in there too if we want to support eg. a query for "validator X slashed". More likely we need to really start thinking about the "indexing outside of tendermint core" story.
cwgoes left a comment:
Thanks for writing this up @ebuchman.
Commented, replied to your above query at #1859 (comment).
> a single Check/DeliverTx/Begin/EndBlock must be done using prefixes in the key
> space.
>
> TODO: brief description of how the indexing works
It should work no differently than current indexing - just where multiple events can be indexed per transaction, so the transaction hash can be looked up from different events.
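The point that indexing stays essentially unchanged can be sketched as follows: each tag of each event maps a "key=value" entry to the transaction hash, so the same tx hash is simply reachable from several events. All names here are illustrative, not Tendermint's actual indexer code:

```go
package main

import "fmt"

// Tag and Event mirror the proposed list-of-events shape (hypothetical
// names, see the ADR discussion above).
type Tag struct{ Key, Value string }
type Event struct{ Tags []Tag }

// indexTx records txHash under every "key=value" entry of every event,
// the same scheme used for a flat tag list today.
func indexTx(idx map[string][]string, txHash string, events []Event) {
	for _, e := range events {
		for _, t := range e.Tags {
			k := t.Key + "=" + t.Value
			idx[k] = append(idx[k], txHash)
		}
	}
}

func main() {
	idx := map[string][]string{}
	indexTx(idx, "DEADBEEF", []Event{
		{Tags: []Tag{{"action", "slashing"}, {"validator", "valA"}}},
		{Tags: []Tag{{"action", "slashing"}, {"validator", "valB"}}},
	})
	// The one tx hash is reachable via either validator's entry.
	fmt.Println(idx["validator=valA"], idx["validator=valB"])
}
```

Note that under this scheme a key shared by both events (like `action=slashing` here) lists the same tx hash twice, once per event, which a real indexer would likely deduplicate.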
> ### Negative
>
> - More complex query syntax
I don't think the query syntax needs to change. We don't need to allow searching for transactions by combinations of multiple events - "slashed AND jailed" - they can just be treated independently, like multiple separate transactions would be at the moment.
Maybe I'm misunderstanding, but the logical AND would say to only return events that fit both the slashing and the jailed tags. We still want that, right? What we don't need is to support logical OR, imo (e.g. all queries that are staking or are governance, as these can be sent independently). We want the former case to mitigate bandwidth for lite clients.
I'm talking about finding one transaction with two events, not one event with two tags - the latter we definitely do want.
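The distinction being drawn (AND across tags of one event, not AND across different events of a transaction) can be made concrete with a small sketch; names and the matching helper are illustrative:

```go
package main

import "fmt"

type Tag struct{ Key, Value string }
type Event struct{ Tags []Tag }

// matchEvent returns true only if every condition holds within this
// one event; conditions are never satisfied across separate events.
func matchEvent(e Event, cond map[string]string) bool {
	have := map[string]string{}
	for _, t := range e.Tags {
		have[t.Key] = t.Value
	}
	for k, v := range cond {
		if have[k] != v {
			return false
		}
	}
	return true
}

func main() {
	slashed := Event{Tags: []Tag{{"action", "slashed"}, {"validator", "valA"}}}
	jailed := Event{Tags: []Tag{{"action", "jailed"}, {"validator", "valA"}}}
	q := map[string]string{"action": "slashed", "validator": "valA"}
	// The first event satisfies both conditions by itself; the second
	// does not, even though the pair of events together would.
	fmt.Println(matchEvent(slashed, q), matchEvent(jailed, q))
}
```

So a query like "slashed AND jailed" would match no single event here, which is exactly the per-event semantics the comment argues is sufficient.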
> ## Decision
>
> Include a `string code_space` in all ABCI messages that have a `code`.
Should it be a codespace string, or just an error string, and code changes to essentially a boolean (pass/fail) or a small set of error codes likely to be meaningful for many modules (a la HTTP XYZ)?
Good question. I do think code needs more than binary output, and if it's going to be a "small set of error codes", I can see a good argument for it just being an integer as is - we should try to create a standard for the lower-value codes, but there's plenty of room for folks to represent more domain-specific issues.
There was a previous discussion about introducing errors in tendermint/abci#165
It would be cool to see a proposal for how the code system here could better align with HTTP.
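The decision under discussion can be sketched as a response shape where the numeric `code` is paired with a `codespace` string, so numerically overlapping codes from different modules stay distinguishable. This follows the ADR's wording but is an illustration, not the generated ABCI type:

```go
package main

import "fmt"

// ResponseDeliverTx (sketch): code 0 conventionally means success;
// codespace names the module that assigned a nonzero code.
type ResponseDeliverTx struct {
	Code      uint32
	Codespace string
	Log       string
}

// isOK reflects the pass/fail reading of the code discussed above.
func isOK(r ResponseDeliverTx) bool { return r.Code == 0 }

func main() {
	r := ResponseDeliverTx{
		Code:      5,
		Codespace: "staking", // hypothetical module namespace
		Log:       "insufficient delegation",
	}
	fmt.Printf("ok=%v codespace=%s log=%s\n", isOK(r), r.Codespace, r.Log)
}
```

Under this shape, code 5 from "staking" and code 5 from "gov" are different errors, which is the redundancy/namespacing trade-off the Negative section below raises.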
> - Another field in the response needs to be accounted for
> - Some redundancy with `code` field
> - May encourage more error/code type info to move to the `codespace` string, which
Can light clients not verify the codespace (are we not putting it in state)?
I purposefully left how it gets hashed out of this ADR as I think that's a larger design problem that also needs to be figured out for tags: #1007
I think it's clear we want to move forward with codespace, but the events change is still under consideration. Let's merge this and open a new PR for more discussion about the events.
I think what we have now is good... an event should definitely have some identifying type information, so we can pull that one out. That leaves the remaining "value" of an event, which we don't know how to encode, so we can just keep that as a []byte or string. Basically, I think we should have a tuple of bytes/strings here, because we only know one thing about what the value should be: namely, that it should have a type identifier, or a label of some sort.
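The tuple described in this comment could be sketched as follows (illustrative names only; the only committed decision is a type identifier plus an opaque value):

```go
package main

import "fmt"

// Event (sketch): the one thing we know is that an event has an
// identifying type/label; the payload encoding is deliberately left
// unspecified, so it stays an opaque byte string.
type Event struct {
	Type  string // identifying type information, e.g. "slash"
	Value []byte // opaque payload, encoding to be decided later
}

func main() {
	e := Event{
		Type:  "slash",
		Value: []byte("validator=valA amount=100"), // placeholder encoding
	}
	fmt.Println(e.Type, string(e.Value))
}
```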
Wrote up ADRs for the ABCI codespace (#1858) and events (#1859).
We should move forward with adding the codespace (very simple), but we may want to take our time with the events given the downstream impact on clients - not entirely sure it's launch critical, and the ADR has some TODOs around how indexing will work. Original PR for the events is #1803 but it would need rebasing, review, testing, updates for clients, etc. @cwgoes do you think we can forgo this for launch?
NOTE #1859 (comment) (suggested priority for list of events is to track multiple slashing events in a single block ...)