KV Store / Archival indexer refactor + schema changes #25143
nickvikeras merged 7 commits into main from
Conversation
Add three new pipelines (EpochStart, EpochEnd, ObjectTypes) and extend the transaction schema with separate data/signatures columns and balance_changes/unchanged_loaded fields. Decode logic is unchanged and will be updated in a follow-up.
crates/sui-kvstore/src/lib.rs
    pub signatures: AuthorityStrongQuorumSignInfo,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
Do we actually serialize/deserialize these types you've defined here? If we don't (I think we just serialize/deserialize the individual parts of these structs), then we should remove the impls (and maybe add a comment indicating that we shouldn't implement Serialize/Deserialize on these types).
I didn't add these types; I just reorganized the files to match the indexing & RPC style preferences while refactoring. But I can try deleting the serde impls and see if it still compiles. I don't think we actually serialize/deserialize them, so you're right that they shouldn't be there.
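A minimal sketch of the shape being discussed: the wrapper type keeps only `Clone`/`Debug`, and callers encode the individual fields as separate columns instead. All types here are simplified stand-ins, not the real sui-types definitions.

```rust
// Stand-in for the real signature type, stubbed as plain bytes.
#[derive(Clone, Debug)]
struct AuthorityStrongQuorumSignInfo(Vec<u8>);

// NOTE: intentionally NOT deriving Serialize/Deserialize -- the store writes
// the individual fields (data, signatures) as separate columns, so the
// wrapper itself never needs a serde impl.
#[derive(Clone, Debug)]
struct TransactionRow {
    data: Vec<u8>,
    signatures: AuthorityStrongQuorumSignInfo,
}

fn main() {
    let tx = TransactionRow {
        data: vec![1, 2, 3],
        signatures: AuthorityStrongQuorumSignInfo(vec![9]),
    };
    // Each field is encoded on its own (BCS in the real code; raw bytes here).
    assert_eq!(tx.data.len(), 3);
    println!("{:?}", tx);
}
```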
    pub end_timestamp_ms: Option<u64>,
    pub end_checkpoint: Option<u64>,
For indexing, these should never be None, right?
Yeah good catch, there is no reason these need to be Options.
Actually, if we are storing each field as its own column in BigTable, I think we do want these to be Options, so that we can query just a subset of the columns.
Yep, I agree; my comment only makes sense if we keep the current schema.
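A minimal sketch of the Option-per-column decode shape being discussed: a read that only fetched some qualifiers leaves the rest as `None`. Field and qualifier names here are illustrative, not the real schema.

```rust
// Each field maps to its own column qualifier; absent cells decode to None.
#[derive(Clone, Debug, Default)]
struct EpochEndRow {
    end_timestamp_ms: Option<u64>,
    end_checkpoint: Option<u64>,
}

// Decode a row from (qualifier, value) cells; qualifiers the query did not
// request simply never appear, so the corresponding fields stay None.
fn decode(cells: &[(&str, u64)]) -> EpochEndRow {
    let mut row = EpochEndRow::default();
    for (qualifier, value) in cells {
        match *qualifier {
            "end_timestamp_ms" => row.end_timestamp_ms = Some(*value),
            "end_checkpoint" => row.end_checkpoint = Some(*value),
            _ => {}
        }
    }
    row
}

fn main() {
    // A query that only requested the checkpoint column:
    let row = decode(&[("end_checkpoint", 42)]);
    assert_eq!(row.end_checkpoint, Some(42));
    assert_eq!(row.end_timestamp_ms, None);
}
```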
    pub const START: &str = "start";
    pub const END: &str = "end";
Do we want to break out the individual components into their own columns, versus having a start and an end column with data that can't be expanded and evolved in the future?
Ah yeah, I can split it up. My brain for some reason still hasn't fully registered that BCS isn't extensible. I was looking at the query in the RPC API, saw the data was small, and wasn't sure whether it made sense to split it into multiple columns to read separately.
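For the record, a stdlib-only illustration of why a positional, untagged encoding like BCS can't be evolved in place. The real codec is BCS, not this hand-rolled one; the point is only that a reader compiled against the old layout rejects (or misparses) rows written with a new trailing field.

```rust
// "V1" reader: expects exactly one little-endian u64 (end_timestamp_ms).
// Like BCS, it has no field tags and no length prefix to skip unknown data.
fn decode_v1(bytes: &[u8]) -> Result<u64, &'static str> {
    if bytes.len() != 8 {
        return Err("trailing bytes: layout changed");
    }
    Ok(u64::from_le_bytes(bytes.try_into().unwrap()))
}

fn main() {
    // "V2" writer appends end_checkpoint after end_timestamp_ms.
    let mut v2 = 1_700_000_000_000u64.to_le_bytes().to_vec();
    v2.extend_from_slice(&42u64.to_le_bytes());

    // The old reader can't accept the evolved row; separate columns avoid this.
    assert!(decode_v1(&v2).is_err());
    assert_eq!(decode_v1(&7u64.to_le_bytes()), Ok(7));
}
```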
    let timestamp_ms = checkpoint.summary.timestamp_ms;
    let mut entries = Vec::with_capacity(checkpoint.object_set.len());

    for object in checkpoint.object_set.iter() {
Doing this, we'll end up with some redundant writes, since this includes input and loaded-unchanged objects as well. But this should be effectively idempotent, so maybe not worth the optimization yet?
The efficient implementation of this is fairly simple:
sui/crates/sui-indexer-alt/src/handlers/kv_objects.rs
Lines 25 to 52 in 965fc34
(the kv_objects pipeline also records a sentinel for deleted objects, but that can be ignored).
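A hedged sketch of the shape being suggested: only emit entries for objects the transaction actually wrote, skipping inputs and loaded-unchanged objects. All types here are stand-ins, not the real sui-types API.

```rust
// Simplified stand-in for a transaction's object changes.
#[derive(Clone, Debug)]
enum ObjectChange {
    Created { id: u64, ty: &'static str },
    Mutated { id: u64, ty: &'static str },
    InputOnly { id: u64 }, // read but unchanged: no row needed
}

// Only objects the transaction actually wrote produce entries, which avoids
// the redundant (if idempotent) writes for inputs / loaded-unchanged objects.
fn entries(changes: &[ObjectChange]) -> Vec<(u64, &'static str)> {
    changes
        .iter()
        .filter_map(|c| match c {
            ObjectChange::Created { id, ty } | ObjectChange::Mutated { id, ty } => {
                Some((*id, *ty))
            }
            ObjectChange::InputOnly { .. } => None,
        })
        .collect()
}

fn main() {
    let out = entries(&[
        ObjectChange::Created { id: 1, ty: "0x2::coin::Coin" },
        ObjectChange::InputOnly { id: 2 },
    ]);
    assert_eq!(out, vec![(1, "0x2::coin::Coin")]);
}
```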
            .map(tables::transactions::encode_key)
            .collect(),
        Some(RowFilter {
            filter: Some(Filter::ColumnQualifierRegexFilter(
Is it always a regex, or is there a way to just provide exact columns?
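One way to get exact-column behavior through Bigtable's regex filter is to escape the literal qualifiers and join them with `|` (Bigtable's `RowFilter` also offers a `ColumnRangeFilter` with closed start/end bounds for a true exact match). The helper below is an illustrative sketch, not part of the PR.

```rust
// Escape a literal column qualifier so RE2 treats it verbatim.
fn escape_re2(lit: &str) -> String {
    lit.chars()
        .flat_map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' {
                vec![c]
            } else {
                vec!['\\', c]
            }
        })
        .collect()
}

// Build a ColumnQualifierRegexFilter pattern matching exactly these columns.
fn exact_columns_regex(quals: &[&str]) -> String {
    quals
        .iter()
        .map(|q| escape_re2(q))
        .collect::<Vec<_>>()
        .join("|")
}

fn main() {
    // The dot in "sig.v1" is escaped so it can't match arbitrary characters.
    assert_eq!(exact_columns_regex(&["data", "sig.v1"]), "data|sig\\.v1");
}
```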
amnn left a comment
Apologies for the post-land comments -- two main things are:
- schema alignment on end-of-epoch data so that GraphQL can use this data-set.
- not introducing a new place where we depend on an object's type not changing.
The Postgres handler for epoch end information is indexing a lot more stuff, which we would need in order to switch over to the archival store instead of relying on this index in GraphQL:
sui/crates/sui-indexer-alt-schema/src/schema.rs
Lines 75 to 93 in 965fc34
This works well schema-wise for GraphQL, but does it record a record for genesis (epoch 0)?
Yeah, it does; I confirmed the record is there in my test DB, which started out empty and ran from genesis.
IIUC, this pipeline assumes that an object's type cannot change once it is created. We make this assumption in one other place -- the obj_info pipeline of sui-indexer-alt -- but this is one of the reasons why we're getting rid of that pipeline:
This assumption prevents us from allowing people to reclaim derived object IDs (because it would allow the same UID to be re-used under a different type).
For that reason, we should not introduce new pipelines that rely on this assumption. In this case, I think the fix is pretty simple: Adapt the pipeline to check whether the object was created or it was mutated and the mutation involves a type change.
Note that even today, dynamic fields may fail this test: a dynamic field's ID is derived from the parent ID, name type and name content. The value type and content don't impact the ID, which means a transaction can modify a field to change its value's type and it will be treated as a field mutation. If you query the type of the Field<..., ...> object corresponding to the dynamic field, you will then get a stale response.
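A hedged sketch of the suggested fix: emit an object-type row when the object is created, or when a mutation changes its type, instead of assuming types are immutable. The types and helper here are illustrative stand-ins.

```rust
// Minimal stand-in for the object-type info tracked by the pipeline.
#[derive(Clone, Debug, PartialEq)]
struct ObjInfo {
    id: u64,
    ty: String,
}

// Index when the object is newly created, or when a mutation changed its
// type -- so the pipeline no longer depends on types being immutable.
fn should_index(prev: Option<&ObjInfo>, curr: &ObjInfo) -> bool {
    match prev {
        None => true,              // newly created object
        Some(p) => p.ty != curr.ty, // mutation that changed the type
    }
}

fn main() {
    let before = ObjInfo { id: 7, ty: "Field<u64, A>".into() };
    let after = ObjInfo { id: 7, ty: "Field<u64, B>".into() };
    assert!(should_index(None, &before));            // creation
    assert!(should_index(Some(&before), &after));    // value-type change
    assert!(!should_index(Some(&after), &after.clone())); // no change
}
```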
pub use crate::handlers::set_max_mutations;

/// All pipeline names registered by the indexer. Single source of truth used for:
/// - Pipeline registration in `BigTableIndexer::new()`
This at least doesn't seem to be true (this const is not used for pipeline registration). Is it needed, and if so, is it meant to be here in the file? It's flanked on either side by imports.
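A sketch of making the const a real single source of truth, so the doc comment holds: registration iterates the same list. The names (`PIPELINES`, `Indexer`) are hypothetical, not the actual `BigTableIndexer` API.

```rust
// Shared list of pipeline names -- the one place they are defined.
const PIPELINES: &[&str] = &["epoch_start", "epoch_end", "object_types"];

struct Indexer {
    registered: Vec<String>,
}

impl Indexer {
    fn new() -> Self {
        // Register every pipeline from the shared list, so the const and the
        // registration logic can't drift apart.
        Self {
            registered: PIPELINES.iter().map(|s| s.to_string()).collect(),
        }
    }
}

fn main() {
    let idx = Indexer::new();
    assert_eq!(idx.registered.len(), PIPELINES.len());
    assert_eq!(idx.registered[0], "epoch_start");
}
```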
Description
This can be deployed without breaking any readers. The reader updates to read these new columns/watermarks will follow in a separate PR, and then we can finally stop writing and delete the old columns.
Test plan