KV Store / Archival indexer refactor + schema changes#25143

Merged
nickvikeras merged 7 commits into main from nickv/kv-pipelines on Feb 3, 2026

Conversation

nickvikeras (Contributor) commented Jan 28, 2026

Description

  • Splitting into pipeline-per-table
  • Making (backward-compatible) schema changes
    • Write txn signature and data as separate columns (keep writing old column)
    • Write epoch start/end as separate columns (keep writing old column)
    • New object type table (this data is already in object table, but this allows us to avoid loading the entire object for rendering type information in grpc/graphql's txn apis)
    • Continuing to write old watermark (as min of all pipeline watermarks)
  • Use Bytes instead of Vec in the prost-generated code, because the framework effectively forces you to clone all of your data in the commit path to handle retries (cloning Bytes is a cheap reference-count bump rather than a copy).

This can be deployed without breaking any readers. The reader updates to read these new columns/watermarks will follow in a separate PR, and then we can finally stop writing and delete the old columns.
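
The Bytes switch described above is a codegen setting rather than a code change per field. As a sketch (assuming prost-build drives the codegen; the proto path is hypothetical), the build script could look like this, where the `"."` path applies the setting to every `bytes` field:

```rust
// build.rs -- illustrative sketch, not the actual build script from this PR.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = prost_build::Config::new();
    // Generate `bytes::Bytes` instead of `Vec<u8>` for all `bytes` fields,
    // so clones on the retry/commit path are cheap refcount bumps.
    config.bytes(["."]);
    // "proto/kv_store.proto" is a placeholder path for illustration.
    config.compile_protos(&["proto/kv_store.proto"], &["proto/"])?;
    Ok(())
}
```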

Test plan

  • New test that uses a mocked grpc server to test partial write failures.
  • New emulator-based test (had to install the gcloud cli on the CI container to make this work).

Add three new pipelines (EpochStart, EpochEnd, ObjectTypes) and extend
the transaction schema with separate data/signatures columns and
balance_changes/unchanged_loaded fields. Decode logic is unchanged
and will be updated in a follow-up.
pub signatures: AuthorityStrongQuorumSignInfo,
}

#[derive(Clone, Debug, Serialize, Deserialize)]
Contributor:

Do we actually serialize/deserialize these types you've defined here? If we don't (I think we just serialize/deserialize the individual parts of these structs), then we should remove the impls (and maybe add a comment indicating that we shouldn't implement Serialize/Deserialize on these types).

nickvikeras (Contributor, author) replied Feb 3, 2026:

I didn't add these types, I just reorganized the files to match the indexing & RPC team's style preferences when refactoring. But I can try deleting the serde stuff and see if it still compiles. I don't think we actually serialize/deserialize it, so you're right that it shouldn't be there.

Comment on lines +139 to +140
pub end_timestamp_ms: Option<u64>,
pub end_checkpoint: Option<u64>,
Contributor:

For indexing these shouldn't ever be None right?

nickvikeras (Contributor, author):

Yeah good catch, there is no reason these need to be Options.

nickvikeras (Contributor, author):

Actually, if we are storing each field as its own column in Bigtable, I think we do want these to be Options so that we can query just a subset of the columns.

Contributor:

Yep, I agree; my comment only makes sense if we keep the current schema.

Comment on lines +17 to +18
pub const START: &str = "start";
pub const END: &str = "end";
Contributor:

Do we want to break out the individual components into their own columns, vs. having start and end columns whose data can't be expanded and evolved in the future?

nickvikeras (Contributor, author):

Ah yeah, I can split it up. My brain for some reason still hasn't fully registered that BCS isn't extensible. I was just looking at the query in the RPC API and saw the data was small, and wasn't sure if it made sense to split it up into multiple columns to read separately.

let timestamp_ms = checkpoint.summary.timestamp_ms;
let mut entries = Vec::with_capacity(checkpoint.object_set.len());

for object in checkpoint.object_set.iter() {
Contributor:

Doing this, we'll end up with some redundant writes, since this includes input and loaded unchanged objects as well. But this should be effectively idempotent, so maybe not worth the optimization yet?

Contributor:

The efficient implementation of this is fairly simple:

let deleted_objects = checkpoint
    .eventually_removed_object_refs_post_version()
    .into_iter()
    .map(|(id, version, _)| {
        Ok(StoredObject {
            object_id: id.to_vec(),
            object_version: version.value() as i64,
            serialized_object: None,
        })
    });
let created_objects = checkpoint.transactions.iter().flat_map(|txn| {
    txn.output_objects(&checkpoint.object_set).map(|o| {
        let id = o.id();
        let version = o.version();
        Ok(StoredObject {
            object_id: id.to_vec(),
            object_version: version.value() as i64,
            serialized_object: Some(bcs::to_bytes(o).with_context(|| {
                format!("Serializing object {id} version {}", version.value())
            })?),
        })
    })
});
deleted_objects
    .chain(created_objects)
    .collect::<Result<Vec<_>, _>>()

(the kv_objects pipeline also records a sentinel for deleted objects, but that can be ignored).

.map(tables::transactions::encode_key)
.collect(),
Some(RowFilter {
filter: Some(Filter::ColumnQualifierRegexFilter(
Contributor:

Is it always a regex, or is there a way to just provide exact columns?

amnn (Contributor) left a comment:

Apologies for the post-land comments -- two main things are:

  • schema alignment on end-of-epoch data so that GraphQL can use this data-set.
  • not introducing a new place where we depend on an object's type not changing.

Contributor:

The Postgres handler for epoch end information is indexing a lot more stuff, which we would need in order to switch over to the archival store instead of relying on this index in GraphQL:

diesel::table! {
    kv_epoch_ends (epoch) {
        epoch -> Int8,
        cp_hi -> Int8,
        tx_hi -> Int8,
        end_timestamp_ms -> Int8,
        safe_mode -> Bool,
        total_stake -> Nullable<Int8>,
        storage_fund_balance -> Nullable<Int8>,
        storage_fund_reinvestment -> Nullable<Int8>,
        storage_charge -> Nullable<Int8>,
        storage_rebate -> Nullable<Int8>,
        stake_subsidy_amount -> Nullable<Int8>,
        total_gas_fees -> Nullable<Int8>,
        total_stake_rewards_distributed -> Nullable<Int8>,
        leftover_storage_fund_inflow -> Nullable<Int8>,
        epoch_commitments -> Bytea,
    }
}

Contributor:

This works well schema-wise for GraphQL, but does it write a record for genesis (epoch 0)?

nickvikeras (Contributor, author):

Yeah it does; I confirmed the record is there in my test DB, which started out empty and ran from genesis.

Contributor:

IIUC, this pipeline assumes that an object's type cannot change once it is created. We make this assumption in one other place -- the obj_info pipeline of sui-indexer-alt -- but this is one of the reasons why we're getting rid of that pipeline:

This assumption prevents us from allowing people to reclaim derived object IDs (because it would allow the same UID to be re-used under a different type).

For that reason, we should not introduce new pipelines that rely on this assumption. In this case, I think the fix is pretty simple: adapt the pipeline to check whether the object was created, or whether it was mutated and the mutation involves a type change.

Note that even today, dynamic fields may fail this test: a dynamic field's ID is derived from the parent ID, the name type, and the name content. The value type and content don't affect the ID, which means a transaction can modify a field to change its value's type and it will be treated as a field mutation. If you then query the type of the Field<..., ...> object corresponding to the dynamic field, you will get a stale response.


pub use crate::handlers::set_max_mutations;

/// All pipeline names registered by the indexer. Single source of truth used for:
/// - Pipeline registration in `BigTableIndexer::new()`
Contributor:

This at least doesn't seem to be true (this const is not used for pipeline registration). Is it needed, and if so, is it meant to be here in this file? It's flanked on either side by imports.
