Skip to content

feat(smt): expand PartialSmt tracking definition #691

Merged
bobbinth merged 19 commits into0xMiden:nextfrom
reilabs:krushimir/partial-smt-tracking
Dec 26, 2025
Merged

feat(smt): expand PartialSmt tracking definition #691
bobbinth merged 19 commits into0xMiden:nextfrom
reilabs:krushimir/partial-smt-tracking

Conversation

@krushimir
Copy link
Copy Markdown
Collaborator

@krushimir krushimir commented Dec 4, 2025

Expands the PartialSmt tracking definition to allow updates to keys for which a valid merkle path can be computed, even if not explicitly added.

A key is now tracked if:

  • Explicitly added via add_path/add_proof, or
  • A merkle path can be computed and verified against the root

Uses early-bail optimization when verifying path consistency.

Closes #375.

@krushimir
Copy link
Copy Markdown
Collaborator Author

@PhilippGackstatter A couple of notes:

  1. The issue mentions decoupling PartialSmt from Smt. Let me know if that should be part of this PR or a separate one.
  2. No caching for read operations (get_value, open).

@krushimir krushimir force-pushed the krushimir/partial-smt-tracking branch from de1c993 to 357b930 Compare December 4, 2025 19:17
@PhilippGackstatter
Copy link
Copy Markdown
Contributor

The issue mentions decoupling PartialSmt from Smt. Let me know if that should be part of this PR or a separate one.

I think how we check for tracked leaves and the structure is actually somewhat separate, right? So in that case, refactoring this separately seems best.

The caching is probably hard to do anyway because it requires &mut self and when reading a value we only have &self, so maybe not needed.

Comment on lines +364 to +377
let mut current_hash = node_hash;

while index.depth() > 0 {
// Check consistency: does our computed hash match what the tree says?
if current_hash != self.0.get_node_hash(index) {
return None;
}

// Compute parent hash
let sibling_hash = self.0.get_node_hash(index.sibling());
let input = index.build_node(current_hash, sibling_hash);
index.move_up();
current_hash = Rpo256::merge(&input);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works, but it also seems like quite a heavy check.

I was thinking about an alternative approach. The main idea is that we'll only ever encounter an untracked leaf that is fine to update, if the value of the key is empty.

Also, the assumption is that the InnerNodes that are explicitly present (e.g. not retrieved through Smt::get_inner_node, since that defaults to subtree roots) in the underlying Smt::inner_nodes are all consistent with the root. This should be the case for partial SMT.

Then, if the leaf is not present, we can check if the value for the key we want to update is empty. If we can prove it is, we can update it, not otherwise. We can check whether a value is empty by comparing against the empty subtree roots until the first non-empty subtree root, which should only require computing one hash and equality comparisons otherwise.

More concretely, take the visualization from #372 (review) as an example. Conceptually, we can assume that the value for the key-to-check is empty and attempt to prove it by comparing it against the actual tree. Let's say we check for node_4:

  • Compute:
    let leaf_idx = Smt::key_to_leaf_index(&key);
    let inner_node_idx = leaf_idx.index;
  • Check if the parent InnerNode of node_4 (= c) in the actual tree is identical to EmptySubtreeRoots::get_inner_node(SMT_DEPTH, inner_node_idx.parent().depth()). In this example, it is, so we continue.
  • Repeat to check that node c's parent is an empty subtree root. Again, it is.
  • Repeat and the check fails since g is not an empty subtree root. Attempt to compute InnerNode { left: node_e, right: node_f }.hash() and check if that matches g. We can do this because node_e is present in Smt::inner_nodes. It matches, and since we're at the root - we know node_4 is an empty value that is consistent with the rest of the tree. So we consider it tracked.

I'm not fully convinced this works in all cases, but I believe the above is correct. There seem to be a few edge cases, so I'm not sure overall if it's worth it. Especially, considering that we currently don't really have a use case for updating leaves that are not explicitly tracked. But may be interesting to think about.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the optimization suggestion!

After thinking further, it seems we can do this with zero hash computations.

Given that all explicit InnerNodes are already verified against the root, when we traverse up and hit a stored InnerNode, we just compare the empty-subtree root for that depth against the stored child hash.

Following your examples:
node_4 (tracked):
Walk up through empty nodes; at g we find a stored InnerNode with right = f_hash.
Compare empty_subtree_root == f_hash → matches (f is empty) -> tracked.
No need to compute hash(node_e, node_f).

node_2 (not tracked):
Walk up; at e we find a stored InnerNode with right = b_hash.
Compare empty_subtree_root == b_hash -> mismatch (b is non-empty because leaf 3 has data) -> not tracked.

I've implemented a test matching the visualization (local, not yet pushed): leaf 1 and leaf 3 have data, proof only for leaf 1.

Am I missing something?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works, thanks for thinking about it further! You're right, we don't even need to hash anything. Very nice! Thinking about that way of checking now, it seems it would be low complexity and probably doesn't have many or any edge cases, so I think we can go ahead with that.

@krushimir krushimir force-pushed the krushimir/partial-smt-tracking branch from e7e78bd to b03c313 Compare December 5, 2025 13:11
@bobbinth
Copy link
Copy Markdown
Contributor

bobbinth commented Dec 7, 2025

Without having looked at the code, would it make sense to first refactor the PartialSmt struct to not be based on the Smt struct (as suggested in #375 (comment)) and only then update the definition of "tracked"? (or maybe that can happen naturally as a part of this refactoring)

@krushimir krushimir force-pushed the krushimir/partial-smt-tracking branch from b03c313 to 40f1430 Compare December 10, 2025 19:04
@krushimir
Copy link
Copy Markdown
Collaborator Author

krushimir commented Dec 10, 2025

Without having looked at the code, would it make sense to first refactor the PartialSmt struct to not be based on the Smt struct (as suggested in #375 (comment)) and only then update the definition of "tracked"? (or maybe that can happen naturally as a part of this refactoring)

I went ahead and decoupled PartialSmt from Smt in this PR and wired the new tracking semantics through a single helper, get_tracked_leaf().

I had to touch most of the PartialSmt internals anyway, so redefining the tracking logic fell out pretty naturally from the refactor. That said, I can split this into a separate PR if you’d prefer to review it in two steps.

A key is now considered tracked if either:

  • its path was explicitly added, or
  • we can prove the corresponding leaf is empty by walking up through inner nodes and comparing against EmptySubtreeRoots (no extra hashing).

@krushimir krushimir force-pushed the krushimir/partial-smt-tracking branch from 04a321f to f0c6b36 Compare December 10, 2025 19:32
Copy link
Copy Markdown
Contributor

@PhilippGackstatter PhilippGackstatter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Very cool that we can check for tracked leaves so efficiently.

I think the main comments are to remove the tracked_leaves API, because I think it no longer is very meaningful and to check consistency during deserialization, though this is debatable.

Comment on lines +180 to +181
} else {
leaf.remove(key);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This affects the behavior of tracked_leaves iterator, iiuc.

I think the behavior of that iterator so far was that it returned all leaves that were explicitly added, e.g. using add_path, even after a key-value pair was set to the empty word. This is because we had special logic to reinsert the leaf, even if it's value was set to the empty word. We had that logic, so that it would still be possible to update that key's value because of our previous definition of "tracked leaf".

Iiuc, we no longer need that logic now because we consider that leaf tracked anyway, right?

The other question is, is it fine if tracked_leaves does not return leaves that were explicitly added before? This could happen if they were set to the empty word, which this logic here would remove.

I think the larger question is what the use of tracked_leaves is, now that it no longer returns the full set of leaves that can be updated. Making it return all of them would be impractical, and so I think we should remove that API. I checked in base, client and node and none of them seem to use this API, so it seems fine to remove.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With implicit tracking, we no longer need the special reinsertion logic since leaves in empty subtrees are trackable regardless. Removed tracked_leaves as suggested.

Comment on lines +279 to +309
// - If the leaf is SmtLeaf::Empty, we will also insert it, which means this leaf is
// considered tracked by the partial SMT as it is part of the leaves map. When calling
// PartialSmt::insert, this will not error for such empty leaves whose merkle path was
// added, but will error for otherwise non-existent leaves whose paths were not added,
// which is what we want.
let prev_entries = self
.0
.leaves
.get(&current_index.value())
.map(|leaf| leaf.num_entries())
.unwrap_or(0);
let current_entries = leaf.num_entries();
self.0.leaves.insert(current_index.value(), leaf);
self.leaves.insert(current_index.value(), leaf);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to keep the comment in the second bullet point, or adjust it depending on how we move forward, because how we treat empty leaves is subtle and easy to miss.

Iiuc, we still insert empty leaves into the Leaves map, but we could now optimize and skip adding those due to the new tracked logic, so:

if !leaf.is_empty() {
    self.leaves.insert(current_index.value(), leaf);
}

But, I think we generally trade memory for performance, and so it probably makes sense to still keep that "caching"? Any thoughts on this?

In any case, I think it would be good to keep/add a comment here explaining the rationale for dealing with empty leaves.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - kept the current behavior and added a comment explaining the rationale.

The original comment was outdated after decoupling, so I adjusted it to reflect that empty leaves are now implicitly trackable but we cache them to avoid the lookup cost.

let num_entries = leaves.values().map(|leaf| leaf.num_entries()).sum();

Ok(PartialSmt(smt))
Ok(Self { root, num_entries, leaves, inner_nodes })
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally think we should check the inner consistency during deserialization. The code here assumes the input is trusted, since we don't check that leaves, inner nodes and root are consistent. I think this may be fine for the way we use it currently, but it would be very easy to start using in an RPC API context, where input is untrusted, without realizing that deserialization can mess up the invariants.

I think I would change this to assume untrusted input and validate the internal consistency. Maybe this could work by checking:

  • that the deserialized InnerNodes are consistent with the root
  • that each leaf's hash is consistent with its parent inner node's left/right child.

Maybe with the serialization refactor we could try to come up with a structure for supporting validating and non-validating deserialization, cc @bobbinth

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I think unless it creates significant complications, I would treat the inputs as untrusted (i.e., ideally, we'd validate the correctness of the deserialized data).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added validation during deserialization - we now verify that each inner node's hash matches what its parent expects, and each leaf's hash matches its parent inner node.

Comment on lines +967 to +974
const LEAF_0: u64 = 0;
const LEAF_1: u64 = 1 << 61;
const LEAF_2: u64 = 1 << 62;
const LEAF_3: u64 = (1 << 62) | (1 << 61);
const LEAF_4: u64 = 1 << 63;
const LEAF_5: u64 = (1 << 63) | (1 << 61);
const LEAF_6: u64 = (1 << 63) | (1 << 62);
const LEAF_7: u64 = (1 << 63) | (1 << 62) | (1 << 61);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later we write // Key 0: CAN update (sibling of key 1, empty), but key0 and key1 are not actual siblings in the NodeIndex::sibling sense. So should the test setup here be such that leaf 0 and 1, 2 and 3, etc. are actual siblings, i.e. their LeafIndex::value is off by one? I think this would reflect the diagram more accurately, but then again, the diagram is inherently different from a depth 64 SMT so there's room for interpretation 😅.

I think both versions should effectively test the same thing, I just want to make sure the terminology in the comments is accurate 🙂

Comment on lines +1005 to +1015
// Key 4: CAN update (in empty subtree f)
let value_4: Word = rand_value();
full.insert(key_4, value_4).unwrap();
partial.insert(key_4, value_4).unwrap();
assert_eq!(full.root(), partial.root());

// Key 5: CAN update (in empty subtree f)
let value_5: Word = rand_value();
full.insert(key_5, value_5).unwrap();
partial.insert(key_5, value_5).unwrap();
assert_eq!(full.root(), partial.root());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For key 5, 6, 7 it may be worth noting somehow that after updating key 4, the subtree is no longer empty. Just to make sure the test is not misunderstood easily.

Comment on lines +1033 to +1036
// Key 3: CANNOT update (has data but no proof in partial SMT)
let result = partial.insert(key_3, rand_value());
assert_matches!(result, Err(MerkleError::UntrackedKey(_)));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add an assert_eq!(full.root(), partial.root()); here?

@krushimir
Copy link
Copy Markdown
Collaborator Author

Addressed all comments.

As Phillip mentioned, I had the split earlier but merged it back for easier review. Happy to split it again once you're both satisfied with the changes.

Copy link
Copy Markdown
Contributor

@PhilippGackstatter PhilippGackstatter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for the updates!

/// Checks that:
/// - Each inner node's hash is consistent with its parent.
/// - Each leaf's hash is consistent with its parent inner node's left/right child.
fn validate(&self) -> Result<(), DeserializationError> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a small test covering an inner node hash mismatch and a leaf hash mismatch to make sure validation is working as expected?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@krushimir
Copy link
Copy Markdown
Collaborator Author

I’ve split the partial module into mod.rs and tests.rs again.
This is purely a file organization change with no logic modifications.

For reviewing the actual implementation changes, please view the diff excluding this last commit.

Copy link
Copy Markdown
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you. I left some comments inline.

Copy link
Copy Markdown
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good! Thank you!

@bobbinth bobbinth merged commit 73c2e61 into 0xMiden:next Dec 26, 2025
25 checks passed
@krushimir krushimir deleted the krushimir/partial-smt-tracking branch December 27, 2025 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change the definition of tracked in a PartialSmt

3 participants