Skip to content

feat: add adr-001 for node key refactoring#608

Merged
tac0turtle merged 29 commits intomasterfrom
592/adr
Feb 21, 2023
Merged

feat: add adr-001 for node key refactoring#608
tac0turtle merged 29 commits intomasterfrom
592/adr

Conversation

@cool-develope
Copy link
Contributor

ref: #592, #593, #597

@cool-develope cool-develope requested a review from a team November 1, 2022 11:08
@tac0turtle
Copy link
Contributor

a good question by celestia, how does this effect the commitment structure? Are proofs still the same?

@cool-develope
Copy link
Contributor Author

a good question by celestia, how does this effect the commitment structure? Are proofs still the same?

it doesn't affect, as you can see in #597 it doesn't update any proof part and still passing tests

@tzdybal
Copy link

tzdybal commented Nov 1, 2022

a good question by celestia, how does this effect the commitment structure? Are proofs still the same?

it doesn't affect, as you can see in #597 it doesn't update any proof part and still passing tests

#597 still contains version field, which is still hashed (see:

iavl/node.go

Line 364 in 36a150e

err = encoding.EncodeVarint(w, node.version)
). Removing version will affect all the hashes, and therefore migration will invalidate all existing proofs.

@cool-develope
Copy link
Contributor Author

cool-develope commented Nov 1, 2022

Removing version will affect all the hashes, and therefore migration will invalidate all existing proofs.

@tzdybal , good catch. @tac0turtle how about your thougt? Will we keep the version or refactor the ics23 part?

@tac0turtle
Copy link
Contributor

@tzdybal , good catch. @tac0turtle how about your thougt? Will we keep the version or refactor the ics23 part?

what do you mean by refactor ics23?

So there are two paths, introduce this in a breaking release of the sdk which a coordinated upgraded is needed or leave it alone. for testing purposes and data collection we probably want to leave it but in the future we could break it. BUT we should check with IBC about this.

@cool-develope
Copy link
Contributor Author

what do you mean by refactor ics23?

I mean update the validation part in ics23.

So there are two paths, introduce this in a breaking release of the sdk which a coordinated upgraded is needed or leave it alone. for testing purposes and data collection we probably want to leave it but in the future we could break it. BUT we should check with IBC about this.

Sounds good, then let's keep the version at this time. I will create the issue and we can revisit in the future

Comment on lines +34 to +35
leftNodeKey int64 // new field, need to store in the storage
rightNodeKey int64 // new field, need to store in the storage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these fields, if we already have leftNode and rightNode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftNode and rightNode is only meaningful on memory side, we need these fields to get children from the storage side.

cool-develope and others added 3 commits November 1, 2022 11:44
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Aleksandr Bezobchuk <alexanderbez@users.noreply.github.com>
Copy link
Member

@aaronc aaronc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear as to how an integer improves data locality. Can you explain this a bit more? It would seem that it puts nodes that are created at similar times near each other in the B-tree, but this doesn't actually improve things like range scans other than by reducing the size of the node key which is significant.

@cool-develope
Copy link
Contributor Author

cool-develope commented Nov 1, 2022

I'm unclear as to how an integer improves data locality. Can you explain this a bit more? It would seem that it puts nodes that are created at similar times near each other in the B-tree, but this doesn't actually improve things like range scans other than by reducing the size of the node key which is significant.

Reducing key size itself improves data locality. I am not sure if leveldb uses compressed trie or general bTree, but we can reduce the node count and improve the tree density.

@yihuang
Copy link

yihuang commented Dec 12, 2022

A different nonce assignment strategy can be useful in for example migration, we can do a sequential iteration on existing db, should be much faster than doing random access to find the nodes version by version

The migration would be complicated, it requires a more detail design. I don't think we can keep the current import/export structure in the new design.

https://github.com/cosmos/iavl/blob/master/export.go#L21
The current export node is already nonce agnostic, it just rely on a post-order traversal of the tree, I think we can keep that, so the nonce assignment strategy can be a node local decision.

The `Update` operation will require extra DB access because we need to take children to calculate the hash of updated nodes.
It doesn't require more access in other cases including `Set`, `Remove`, and `Proof`.

It is impossible to remove the individual version. The new design requires more restrict pruning strategies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is enough, it will remove orphans and it will be impossible to remove the intermediate version from storage.
And pruning part explains in more detail the new methods.

Copy link

@yihuang yihuang Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if removing orphans, it's certainly possible to delete the individual versions in between.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets edit to include this.

@yihuang
Copy link

yihuang commented Jan 4, 2023

Another round of review:

Keep order of insertion sorted

when there are multiple stores in the same db, the insertion is not sorted by keys anymore.

Migration

You also need to record the mapping of hash -> new node key to update the references in node body.

Pruning

We'll need both in-order and pre-order somehow, in-order to keep the key order, pre-order to skip subtree early on.

Remove the individual version

It's still possible to remove individual version with new pruning alrogithm, we just need to be aware of predecessor version and avoid deleting nodes older than that.

Export Snapshot

Can we clarify that the new node key format don't affect the snapshot export format? so the nonce assignment strategy is not part of consensus, but a node local decision.


### Negative

The `Update` operation will require extra DB access because we need to take children to calculate the hash of updated nodes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to say that it requires extra DB reads? Why is this not needed in Set and Remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because update only needs the re-calc of hash, but Set and Remove requires calcHeightAndSize (re-calculation of height and size) and this needs to splay the children nodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we are getting left and right in Set and Remove but not splay in Update since we have leftHash and rightHash in the original version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the trade off here is storage saving on per node basis, right? maybe we mention this as a tradeoff

@tac0turtle tac0turtle requested a review from yihuang February 21, 2023 14:09
@tac0turtle tac0turtle merged commit c61d1de into master Feb 21, 2023
@tac0turtle tac0turtle deleted the 592/adr branch February 21, 2023 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

9 participants