Labels: P:storage-optimization (Priority: Give operators greater control over storage and storage optimization)
This is a follow-up issue to PR #1017, which has been submitted and merged.
During the PR review, the CometBFT team raised a few concerns about this particular PR. Since the PR was submitted from a fork, some of the team's suggestions could not be applied directly on the PR. We therefore merged PR #1017 and will address the remaining concerns in this issue.
Concerns to be addressed:
- The logic in PR 1017 uses the marshalled JSON bytes as the input to the hashing function. As discussed in the PR, this approach might cause non-deterministic behaviour and is error prone. JSON is not a canonical data representation, so any semantically insignificant variation in the genesis file might result in a hash mismatch.
- The `AppState` part of the genesis is application dependent, and CometBFT has no "understanding" of its internal data structures. In the `GenesisDoc` type, the `AppState` property is just a `json.RawMessage`, so any variation in its content might cause side effects beyond CometBFT's control, such as hash mismatches.
- In the current logic in PR 1017, if the node crashes before saving the genesis hash into storage, undesired side effects can occur. As per the review comment, the logic on this line should be changed by applying the suggestion in the comment.
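To illustrate why hashing marshalled JSON is fragile, the short Go sketch below (illustrative only, not CometBFT code) hashes two semantically identical JSON documents that differ only in key order and whitespace. The digests differ, which is exactly the hash-mismatch risk described above:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashJSON hashes the raw bytes of a JSON document. Because JSON is not
// a canonical representation, equivalent documents can hash differently.
func hashJSON(doc []byte) [32]byte {
	return sha256.Sum256(doc)
}

func main() {
	// Same data, different key order and whitespace.
	a := []byte(`{"app_state":{},"chain_id":"test"}`)
	b := []byte(`{ "chain_id": "test", "app_state": {} }`)

	fmt.Printf("%x\n%x\n", hashJSON(a), hashJSON(b))
	fmt.Println("equal:", hashJSON(a) == hashJSON(b)) // equal: false
}
```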
Context on genesis file chain initialization and upgrades:
- Historically, the way chain operators (node operators and validators) validate the `genesis` file is to compare a checksum obtained from a reference repository or from a group of operators during a coordinated upgrade or new chain initialization. For example, during a chain upgrade, validators might export the current state via the CLI (e.g. `gaiad export`), generate a checksum of the exported genesis using an agreed hashing function (e.g. `sha256sum`), and then coordinate the upgrade by validating their own checksum against the "agreed" checksum of the majority of operators, ensuring they all match before starting the chain and achieving consensus. Therefore, implementing logic that applies the same hashing function to the same input used by operators is critical to a reliable and trusted solution.
- Currently, as far as we know, there is no central repository or on-chain record of these genesis checksums. Historically, most chains maintain a `mainnet` repository somewhere on GitHub where they list the checksum on a page for each genesis upgrade, or include the checksum in the GitHub release information for a chain mainnet or upgrade.
Suggested changes to be made to improve the solution:
- The main concern to be addressed is the hashing functionality, as outlined in points #1 and #2 above. In our opinion, relying on a hash of the marshalled JSON bytes is not the best solution (as mentioned above). We believe a better approach is to compute the checksum of the `genesis.json` file content. This mirrors the approach chain operators already use to manually validate the `genesis` file (a checksum of the file contents, e.g. `$ sha256sum genesis.json`). We believe operators will have more confidence in the solution since they also have an external way to validate the hash information.
- [OPTIONAL] Add a way for operators to specify the expected hash (checksum) when starting (or initializing) a node, either as a CLI command parameter or as a value in the configuration file (e.g. `genesis_hash = "b2fca9af31aae6953124439df5bc300c1ef11fb8be938aeb570767c5b6bfdaaf"`). CometBFT would validate the hash it computes when reading the genesis file against this value: if the hashes don't match (computed ≠ specified), the node fails to start; if they match, the hash is stored in the database. This ensures operators are confident in the hash that will be stored in the database. As part of future planning, we could also return the hash information via RPC, making it possible for operators and other users to validate the hash against peers or public RPC nodes, which would address point #6 above.
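A minimal sketch of the optional validate-then-store flow described above, using hypothetical names (`checkGenesisHash` is not an actual CometBFT API; the persistence step is only indicated in a comment):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// checkGenesisHash compares the SHA-256 digest of the genesis file bytes
// against an operator-supplied expected hex digest (e.g. from a
// hypothetical `genesis_hash` config entry). An empty expected value
// skips the check.
func checkGenesisHash(genesisBytes []byte, expectedHex string) (string, error) {
	sum := sha256.Sum256(genesisBytes)
	computed := hex.EncodeToString(sum[:])
	if expectedHex != "" && !strings.EqualFold(computed, expectedHex) {
		return "", fmt.Errorf("genesis hash mismatch: computed %s, expected %s", computed, expectedHex)
	}
	return computed, nil
}

func main() {
	genesis := []byte(`{"chain_id":"test-chain"}`)
	computed, err := checkGenesisHash(genesis, "")
	if err != nil {
		fmt.Println("refusing to start node:", err)
		return
	}
	// On success the node would persist `computed` to its database before
	// continuing startup, and could later expose it over RPC.
	fmt.Println("verified genesis hash:", computed)
}
```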
### Tasks
- [ ] Create the genesis hash using `sha256`. Make sure it is properly saved into the DB.