chore(Node): Do not keep the entire genesis file in memory during node operation#4247
chore(Node): Do not keep the entire genesis file in memory during node operation#4247
Conversation
…is (#4235) Closes #4234. ### Changes The [`Environment`](https://github.com/cometbft/cometbft/blob/8273634cdefb2fb22765a14f9acb9fca608e9982/rpc/core/env.go#L72) type will: - Store a pointer to a `GenesisDoc` if the genesis file size is below a threshold currently set to 16 MB. - If the genesis file exceeds this threshold, it will split it into 16 MB chunks and store the hashes of each chunk in memory in a slice. In this case, the pointer to the `GenesisDoc` will be `nil`. Because the `InitGenesisChunks` method of `Environment` is responsible for creating the chunks, most of the changes of this PR focus on refactoring this method and adding unit tests for it. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments
…alization (#4250) Closes #4248. ### Changes The `Node` type will: - create a singleton instance of `Environment` when setting up the RPC APIs. - set the pointer to the `GenesisDoc` to `nil` once startup is complete so that the GC can collect the in-memory `GenesisDoc` object. - store a `GenesisTime` field. --- #### PR checklist - [x] Tests written/updated - [ ] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments
It was missing. Interestingly, tests did not pick up on this.
It seems that the `base64.StdEncoding.EncodeToString` function does not make a full copy of its argument. Therefore creating each `chunk` as a direct sub-slice of `data` was preventing the GC from deallocating the `data` slice after the function returned (you could still see it as allocated in pprof graphs). Creating each `chunk` as a copy of the sub-slice of `data` solves the issue.
d28d4d6 to
6bfe1e1
Compare
|
See here for a visualization of the changes in memory usage after applying these changes. |
|
Note: despite our efforts, these changes break the
|
andynog
left a comment
There was a problem hiding this comment.
Hi @alesforz, I think the implementation looks good, I've added comments and suggestions. I've also tested the API using a sample file and seems to work.
I didn't not validate the content returned. Just realized that we should provide maybe a CLI command (or sample app) in the future that could download the chunks and re-assemble into a genesis doc. Expecting users and operators to download encoded chunks might not be the best UX
- Added a log message in `ConfigureRPC()` to announce the start of the creation of the genesis file chunks - Improved error message returned by `InitGenesisChunks()` - `TestInitGenesisChunks() now checks that the original genesis doc and the genesis doc reassembled from its chuks match.
andynog
left a comment
There was a problem hiding this comment.
thanks for addressing the comments and suggestions. Looks good to me.
|
Are the genesis json files still needed for testing ?As you added strings, can they be removed from the PR ? |
@jmalicevic The files are still there to run the unit tests. They are fictitious though: I can substitute the small genesis file with a string in the code, but the big genesis file needs to be there to unit test the chunking mechanism. I can reduce its size (it's about 42MB) if that's your concern. |
|
Have not reviewed the PR, but am looking forward to this feature! |
.changelog/unreleased/improvements/4235-4234-environment-stores-one-genesis.md
Outdated
Show resolved
Hide resolved
@jmalicevic Updated the testing code to generate the big genesis docs at runtime rather than having a file in the repo 😄 See last couple of commits |
jmalicevic
left a comment
There was a problem hiding this comment.
Thanks for this @alesforz !
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
…e operation (cometbft#4247) Closes cometbft#1290. This PR tracks the work we are doing for cometbft#1290. For details, please refer to cometbft#1290 discussion. ### ToDo - [x] Environment stores either a pointer to a GenesisDoc or the genesis' chunks, but not both. Merged to this branch with cometbft#4235. - [x] [Node](https://github.com/cometbft/cometbft/blob/87a8c8948bf9926aff37f344b4120163f12cb4ad/node/node.go#L50) does not keep a pointer to a GenesisDoc after initialization. Merged to this branch with cometbft#4250. --- #### PR checklist - [x] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Closes #1290.
This PR tracks the work we are doing for #1290.
For details, please refer to #1290 discussion.
ToDo
Environmentstores only one in-memory copy of the Genesis #4235.PR checklist
.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments