GC the blueprints before saving while preserving the current state #3148
Merged
emilk (Member) approved these changes on Aug 30, 2023 and left a comment:
It looks really good, but there was one piece of logic I failed to follow.
There is also, I believe, a chance to reduce code duplication by recognizing that "Everything" can be expressed numerically.
    DropAtLeastFraction(f64),

    /// GC Everything that isn't protected
    Everything,
Member
How is this different from DropAtLeastFraction(1.0) ?
Contributor
Author
Two differences:
- DropAtLeastFraction does size book-keeping and potentially decides to stop early. I was running into edge cases where it would conclude it had met the 1.0 threshold, but in fact still had stuff it could GC.
- DropAtLeastFraction needs to do an oldest-first incremental traversal. Everything has lots more room for optimizations that require less shuffling and re-sorting.
Special-casing the numerical value of 1.0 felt weirder to me than special-casing on an explicit enum but I can make that switch if you think it's clearer.
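To make the two differences above concrete, here is a minimal sketch in a toy model. `Row`, `gc`, and the `GarbageCollectionTarget` enum name are hypothetical stand-ins rather than the actual rerun types; only the two variant names come from the diff. The fraction-based arm sorts oldest-first and keeps a running byte count, so it stops as soon as the count reaches the goal. The `Everything` arm is a single pass with no ordering or bookkeeping. The thread does not describe the actual edge cases; a row whose size is accounted as zero is just one way an early stop can arise in this toy model.

```rust
/// Hypothetical stand-in for a stored row; not the real rerun type.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    /// Monotonically increasing id: a smaller id means an older row.
    pub id: u64,
    pub size_bytes: u64,
    pub protected: bool,
}

/// Variant names follow the diff; the enum name is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GarbageCollectionTarget {
    /// Try to drop at least this fraction of the total size.
    DropAtLeastFraction(f64),
    /// GC everything that isn't protected.
    Everything,
}

/// Toy GC over a flat list of rows. Returns the number of bytes dropped.
pub fn gc(rows: &mut Vec<Row>, target: GarbageCollectionTarget) -> u64 {
    match target {
        GarbageCollectionTarget::DropAtLeastFraction(fraction) => {
            let total: u64 = rows.iter().map(|r| r.size_bytes).sum();
            let goal = (total as f64 * fraction) as u64;

            // Oldest-first incremental traversal with size book-keeping.
            rows.sort_by_key(|r| r.id);
            let mut dropped = 0u64;
            let mut kept = Vec::with_capacity(rows.len());
            for row in rows.drain(..) {
                if dropped < goal && !row.protected {
                    dropped += row.size_bytes;
                } else {
                    // Once the goal is met, every remaining row is kept,
                    // even if it is unprotected.
                    kept.push(row);
                }
            }
            *rows = kept;
            dropped
        }
        GarbageCollectionTarget::Everything => {
            // No ordering and no threshold: one pass keeps only protected rows.
            let before: u64 = rows.iter().map(|r| r.size_bytes).sum();
            rows.retain(|r| r.protected);
            let after: u64 = rows.iter().map(|r| r.size_bytes).sum();
            before - after
        }
    }
}
```

In this model, a store holding one 10-byte row and one newer 0-byte row, both unprotected, reaches the 1.0 goal after the first drop and leaves the second row behind, while `Everything` removes both.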
jleibs added a commit that referenced this pull request on Aug 31, 2023
…3148)
Resolves: #3098
Related to: #1803
Because blueprints used timeless data and timeless data wasn't GC'd, we previously had no great way to clean up blueprints.
This PR paves the way for better overall GC behavior in the future but doesn't change the default behavior yet.
This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing a target. This allows you to configure whether you want to gc the timeless data, and additionally how many latest_at values you want to preserve.
- Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything` to respect the protected rows during gc.
- Modifies the store_hub to gc the blueprint before saving it.
Photogrammetry with `--no-frames` is another "worst-case" for blueprint because every image is a space-view, so you can easily create a huge blueprint history by repeatedly resetting the blueprint.
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if applicable)
- [PR Build Summary](https://build.rerun.io/pr/3148)
- [Docs preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/docs)
- [Examples preview](https://rerun.io/preview/60f3747383780c50886ac781bdf81b32fbff76bd/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
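What
Resolves: #3098
Related to: #1803
Because blueprints used timeless data and timeless data wasn't GC'd, we previously had no great way to clean up blueprints.
This PR paves the way for better overall GC behavior in the future but doesn't change the default behavior yet.
This PR:
- Introduces a new `GarbageCollectionOptions` instead of just providing a target. This allows you to configure whether you want to gc the timeless data, and additionally how many latest_at values you want to preserve.
- Introduces a new gc target: Everything.
- Calculates a set of protected rows for every component based on the last relevant row across every timeline (including timeless).
- Modifies both `gc_drop_at_least_num_bytes` and the new `gc_everything` to respect the protected rows during gc.
- Modifies the store_hub to gc the blueprint before saving it.
Photogrammetry with `--no-frames` is another "worst-case" for blueprint because every image is a space-view, so you can easily create a huge blueprint history by repeatedly resetting the blueprint.
Checklist
* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/3148) (if applicable)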
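The protected-row rule described in the PR summary can be sketched roughly as follows. This is a simplified model, not the rerun implementation: `LoggedRow`, the field names `gc_timeless` and `protect_latest`, and the helper `protected_rows` are assumptions for illustration, and each row here belongs to a single timeline (or to none, meaning timeless). For every (timeline, component) pair, the newest `protect_latest` rows are marked as protected, and `gc_everything` then drops whatever is left unprotected.

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, HashSet};

/// Hypothetical options struct; the field names are assumptions.
pub struct GarbageCollectionOptions {
    /// Whether timeless rows may be collected at all.
    pub gc_timeless: bool,
    /// How many of the newest rows to keep per (timeline, component).
    pub protect_latest: usize,
}

/// Simplified logged row; `timeline == None` models timeless data.
pub struct LoggedRow {
    pub row_id: u64,
    pub timeline: Option<&'static str>,
    pub time: i64,
    pub component: &'static str,
}

/// Collect the ids of the newest `protect_latest` rows for every
/// (timeline, component) pair, timeless included.
pub fn protected_rows(rows: &[LoggedRow], options: &GarbageCollectionOptions) -> HashSet<u64> {
    let mut groups: BTreeMap<(Option<&str>, &str), Vec<&LoggedRow>> = BTreeMap::new();
    for row in rows {
        groups.entry((row.timeline, row.component)).or_default().push(row);
    }

    let mut protected = HashSet::new();
    for group in groups.values_mut() {
        // Newest first: order by time, then by row id as a tie-breaker.
        group.sort_by_key(|r| Reverse((r.time, r.row_id)));
        for row in group.iter().take(options.protect_latest) {
            protected.insert(row.row_id);
        }
    }
    protected
}

/// Drop every row that is not protected. Timeless rows are only
/// eligible for collection when `gc_timeless` is set.
pub fn gc_everything(rows: &mut Vec<LoggedRow>, options: &GarbageCollectionOptions) {
    let protected = protected_rows(rows.as_slice(), options);
    rows.retain(|r| {
        protected.contains(&r.row_id) || (r.timeline.is_none() && !options.gc_timeless)
    });
}
```

Under these assumptions, collecting a blueprint store before saving it amounts to calling `gc_everything` with `gc_timeless: true` and `protect_latest: 1`, which leaves only the current state of each component.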