Skip to content

Time panel chunkification#6934

Merged
jprochazk merged 48 commits intomainfrom
jan/time-panel-chunkification
Jul 23, 2024
Merged

Time panel chunkification#6934
jprochazk merged 48 commits intomainfrom
jan/time-panel-chunkification

Conversation

@jprochazk
Copy link
Copy Markdown
Member

@jprochazk jprochazk commented Jul 18, 2024

What

This PR removes all usage of the recursive time histograms tracked in EntityTree. The entity tree now only represents the... tree of entities, and no other information about them. All required information is queried from the database on demand.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • If have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@jprochazk jprochazk changed the title Finish time panel chunkification Time panel chunkification Jul 18, 2024
@jprochazk
Copy link
Copy Markdown
Member Author

entity and subtree have been erased from existence, and everything seems to still be working. I just want to confirm that the memory usage has indeed gone down.

@jprochazk
Copy link
Copy Markdown
Member Author

It... has not? It seems to be the exact same as main, which is weird

@jprochazk
Copy link
Copy Markdown
Member Author

jprochazk commented Jul 19, 2024

Okay, according to others, the memory usage is not supposed to change all that much. What should improve is ingestion speed.

@teh-cmc
Copy link
Copy Markdown
Contributor

teh-cmc commented Jul 19, 2024

Okay, according to others, the memory usage is not supposed to change all that much. What should improve is ingestion speed.

There are many many things that slow down ingestion right now on main, so to be more specific: what should be faster is this part of the frame (adding chunks to the store and running subscribers).
image

@teh-cmc
Copy link
Copy Markdown
Contributor

teh-cmc commented Jul 19, 2024

I've just tried it and I can already tell you right now that your branch is a good order of magnitude faster :D Nice!

@jprochazk
Copy link
Copy Markdown
Member Author

jprochazk commented Jul 22, 2024

In terms of testing, as I was working on this I would manually add logs to the places with code changes just to make sure that those code paths are actually being hit and are still doing the right thing. The logs would print both the old and new values, so I could compare them.

I think there should be no observable changes between this branch and main, maybe only in the byte size of the entity tree in the time panel, so it's hard to say exactly what should be tested, but a good start would be opening up various examples and checking them against the latest web viewer build from main: https://rerun.io/viewer/commit/ddf60c4

Copy link
Copy Markdown
Contributor

@teh-cmc teh-cmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works great.

Now we have to make our semantics consistent across the board.

query_caches.on_events(store_events);

tree.on_store_deletions(store_events);
// Tree deletions depend on data store, so data store must have been notified of deletions already.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this comment -- the data store is the thing that emitted the changelog in the first place, how could it not know about what's in it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the problem here is that the store events are not actually being used by the tree anymore, so there's no direct dependency between them, which is inconsistent with the rest of the function, where everything else uses the store events. Just wanted to make that as clear as possible

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I can just add the param back and then not use it

Comment on lines +124 to +127
/// Returns `false` if this entity has no children and no data.
pub fn is_empty(&self, chunk_store: &ChunkStore) -> bool {
self.children.is_empty()
&& !chunk_store.entity_has_any_component_on_any_timeline(&self.path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I'm starting to see where the ambiguities stem from.

What do you really want to know here: whether the entity has some component associated with it (which, once true, will always be true -- no matter of what the GC ends up removing), or whether the entity has some events available in the store (which can be true at some point but then false at a later time because GC)?

This method should make it clear which one of these is at play here.

We need to make that clear all over the place in all these store APIs and their docs, in fact.

Copy link
Copy Markdown
Member Author

@jprochazk jprochazk Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I look at it from the perspective of the time panel, it should not display any entities which do not actually have any visible data (either static or temporal). The way I understand it, that's how it worked previously:

https://github.com/rerun-io/rerun/pull/6934/files#diff-34bc602a4376f90828813d30cf2e1ae5f940f9db3d3b51196c383b2af2cdca14L335

The component was removed from that entity in the entity tree if the histogram was empty, which means the time panel would no longer show that component, and if it had no components and no children left, it would be removed completely.

I'll properly name all the APIs with data or component depending on what they're looking for, and make the distinction clear in their docs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha-ha! Now I see why you need those.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's just one specific place where we give the user some more information about what they're doing wrong

@jprochazk jprochazk requested a review from teh-cmc July 22, 2024 16:39
Copy link
Copy Markdown
Contributor

@teh-cmc teh-cmc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic

:shipit:

jprochazk and others added 6 commits July 23, 2024 09:54
Co-authored-by: Clement Rey <cr.rey.clement@gmail.com>
Co-authored-by: Clement Rey <cr.rey.clement@gmail.com>
Co-authored-by: Clement Rey <cr.rey.clement@gmail.com>
Co-authored-by: Clement Rey <cr.rey.clement@gmail.com>
@jprochazk jprochazk merged commit f7d9e1c into main Jul 23, 2024
@jprochazk jprochazk deleted the jan/time-panel-chunkification branch July 23, 2024 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize TimeHistogram

3 participants