
Logging lots of out-of-order data using the same timepoint on a timeline leads to extreme slowdown in datastore #4415

@emilk

Description


A user shared a puffin profile of very slow data ingestion:

slow_ingestion.puffin.zip


What I see is this: the data ingestion (which is currently happening in the main rendering loop, see #4298) gets a message containing ~3800 rows of data. Processing that takes ~2 seconds, because each insertion takes a whopping ~9ms. The bulk of the insertion cost is re-sorting, which makes me suspect that the reason for this new slowdown is out-of-order insertions.
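As an illustration of why out-of-order insertion can be so much more expensive than in-order insertion (a general sketch in plain Python, not the actual datastore code): keeping a column sorted by inserting each row at its sorted position shifts the tail of the buffer on every insert, whereas already-sorted data can simply be appended.

```python
import bisect
import random

def insert_sorted(rows):
    """Insert each value at its sorted position: every out-of-order
    insert shifts the tail of the list (O(n) per insert)."""
    col = []
    for t in rows:
        bisect.insort(col, t)
    return col

def append_in_order(rows):
    """Already-sorted data can simply be appended (O(1) per insert)."""
    col = []
    for t in rows:
        col.append(t)
    return col

in_order = list(range(3800))
shuffled = random.sample(in_order, len(in_order))

# Both produce the same sorted column, but the shuffled path
# pays for element shifts on every single insertion.
assert insert_sorted(shuffled) == append_in_order(in_order)
```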

There are several layers of bad here:

  • As mentioned, we shouldn't be doing insertions on the main render loop (Parallelize data ingestion #4298)
  • If we do, we should make sure to not use more than a certain time budget
  • Each insertion should not be this slow, even with out-of-order insertion
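The time-budget idea in the second bullet could be sketched like this (a hypothetical helper, assuming ingestion can be chunked row by row; `process_row` stands in for whatever the store's per-row insertion is):

```python
import time

def ingest_with_budget(rows, process_row, budget_s=0.005):
    """Process rows until the per-frame time budget is spent;
    return the rows left over for the next frame."""
    deadline = time.monotonic() + budget_s
    for i, row in enumerate(rows):
        process_row(row)
        if time.monotonic() >= deadline:
            return rows[i + 1:]  # carry the remainder over to the next frame
    return []
```

With such a helper, the render loop would call it once per frame and keep the returned remainder queued, so a huge message can no longer stall rendering for seconds.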

Workarounds

  • Run the logging with RERUN_FLUSH_NUM_ROWS=1: this creates smaller chunks of work for the viewer, which should restore the viewer frame-rate, though ingestion will still be slow

  • Stop logging data out-of-order. This could happen if you are using custom timelines that are not monotonically increasing. Rerun should support out-of-order logging, but it seems like we're hitting a bad corner case here.
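For the first workaround, the environment variable just needs to be set before the logging process starts. For example (`my_logging_script.py` is a placeholder for your own logging script):

```python
import os
import subprocess

# Smaller flush chunks: per the workaround above, this makes the SDK
# flush to the viewer after every row instead of in large batches.
env = dict(os.environ, RERUN_FLUSH_NUM_ROWS="1")

# "my_logging_script.py" is a placeholder for your own logging script.
# subprocess.run(["python", "my_logging_script.py"], env=env)
```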

Other notes

It would be great if we could tell from a puffin profile whether the profiled code was a debug or release build.
If this profile came from a debug build, then this whole issue is a mistake.

Metadata

Labels

  • ⛃ re_datastore: affects the datastore itself
  • 📉 performance: Optimization, memory use, etc
  • 🪳 bug: Something isn't working
