Conversation
datastore/insert/batch/rects/insert
time: [180.84 µs 184.42 µs 188.96 µs]
thrpt: [52.920 Melem/s 54.225 Melem/s 55.296 Melem/s]
change:
time: [+1.0072% +2.1236% +3.3206%] (p = 0.00 < 0.05)
thrpt: [-3.2139% -2.0795% -0.9972%]
| mod arrow; | ||
|
|
||
| #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
| #[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash, arrow2_convert::ArrowField)] |
There was a problem hiding this comment.
Did something change that caused the derive macro to work in this context now but not previously?
There was a problem hiding this comment.
Not that I know of; @jondo2010 would know better than I though
jleibs
left a comment
There was a problem hiding this comment.
If the intent is for this to be similar to the cluster_key in terms of existence-guarantees, seems like we should have something in re_arrow_store that checks for it and complains loudly if it's missing on insert.
Hmmm, interesting observation... Personally I'd say this is not quite on the same level as cluster keys. The lack of Now, whether others agree with me... 😶 |
| .iter() | ||
| .map(|(component, bundle)| { | ||
| let instances = if bundle.components.len() == 1 { | ||
| let instances = if bundle.components.len() < 3 { |
There was a problem hiding this comment.
The relationship between the 3 here and the knowledge that MsgBundle populates a MsgId component is a bit too "magic" for my taste. Probably worth adding a comment, but in general I wonder if this check would be better done by adding a helper to MsgBundle like: has_component(&ComponentName)
There was a problem hiding this comment.
Yeah this definitely got outta hands 😄 I'll make another PR, there's a couple of helpers I'd like to see on MsgBundle actually
This PR elevates
MsgIdto the rank ofComponent, and makes sure that it always gets appended to every row of every message bundle!This is required so that we can keep track of per-message data on the backend, e.g. the complete list of timepoints associated with a given row (the store doesn't have this information, since the different timelines of a timepoint live in completely separate indices!).
Note that this is not granular enough: we need an ID for each row (i.e. each event) in addition to an ID for each message!
As of today, though, a message is always 1 row, so it'll do for now.
This has to be done on the client-side / as early as possible! Trying to craft arrow payloads directly from within the datastore in the hot write path would tank the write performance down to pretty much zero.
Another nice side-effect of this is that it gets us closer to the flat-table ingestion format we've been moving towards.
Fixes #541
Unblocks #654
The benchmarks are the same before, except for the
insertone of course since they literally insert 50% more components now ([Instance, Rect2D] =>[Instance, Rect2D, MsgId]):