C++ Component arrays (in Archetypes and `log_components`) need to be optionally non-owning (& mappable)

Right now, component arrays own their data, both as they appear inside archetypes and as they are passed to `log_components`. This means there is currently no zero-copy path from user data to serialized arrow blobs. In other words, toy examples are fine because we push the data directly into rerun's archetype/component arrays, but in any more realistic use the data already lives somewhere else, and we want to ingest it with as little conversion & copying as possible!

This ties naturally into our requirement to ingest data from commonly used libraries without an extra copy.

Towards this goal, I propose that instead of ingesting vectors/`std::array`/C arrays directly, we ingest a wrapper class, `ComponentArray<TComponent>`.
This wrapper can represent one of three things:
* owned data as we have it right now, i.e. `std::vector<TComponent>`
   * created from rvalue reference constructors
* a non-owned pointer & length to a run of `TComponent`
   * created from const ref constructors
* (slightly further in the future) a mapping function and a reference to user data, for types that support it
   * similar to how we specify python numpy array aliases in our component definition, we also supply aliases to certain component types
   * the first step is to support "informed `reinterpret_cast`", e.g. we know (via careful static assertion) that any float array with `length % 3 == 0` can be reinterpreted as a list of Points!
   * more complex conversions should be possible in the future
       * anything that is not a reinterpret cast and requires reordering **may** force our serialization methods to do conversions on-the-fly or ahead of time, adding a bit of complexity there. On one hand we need to be careful that mapping functions can be inlined (pass functors to template methods!); on the other hand we should not go back to exposing arrow serialization steps in our headers, due to the ensuing dependency implications
            * we could draw the line at reordering/transposing, reducing the problem space a lot and allowing us to generically formalize conversions without needing to take user-defined mapping functions. Since we never want to do potentially destructive conversions (e.g. via floating point conversions) in the first place, this might be a good line to draw
   * note that this is important for simple, "weight bearing" component arrays like arrays of points - it should be trivial to just supply an array of floats and have it reinterpreted as an array of points
       * on the flip side, we may never need this for components that are rare or small. We don't need zero-copy conversions of transforms as long as creating `rerun::Transform3D` remains easy
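
To make the first two cases and the "informed `reinterpret_cast`" idea more concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is an assumption for illustration: `Point3D` is a stand-in struct rather than the real component type, and `is_owned()` and `points_from_floats` are hypothetical names, not existing rerun API.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rerun component; NOT the real definition.
struct Point3D {
    float x, y, z;
};

// The "careful static assertion": a Point3D must be exactly three packed floats.
static_assert(std::is_standard_layout_v<Point3D>, "layout must be predictable");
static_assert(std::is_trivially_copyable_v<Point3D>, "must be copyable as raw bytes");
static_assert(sizeof(Point3D) == 3 * sizeof(float), "no padding allowed");
static_assert(alignof(Point3D) == alignof(float), "same alignment as float");

// Sketch of the proposed wrapper: either owns a vector or borrows pointer + length.
template <typename TComponent>
class ComponentArray {
  public:
    // Rvalue constructor: takes ownership of the data.
    ComponentArray(std::vector<TComponent>&& owned)
        : storage_(std::move(owned)), owned_(true) {}

    // Const-ref constructor: borrows. The caller must keep `borrowed` alive and unmodified.
    ComponentArray(const std::vector<TComponent>& borrowed)
        : borrowed_(borrowed.data()), borrowed_size_(borrowed.size()) {}

    // Borrow from a raw pointer & length.
    ComponentArray(const TComponent* data, std::size_t size)
        : borrowed_(data), borrowed_size_(size) {}

    const TComponent* data() const { return owned_ ? storage_.data() : borrowed_; }
    std::size_t size() const { return owned_ ? storage_.size() : borrowed_size_; }
    bool is_owned() const { return owned_; }

  private:
    std::vector<TComponent> storage_;       // only used in the owned case
    const TComponent* borrowed_ = nullptr;  // only used in the borrowed case
    std::size_t borrowed_size_ = 0;
    bool owned_ = false;
};

// Hypothetical helper: view a flat float buffer as points without copying.
// The pointer is only meant to be handed to a serializer that copies raw bytes;
// reading fields through it is not strictly sanctioned by the C++ aliasing rules.
inline ComponentArray<Point3D> points_from_floats(const float* floats, std::size_t num_floats) {
    assert(num_floats % 3 == 0 && "float count must be a multiple of 3");
    return ComponentArray<Point3D>(reinterpret_cast<const Point3D*>(floats), num_floats / 3);
}
```

With this shape, passing an lvalue `std::vector<Point3D>` borrows, passing a temporary takes ownership, and `points_from_floats(raw, 6)` yields a two-element view over the caller's floats. Whether borrowing should really be the implicit behavior for const references is exactly the kind of tradeoff that still needs deciding.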


Naturally, we may get iterator-invalidation-like problems from this, so we need to document this carefully and make the right tradeoffs between what is implicit and what is explicit.
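
As an illustration of the hazard, the sketch below borrows from a `std::vector<float>` and then grows the vector past its capacity, after which the borrowed pointer no longer refers to the vector's buffer. `BorrowedFloats`, `borrow` and `growth_invalidates_view` are hypothetical names made up for this example. The old address is kept as an integer so the stale pointer is never dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal non-owning view: pointer plus length, nothing else.
struct BorrowedFloats {
    const float* data;
    std::size_t size;
};

// Borrow from a vector without copying; the vector must outlive the view.
inline BorrowedFloats borrow(const std::vector<float>& source) {
    return BorrowedFloats{source.data(), source.size()};
}

// Returns true if growing the source vector left the view pointing at a stale buffer.
inline bool growth_invalidates_view() {
    std::vector<float> positions(3, 0.0f);
    BorrowedFloats view = borrow(positions);
    assert(view.size == 3);

    // Remember where the view points, as an integer, while the pointer is still valid.
    const auto borrowed_address = reinterpret_cast<std::uintptr_t>(view.data);

    // Growing beyond the current capacity forces the vector to reallocate.
    positions.resize(positions.capacity() + 1);

    // The view was not updated: it still refers to the old, now freed, buffer.
    return borrowed_address != reinterpret_cast<std::uintptr_t>(positions.data());
}
```

A borrowed component array has the same failure mode if the user mutates or destroys their container between constructing the array and the point where it is serialized.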

Our types in `datatypes` and `components` can stay the same - they are an accurate representation of how the data is ingested/interpreted upon serialization. But how we handle "views" must be much more flexible.

There's also a bunch to be learned from Eigen's `Map`:
https://eigen.tuxfamily.org/dox/group__TutorialMapClass.html


Before we decide on anything concrete too hastily, we should evaluate a number of data ingestion use cases for largish data, e.g. from OpenMVG.
