C++ Component arrays (in Archetypes and log_components) need to be optionally non-owning (& mappable) #3050

@Wumpf

Right now, component arrays own their data, both as they appear inside archetypes and as they are passed to log_components. This means there is currently no zero-copy path from user data to serialized arrow blobs. In toy examples everything is fine because we push the data directly into rerun's archetype/component arrays, but in any more realistic use the data already lives elsewhere, and we want to ingest it with as little conversion & copying as possible!

This ties naturally into our requirement to ingest data from commonly used libraries without an extra copy.

Towards this goal, I propose that instead of ingesting vectors/std::array/C arrays directly, we ingest a wrapper class, ComponentArray<TComponent>.
This wrapper can represent one of three things:

  • owned data as we have it right now, i.e. std::vector<TComponent>
    • created from rvalue reference constructors
  • a non-owned pointer & length to a run of TComponent
    • created from const ref constructors
  • (slightly more future) a mapping function and reference to user data for types that support it
    • similar to how we specify python numpy array aliases in our component definition, we also supply aliases to certain component types
    • the first step is to support "informed reinterpret_cast", e.g. we know (via careful static assertion) that any float array with a length%3==0 can be reinterpreted as a list of Points!
    • more complex conversions should be possible in the future
      • anything that is not a reinterpret cast and requires reordering may force our serialization methods to do conversions on-the-fly or ahead of time, adding a bit of complexity there. On one hand we need to be careful that mapping functions can be inlined (pass functors to template methods!), on the other hand we should not go back to exposing arrow serialization steps in our headers due to the ensuing dependency implications
        • we could draw the line at reordering/transposing, reducing the problem space a lot and allowing us to generically formalize conversions without needing to take user-defined mapping functions. Since we never want to do potentially lossy conversions (e.g. via floating point conversions) in the first place, this might be a good line to draw
    • note that this is important for simple, "weight bearing" component arrays like arrays of points: it should be trivial to just supply an array of floats and have it reinterpreted as an array of points
      • on the flip side we may never need this for components that are rare or small. We don't need to do zero-copy conversions of transforms as long as creating rerun::Transform3D remains easy
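
To make the first two cases concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is hypothetical: the member names, the `is_owned()` accessor and the `Position3D` stand-in are assumptions for illustration, not the actual rerun API. The only idea taken from the proposal is that rvalue constructors take ownership and const ref (or pointer & length) constructors borrow.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a rerun component type (hypothetical, for illustration only).
struct Position3D {
    float x, y, z;
};

// Hypothetical sketch of the proposed wrapper; not the actual rerun SDK API.
template <typename TComponent>
class ComponentArray {
  public:
    // Owned: built from an rvalue, so the vector's buffer is moved, not copied.
    ComponentArray(std::vector<TComponent>&& owned)
        : owned_(std::move(owned)), is_owned_(true) {}

    // Borrowed: built from a const reference. No copy is made, so the caller's
    // vector must stay alive and unmodified for as long as this view is used.
    ComponentArray(const std::vector<TComponent>& borrowed)
        : borrowed_(borrowed.data()), size_(borrowed.size()) {}

    // Borrowed: raw pointer and length to a contiguous run of TComponent.
    ComponentArray(const TComponent* data, std::size_t size)
        : borrowed_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }

    bool is_owned() const { return is_owned_; }

    // Resolved on each call so that copying or moving the wrapper never leaves
    // a pointer into another wrapper's owned storage.
    const TComponent* data() const { return is_owned_ ? owned_.data() : borrowed_; }
    std::size_t size() const { return is_owned_ ? owned_.size() : size_; }

  private:
    std::vector<TComponent> owned_;        // used only in the owned case
    const TComponent* borrowed_ = nullptr; // used only in the borrowed case
    std::size_t size_ = 0;
    bool is_owned_ = false;
};
```

With this shape, `ComponentArray<Position3D> a(points)` borrows from an lvalue `points`, while `ComponentArray<Position3D> b(std::move(points))` takes over the buffer. The borrowing case is exactly where lifetime problems can creep in, since nothing stops `points` from being resized or destroyed while `a` is still in use.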

Naturally, we may get iterator-invalidation-like problems from this, so we need to document it carefully and make the right tradeoffs between what is implicit and what is explicit.

Our types in datatypes and components can stay the same - they are an accurate representation of how the data is ingested/interpreted upon serialization. But how we handle "views" must be much more flexible.
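
As a rough illustration of the "informed reinterpret_cast" idea with an unchanged component type, the sketch below views a user-owned float buffer as positions. All names (`BorrowedComponents`, `view_floats_as_positions`, `serialize_bytes`) are made up for this example, and `Position3D` is a stand-in rather than the real rerun type. To stay clear of strict-aliasing questions, the view keeps the data as raw bytes and the hypothetical serializer only ever copies bytes out; the static assertions are what justify treating three floats as one component.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Stand-in for a rerun component type (hypothetical, for illustration only).
struct Position3D {
    float x, y, z;
};

// The "careful static assertions": a Position3D must be laid out exactly like
// three tightly packed floats, otherwise the reinterpretation is not valid.
static_assert(std::is_trivially_copyable<Position3D>::value, "must be memcpy-able");
static_assert(std::is_standard_layout<Position3D>::value, "must have a plain layout");
static_assert(sizeof(Position3D) == 3 * sizeof(float), "no padding allowed");
static_assert(alignof(Position3D) == alignof(float), "same alignment as float");

// Non-owning view: raw bytes tagged with the component type they represent.
// The user's buffer must outlive the view.
template <typename TComponent>
struct BorrowedComponents {
    const unsigned char* bytes = nullptr;
    std::size_t count = 0; // number of TComponent elements, not bytes
};

// Views `num_floats` floats as positions without copying.
// Returns false (leaving `out` untouched) unless num_floats % 3 == 0.
inline bool view_floats_as_positions(
    const float* floats, std::size_t num_floats, BorrowedComponents<Position3D>& out
) {
    if (num_floats % 3 != 0) {
        return false;
    }
    out.bytes = reinterpret_cast<const unsigned char*>(floats);
    out.count = num_floats / 3;
    return true;
}

// What a serializer could do with such a view: a single byte copy into the
// output blob, with no per-element conversion.
template <typename TComponent>
std::vector<unsigned char> serialize_bytes(const BorrowedComponents<TComponent>& view) {
    const std::size_t num_bytes = view.count * sizeof(TComponent);
    std::vector<unsigned char> blob(num_bytes);
    if (num_bytes != 0) {
        std::memcpy(blob.data(), view.bytes, num_bytes);
    }
    return blob;
}
```

Conversions that need reordering or transposing would not fit this shape, since they cannot be expressed as a plain byte copy.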

There's also a bunch to be learned from Eigen's Map class:
https://eigen.tuxfamily.org/dox/group__TutorialMapClass.html

Before we decide too hastily on anything concrete here, we should evaluate a bunch of data ingestion use cases for largish data, e.g. from OpenMVG.

Labels: sdk-cpp (C/C++ API specific)
