
Split Tensor component into several archetypes #6832

@Wumpf

Related to:

We generate archetypes and components for all tensor variants (TensorF32, TensorU8, etc) and make sure they share the same Visualizer:

archetype TensorU8 {
    buffer: BufferU8,
    
    // One of these
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}

component BufferU8 {
    data: [u8],
}

archetype TensorF32 {
    buffer: BufferF32,

    // One of these
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}

component BufferF32 {
    data: [f32],
}
  • The mechanics of sharing one visualizer are still unclear. Should the visualizer just listen to several indicators / archetypes? That breaks the 1:1 visualizer-to-archetype relationship we were striving for. Can we revisit this later?
  • This will break some of the "use this tensor like an image" cases that we allow today. Mitigate only as far as is meaningful.
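One way to read "have the visualizer listen to several archetypes" is a small shared trait that every per-datatype archetype implements. The sketch below is only an illustration of that idea, not Rerun's actual visualizer machinery: `TensorArchetype`, `value_at`, `value_range`, and the simplified `TensorShape` are hypothetical names introduced here.

```rust
/// Hypothetical stand-in for the shape component (not the real Rerun type).
#[derive(Debug, Clone, PartialEq)]
pub struct TensorShape(pub Vec<u64>);

/// Per-datatype archetypes, mirroring the pseudo-schema above.
pub struct TensorU8 {
    pub buffer: Vec<u8>,
    pub shape: TensorShape,
}

pub struct TensorF32 {
    pub buffer: Vec<f32>,
    pub shape: TensorShape,
}

/// Hypothetical trait: the only interface the shared visualizer relies on.
pub trait TensorArchetype {
    fn shape(&self) -> &TensorShape;
    /// Element at a flat index, widened to f64 for display purposes.
    fn value_at(&self, index: usize) -> Option<f64>;
}

impl TensorArchetype for TensorU8 {
    fn shape(&self) -> &TensorShape {
        &self.shape
    }
    fn value_at(&self, index: usize) -> Option<f64> {
        self.buffer.get(index).map(|&v| f64::from(v))
    }
}

impl TensorArchetype for TensorF32 {
    fn shape(&self) -> &TensorShape {
        &self.shape
    }
    fn value_at(&self, index: usize) -> Option<f64> {
        self.buffer.get(index).map(|&v| f64::from(v))
    }
}

/// A single "visualizer" entry point that accepts any tensor archetype.
/// Returns the (min, max) of all elements, or None if the buffer is
/// shorter than the shape claims.
pub fn value_range(tensor: &dyn TensorArchetype) -> Option<(f64, f64)> {
    let len: u64 = tensor.shape().0.iter().product();
    let len = usize::try_from(len).ok()?;
    let mut range: Option<(f64, f64)> = None;
    for i in 0..len {
        let v = tensor.value_at(i)?;
        range = Some(match range {
            None => (v, v),
            Some((lo, hi)) => (lo.min(v), hi.max(v)),
        });
    }
    range
}
```

In this sketch the visualizer no longer maps to exactly one archetype, which is the 1:1 break mentioned above; the trait only keeps that break contained in one place.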

Impact on Mesh's texture:
Log an Image archetype at the same spot instead.

Detailed rationale (via @jleibs on #6388 (comment)):

Most of the choices for working with tensors fall into one of 4 categories.

Typed buffer, multiple data-types (the proposal)

Pros:

  • When processing a chunk the raw arrow data is much easier to work with
  • Opportunity to align with the official arrow spec for tensor representation
  • Aligns with our long-term direction of wanting to have multiple types and datatype conversions

Cons:

  • Multi-datatype representation means we must either proliferate typed components or introduce datatype conversions.

The current hypothesis is that proliferating types is a known challenge that can be mostly automated with a mixture of code-gen and some helper code, whereas datatype conversions are an unknown challenge.

Still, this puts us on a path where, once we support multi-typed components, we mostly delete a bunch of code and everything gets simpler. Any type conversions move from visualizer-space to data-query-space, but the types and arrow representations we work with don't actually need to change.
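To illustrate why proliferating typed components is considered automatable, here is a minimal sketch that stamps out one buffer component and one archetype per element type with a declarative macro. It stands in for whatever code-gen the project would actually use; `define_tensor_types!`, the `DATATYPE` constant, the `new` constructor, and the simplified `TensorShape` are assumptions made for this example.

```rust
/// Hypothetical stand-in for the shape component.
#[derive(Debug, Clone, PartialEq)]
pub struct TensorShape(pub Vec<u64>);

/// Generates a `Buffer*` component and a `Tensor*` archetype per element type.
macro_rules! define_tensor_types {
    ($( $archetype:ident, $buffer:ident, $elem:ty; )+) => {
        $(
            #[derive(Debug, Clone, PartialEq)]
            pub struct $buffer {
                pub data: Vec<$elem>,
            }

            #[derive(Debug, Clone, PartialEq)]
            pub struct $archetype {
                pub buffer: $buffer,
                pub shape: TensorShape,
            }

            impl $archetype {
                /// Name of the element type, e.g. for UI labels.
                pub const DATATYPE: &'static str = stringify!($elem);

                pub fn new(data: Vec<$elem>, shape: Vec<u64>) -> Self {
                    Self {
                        buffer: $buffer { data },
                        shape: TensorShape(shape),
                    }
                }
            }
        )+
    };
}

// Adding another variant is one extra line here.
define_tensor_types! {
    TensorU8, BufferU8, u8;
    TensorF32, BufferF32, f32;
}
```

If multi-typed components arrive later, the generated per-type structs are what would be deleted, while the underlying typed buffers stay the same.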

Untyped buffer with type-id

Pros:

  • Avoids arrow unions while maintaining a single datatype.

Cons:

  • Forces arrow users to do annoying user-space datatype casting.
  • Doesn't align with our long-term goals
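The "annoying user-space datatype casting" can be made concrete with a small sketch: every consumer has to check the type-id and reinterpret raw bytes itself. `DType`, `UntypedTensor`, and `read_f32` are hypothetical names, and little-endian byte order is an assumption of this example.

```rust
/// Hypothetical type-id stored next to the raw bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DType {
    U8,
    F32,
}

/// Hypothetical untyped tensor: one datatype for all element types.
pub struct UntypedTensor {
    pub dtype: DType,
    pub bytes: Vec<u8>,
    pub shape: Vec<u64>,
}

/// The cast each consumer would need to write by hand.
/// Assumes little-endian storage; returns None on a type-id mismatch
/// or if the byte length is not a multiple of 4.
pub fn read_f32(tensor: &UntypedTensor) -> Option<Vec<f32>> {
    if tensor.dtype != DType::F32 || tensor.bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        tensor
            .bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

A matching function would be needed for every supported element type, and nothing in the data's type stops a caller from picking the wrong one.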

Typed buffer with union

Pros:

  • Status quo. Already works.

Cons:

  • Forces arrow users to do annoying, poorly supported union operations when loading or reading tensors.
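For comparison, a union-style buffer looks roughly like the sketch below on the Rust side: a single type, but every reader has to branch on the variant. This is a simplified stand-in rather than the actual component definition; `TensorBuffer`, `Tensor`, and `sum` are illustrative names with only two variants shown.

```rust
/// Simplified stand-in for a union-typed buffer (two variants only).
#[derive(Debug, Clone, PartialEq)]
pub enum TensorBuffer {
    U8(Vec<u8>),
    F32(Vec<f32>),
}

pub struct Tensor {
    pub buffer: TensorBuffer,
    pub shape: Vec<u64>,
}

/// Every consumer must dispatch on the variant before touching the data.
pub fn sum(tensor: &Tensor) -> f64 {
    match &tensor.buffer {
        TensorBuffer::U8(data) => data.iter().map(|&v| f64::from(v)).sum(),
        TensorBuffer::F32(data) => data.iter().map(|&v| f64::from(v)).sum(),
    }
}
```

In Rust the `match` is cheap to write; the friction described above shows up on the arrow side, where the same dispatch has to go through union arrays.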
