Related to:
We generate archetypes and components for all tensor variants (TensorF32, TensorU8, etc) and make sure they share the same Visualizer:
archetype TensorU8 {
    buffer: BufferU8,
    // One of these:
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}
component BufferU8 {
    data: [u8],
}
archetype TensorF32 {
    buffer: BufferF32,
    // One of these:
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}
component BufferF32 {
    data: [f32],
}
- The mechanics of sharing one visualizer are still unclear. Should the visualizer simply listen to several indicators / archetypes? That breaks the 1:1 relationship we were striving for. Can we revisit this later?
- This will break some "use this tensor like an image" cases that we allow today. We should mitigate that only as far as it is meaningful.
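One way the shared-visualizer question could play out is sketched below in Rust. This is a minimal sketch under stated assumptions, not Rerun's actual visualizer API: the `TensorLike` trait, `TensorVisualizer`, its `INDICATORS` list, and the simplified `TensorDimension` struct are hypothetical names introduced only for illustration. It shows a single visualizer accepting several archetype indicators, which is the departure from the 1:1 relationship noted above.

```rust
/// Simplified, hypothetical stand-in for a tensor dimension.
#[derive(Debug, Clone, PartialEq)]
pub struct TensorDimension {
    pub size: u64,
    pub name: Option<String>,
}

/// Hypothetical typed archetypes, mirroring the schema sketch above.
pub struct TensorU8 {
    pub buffer: Vec<u8>,
    pub shape: Vec<TensorDimension>,
}

pub struct TensorF32 {
    pub buffer: Vec<f32>,
    pub shape: Vec<TensorDimension>,
}

/// Hypothetical trait: everything one shared visualizer needs from any variant.
pub trait TensorLike {
    /// Name of the archetype indicator this variant would be logged with.
    fn indicator(&self) -> &'static str;
    fn shape(&self) -> &[TensorDimension];
    fn num_elements(&self) -> usize;
    /// Reads one element, widened to f64 for display purposes only.
    fn value_as_f64(&self, index: usize) -> Option<f64>;
}

impl TensorLike for TensorU8 {
    fn indicator(&self) -> &'static str { "TensorU8" }
    fn shape(&self) -> &[TensorDimension] { &self.shape }
    fn num_elements(&self) -> usize { self.buffer.len() }
    fn value_as_f64(&self, index: usize) -> Option<f64> {
        self.buffer.get(index).map(|&v| f64::from(v))
    }
}

impl TensorLike for TensorF32 {
    fn indicator(&self) -> &'static str { "TensorF32" }
    fn shape(&self) -> &[TensorDimension] { &self.shape }
    fn num_elements(&self) -> usize { self.buffer.len() }
    fn value_as_f64(&self, index: usize) -> Option<f64> {
        self.buffer.get(index).map(|&v| f64::from(v))
    }
}

/// One visualizer that listens to several indicators instead of exactly one.
pub struct TensorVisualizer;

impl TensorVisualizer {
    pub const INDICATORS: [&'static str; 2] = ["TensorU8", "TensorF32"];

    /// Returns true if this visualizer should pick up the given indicator.
    pub fn handles(indicator: &str) -> bool {
        Self::INDICATORS.iter().any(|known| *known == indicator)
    }

    /// Computes the (min, max) of all elements, e.g. to drive a color map.
    pub fn value_range(tensor: &dyn TensorLike) -> Option<(f64, f64)> {
        (0..tensor.num_elements())
            .filter_map(|i| tensor.value_as_f64(i))
            .fold(None, |range, v| match range {
                None => Some((v, v)),
                Some((lo, hi)) => Some((lo.min(v), hi.max(v))),
            })
    }
}
```

In this sketch the visualizer code is written once against the trait, so adding a new tensor variant only requires another `TensorLike` impl and one more entry in `INDICATORS`.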
Impact on Mesh's texture:
Log an Image archetype at the same spot instead.
Detailed rationale (via @jleibs on #6388 (comment)):
Most of the choices for working with tensors fall into one of 4 categories.
Typed buffer, multiple data-types (the proposal)
Pros:
- When processing a chunk the raw arrow data is much easier to work with
- Opportunity to align with the official arrow spec for tensor representation
- Aligns with our long-term direction of wanting to have multiple types and datatype conversions
Cons:
- Multi-datatype representation means we must either proliferate typed components or introduce datatype conversions.
The current hypothesis is that proliferating types is a known challenge that can be mostly automated with a mixture of code-gen and some helper code, whereas datatype conversions are an unknown challenge.
Still, this puts us on a pathway where, once we support multi-typed components, we mostly delete a bunch of code and everything gets simpler. Any type conversions move from visualizer-space to data-query-space, but the types and arrow representations we work with don't actually need to change.
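To illustrate why proliferating types is considered mostly automatable, here is a minimal Rust sketch that stamps out one buffer component and one archetype per element type from a single template. The `tensor_variant!` macro, the `NAME` constant, the `new` constructor, and the plain `Vec<u64>` shape are assumptions made for this example; the real code-gen pipeline is not described in this proposal and may look quite different.

```rust
/// Hypothetical template: generates a buffer component and a tensor archetype
/// for a single element type.
macro_rules! tensor_variant {
    ($archetype:ident, $buffer:ident, $elem:ty) => {
        #[derive(Debug, Clone, PartialEq)]
        pub struct $buffer {
            pub data: Vec<$elem>,
        }

        #[derive(Debug, Clone, PartialEq)]
        pub struct $archetype {
            pub buffer: $buffer,
            /// Simplified shape: one size per dimension.
            pub shape: Vec<u64>,
        }

        impl $archetype {
            pub const NAME: &'static str = stringify!($archetype);

            /// Builds a tensor, rejecting buffers whose length does not
            /// match the product of the dimension sizes.
            pub fn new(data: Vec<$elem>, shape: Vec<u64>) -> Option<Self> {
                let expected: u64 = shape.iter().product();
                (data.len() as u64 == expected).then(|| Self {
                    buffer: $buffer { data },
                    shape,
                })
            }
        }
    };
}

// Each additional datatype costs one line.
tensor_variant!(TensorU8, BufferU8, u8);
tensor_variant!(TensorF32, BufferF32, f32);
```

Under these assumptions, supporting another element type is a one-line change, and the generated types could later be deleted wholesale if multi-typed components arrive.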
Untyped buffer with type-id
Pros:
- Avoids arrow unions while maintaining a single datatype.
Cons:
- Forces arrow users to do annoying user-space datatype casting.
- Doesn't align with our long-term goals.
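The user-space casting that this option pushes onto readers can be sketched as follows. This is an illustrative Rust sketch only: `UntypedTensor`, `ElementType`, and `as_f32` are hypothetical names, and the little-endian byte layout is an assumption rather than something the proposal specifies.

```rust
/// Hypothetical type tag stored next to the raw bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ElementType {
    U8,
    F32,
}

/// Hypothetical untyped tensor: a single datatype (bytes) plus a type-id.
pub struct UntypedTensor {
    pub type_id: ElementType,
    pub bytes: Vec<u8>,
    pub shape: Vec<u64>,
}

impl UntypedTensor {
    /// The cast every consumer has to perform by hand. Returns `None` if the
    /// tag does not say F32 or the byte length is not a multiple of 4.
    /// Assumes little-endian encoding.
    pub fn as_f32(&self) -> Option<Vec<f32>> {
        if self.type_id != ElementType::F32 || self.bytes.len() % 4 != 0 {
            return None;
        }
        Some(
            self.bytes
                .chunks_exact(4)
                .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
                .collect(),
        )
    }
}
```

Every element type would need a similar helper, and nothing in the column's datatype stops a reader from decoding the bytes with the wrong one.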
Typed buffer with union
Pros:
- Status quo. Already works.
Cons:
- Forces arrow users to do annoying, poorly supported union operations when loading or reading tensors.
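For comparison, the union approach can be mimicked with a plain Rust enum. This sketch is only an analogy for the shape of the data, not the actual Arrow union API or the current Rerun types: `TensorBuffer`, `TensorData`, and `to_f64` here are simplified, hypothetical definitions.

```rust
/// Hypothetical union of typed buffers, one variant per element type.
#[derive(Debug, Clone, PartialEq)]
pub enum TensorBuffer {
    U8(Vec<u8>),
    F32(Vec<f32>),
}

/// Hypothetical single tensor type carrying the union.
#[derive(Debug, Clone, PartialEq)]
pub struct TensorData {
    pub shape: Vec<u64>,
    pub buffer: TensorBuffer,
}

impl TensorData {
    /// Every reader has to branch over all variants before touching the data.
    pub fn to_f64(&self) -> Vec<f64> {
        match &self.buffer {
            TensorBuffer::U8(values) => values.iter().map(|&v| f64::from(v)).collect(),
            TensorBuffer::F32(values) => values.iter().map(|&v| f64::from(v)).collect(),
        }
    }
}
```

In Rust the branching is cheap and checked by the compiler; the con listed above is that the equivalent operation on Arrow union arrays is poorly supported for users working with the raw data.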