As part of:
We should also switch our code-gen over to generating simple int-based types, with per-language generated constants / utilities for mapping to native unions.
The enticing aspect of using unions for enums was that the schema definition includes named fields for the union, giving us a degree of semantic name mapping to the type-ids.
However, working with unions in practice, even just for constants like this, is too painful to justify. Nesting them within structs doesn't work. Giant schema definitions are annoying. Constructing them without convenience helpers, even when you already have the schema, is still painful.
Contrast:
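from enum import Enum

import pyarrow as pa


class PixelFormat(Enum):
    NV12 = 1
    YUY2 = 2


# One null-typed child per variant, plus a leading slot for null markers.
pixel_format_schema = pa.sparse_union([
    pa.field("_null_markers", pa.null(), nullable=True),
    pa.field("NV12", pa.null(), nullable=True),
    pa.field("YUY2", pa.null(), nullable=True),
])

pixel_format = pa.UnionArray.from_buffers(
    type=pixel_format_schema,
    length=1,
    buffers=[
        None,  # validity bitmap slot, left empty
        # type-id buffer selecting the NV12 variant
        pa.array([int(PixelFormat.NV12.value)], type=pa.int8()).buffers()[1],
    ],
    children=(1 + 2) * [pa.nulls(1)],  # one length-1 child per union field
)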
with
class PixelFormat(Enum):
    NV12 = 1
    YUY2 = 2

pixel_format_schema2 = pa.uint16()
pixel_format2 = pa.array([PixelFormat.NV12.value], type=pixel_format_schema2)
In contexts like DataFusion, we can include registered function helpers to optionally decode unions to strings. In helper libraries that provide schema definitions for convenience, we can provide appropriate constants.
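As a rough sketch of what the generated constants and mapping utilities could look like on the Python side: the names `decode_names` and `encode_names` are hypothetical, not existing helpers, and the `PixelFormat` values simply mirror the example above. The sketch works on plain lists of integer codes, such as what you would get out of a `pa.uint16()` column, so it needs only the standard library.

```python
from enum import IntEnum
from typing import Iterable, List, Optional


class PixelFormat(IntEnum):
    """Hypothetical generated constants; values mirror the example above."""

    NV12 = 1
    YUY2 = 2


def decode_names(values: Iterable[Optional[int]]) -> List[Optional[str]]:
    """Map raw integer codes to variant names.

    Nulls and codes that match no known variant are returned as None.
    """
    known = {member.value: member.name for member in PixelFormat}
    return [known.get(v) if v is not None else None for v in values]


def encode_names(names: Iterable[str]) -> List[int]:
    """Map variant names back to their integer codes (KeyError if unknown)."""
    return [PixelFormat[name].value for name in names]
```

A string-decoding function registered with a query engine could be a thin wrapper around the same lookup table. Returning `None` for unknown codes is one possible choice here; raising an error may be preferable if unknown values indicate a schema mismatch.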