Move code-generated enum from union to int type. #7022

@jleibs


As part of:

We should also switch our code-gen over to generating simple int-based types, with per-language generated constants and utilities for mapping to native unions.
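A minimal sketch of what the generated Python side could look like, assuming the enum is stored as an unsigned 16-bit integer. The names `UINT16_MAX` and `pixel_format_from_int` are hypothetical and only illustrate the shape of such helpers; they are not existing code-gen output.

```python
from enum import IntEnum


class PixelFormat(IntEnum):
    """Hypothetical generated constants; values mirror the example in this issue."""

    NV12 = 1
    YUY2 = 2


# Assumption: the storage type is an unsigned 16-bit integer.
UINT16_MAX = 0xFFFF


def pixel_format_from_int(raw: int) -> PixelFormat:
    """Map a raw stored integer back to the native enum, rejecting unknown values."""
    if not 0 <= raw <= UINT16_MAX:
        raise ValueError(f"{raw} does not fit in uint16")
    # IntEnum lookup raises ValueError for integers with no declared variant.
    return PixelFormat(raw)
```

Because `IntEnum` members are plain integers, they can be written into an integer column directly, and the helper covers the reverse direction when reading data back.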

The appeal of using unions for enums was that the schema definition includes named fields for the union variants, giving us a degree of semantic mapping from names to type ids.

In practice, however, working with unions, even just for constants like this, is too painful to justify. Nesting them within structs doesn't work. Giant schema definitions are annoying. Constructing them without convenience helpers, even when you already have the schema, is still painful.

Contrast:

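from enum import Enum

import pyarrow as pa


class PixelFormat(Enum):
    NV12 = 1
    YUY2 = 2


# One null-typed child per variant, plus a leading slot for null markers.
pixel_format_schema = pa.sparse_union([
    pa.field("_null_markers", pa.null(), nullable=True),
    pa.field("NV12", pa.null(), nullable=True),
    pa.field("YUY2", pa.null(), nullable=True),
])

pixel_format = pa.UnionArray.from_buffers(
    type=pixel_format_schema,
    length=1,
    buffers=[
        # Unions carry no validity bitmap, so the first buffer is None.
        None,
        # The second buffer holds the int8 type ids.
        pa.array([PixelFormat.NV12.value], type=pa.int8()).buffers()[1],
    ],
    # Null-marker child plus one child per variant.
    children=(1 + 2) * [pa.nulls(1)],
)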

with

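from enum import Enum

import pyarrow as pa


class PixelFormat(Enum):
    NV12 = 1
    YUY2 = 2

# The schema is just an integer type; variant names live in the enum above.
pixel_format_schema2 = pa.uint16()

pixel_format2 = pa.array([PixelFormat.NV12.value], type=pixel_format_schema2)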

In contexts like DataFusion, we can register function helpers that optionally decode the enum values to strings. In helper libraries that provide schema definitions for convenience, we can provide the appropriate constants.
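As a rough illustration of such a decoding helper, the sketch below maps stored integers back to variant names in plain Python. The function name `decode_names` and the `UNKNOWN(...)` fallback are assumptions made for this example, not an existing API; a real helper would be registered with whatever query engine is in use.

```python
from enum import IntEnum
from typing import Iterable, List, Optional


class PixelFormat(IntEnum):
    NV12 = 1
    YUY2 = 2


def decode_names(values: Iterable[Optional[int]]) -> List[Optional[str]]:
    """Translate stored integers to variant names.

    Nulls pass through unchanged; integers without a declared variant
    are rendered as "UNKNOWN(<value>)" (a hypothetical convention).
    """
    names = {member.value: member.name for member in PixelFormat}
    return [
        None if v is None else names.get(v, f"UNKNOWN({v})")
        for v in values
    ]


print(decode_names([1, 2, None, 7]))
# → ['NV12', 'YUY2', None, 'UNKNOWN(7)']
```

With pyarrow, the input could come from something like `pixel_format2.to_pylist()`.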
