
ak.forms.form.index_to_dtype is probably wrong: should probably be native, not little-endian #3356

@jpivarski


Version of Awkward Array

HEAD

Description and code to reproduce

Compare

_primitive_to_dtype_dict = {
    "bool": np.dtype(np.bool_),
    "int8": np.dtype(np.int8),
    "uint8": np.dtype(np.uint8),
    "int16": np.dtype(np.int16),
    "uint16": np.dtype(np.uint16),
    "int32": np.dtype(np.int32),
    "uint32": np.dtype(np.uint32),
    "int64": np.dtype(np.int64),
    "uint64": np.dtype(np.uint64),
    "float32": np.dtype(np.float32),
    "float64": np.dtype(np.float64),
    "complex64": np.dtype(np.complex64),
    "complex128": np.dtype(np.complex128),
    "datetime64": np.dtype(np.datetime64),
    "timedelta64": np.dtype(np.timedelta64),
}

which sets the dtype for each primitive category to the machine's native endianness (np.dtype(np.float64) is big-endian on big-endian machines and little-endian on little-endian machines), with

index_to_dtype: Final[dict[str, DType]] = {
    "i8": np.dtype("<i1"),
    "u8": np.dtype("<u1"),
    "i32": np.dtype("<i4"),
    "u32": np.dtype("<u4"),
    "i64": np.dtype("<i8"),
}

which sets the dtype for each index category to little-endian. This should probably be native-endian, too. I'm 90% sure of it.
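To make the difference concrete, here is a minimal sketch that puts the little-endian mapping quoted above next to a native-endian alternative and reports which dtypes NumPy considers native on the current machine. The names `index_to_dtype_le`, `index_to_dtype_native`, and `describe` are hypothetical and exist only for this illustration; the native mapping is one possible form of the proposed change, not the library's code.

```python
import sys

import numpy as np

# Copied from the mapping quoted in this issue: explicitly little-endian.
index_to_dtype_le = {
    "i8": np.dtype("<i1"),
    "u8": np.dtype("<u1"),
    "i32": np.dtype("<i4"),
    "u32": np.dtype("<u4"),
    "i64": np.dtype("<i8"),
}

# Hypothetical native-endian alternative (an assumption, not the library's code).
index_to_dtype_native = {
    "i8": np.dtype(np.int8),
    "u8": np.dtype(np.uint8),
    "i32": np.dtype(np.int32),
    "u32": np.dtype(np.uint32),
    "i64": np.dtype(np.int64),
}


def describe(mapping):
    """Return {name: (array-protocol typestring, is-native flag)} for each dtype."""
    return {name: (dt.str, dt.isnative) for name, dt in mapping.items()}


print("machine byte order:", sys.byteorder)
print("little-endian mapping:", describe(index_to_dtype_le))
print("native mapping:       ", describe(index_to_dtype_native))
```

On a little-endian machine the two mappings compare equal, which is why the discrepancy goes unnoticed there. On a big-endian machine the multi-byte entries ("i32", "u32", "i64") of the little-endian mapping would be non-native. The single-byte entries have no byte order, so only the multi-byte ones matter.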

The problem is that we'd like to test it before we change it. Is there any way we can test Awkward on a big-endian machine? Maybe in a qemu emulation? (Such a test would probably reveal a lot.)
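Short of a real big-endian machine, part of the byte-order handling can be probed on ordinary hardware by building arrays whose dtype is deliberately the opposite of native. The sketch below uses only NumPy and is not a substitute for a real big-endian test run, since it exercises non-native buffers on a native machine rather than the other way around. The variable names are hypothetical.

```python
import numpy as np

native = np.dtype(np.int64)
# "S" asks NumPy for the byte order opposite to the current one.
swapped = native.newbyteorder("S")

values = np.array([0, 1, 2, 3], dtype=native)
as_swapped = values.astype(swapped)  # same numbers, opposite byte layout

# The logical values agree, but the dtype is flagged as non-native ...
print(as_swapped.dtype.isnative)
print(np.array_equal(values, as_swapped))

# ... and the raw bytes differ, so reading them with the wrong dtype
# silently yields different numbers.
misread = np.frombuffer(as_swapped.tobytes(), dtype=native)
print(misread)
```

Feeding buffers like `as_swapped` through code paths that assume one fixed byte order is one way to surface mismatches without an emulator.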

On the flip side, the fact that it's so hard to find a big-endian machine these days means that this issue would rarely be observed. (Similar to 32-bit testing...)

Labels

bug: The problem described is something that must be fixed
