Array class validation #1483

@agoose77

Description of new feature

This issue was originally very long, but the fundamental motivation is quite simple. Right now, behaviour classes cannot perform validation at construction time; the only opportunity to validate the current layout is when a user-defined method is called, e.g.:

import awkward as ak
import numpy as np
from awkward._v2 import Array

class OnlyFloats(Array):
    def validate(self):
        # Visit every node of the layout; only NumpyArray nodes carry a dtype.
        def get_dtype(layout, **kwargs):
            if isinstance(layout, ak._v2.contents.NumpyArray):
                assert np.issubdtype(layout.dtype, np.floating)
        self.layout.recursively_apply(get_dtype)

ak.behavior["only_floats"] = OnlyFloats

>>> x = ak.with_parameter(Array([1, 2, 3]), "__array__", "only_floats")
>>> x.validate()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_362562/672691759.py in <cell line: 1>()
----> 1 validate(ak._v2.Array(l))
...

/tmp/ipykernel_362562/3059306649.py in get_dtype(layout, **kwargs)
      3     def get_dtype(layout, **kwargs):
      4         if isinstance(layout, ak._v2.contents.NumpyArray):
----> 5             assert np.issubdtype(layout.dtype, np.floating)
      6     self.layout.recursively_apply(get_dtype)

AssertionError:

This means that a user could associate a behaviour with a poorly structured array, and only discover their error at a later point in the program when the validation method is called.

I wonder if we should consider adding a hook that allows behaviours to run user-defined code at the time of array wrapping, e.g. to validate the array:

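class Array(...):
    def __init__(self, ...):
        # Proposed: call the hook once the layout has been attached.
        self._on_layout_added()

    def _on_layout_added(self):
        # Default is a no-op; behaviour classes override this.
        pass


class OnlyFloats(Array):
    def _on_layout_added(self):
        # Same check as before, but run at wrapping time.
        def get_dtype(layout, **kwargs):
            if isinstance(layout, ak._v2.contents.NumpyArray):
                assert np.issubdtype(layout.dtype, np.floating)
        self.layout.recursively_apply(get_dtype)

ak.behavior["only_floats"] = OnlyFloats


# Under this proposal, the integer array would be rejected here, at wrapping time.
x = ak.with_parameter(Array([1, 2, 3]), "__array__", "only_floats")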

This would mean that we don't have to come up with some Awkward-specific validation tool: this can be left up to the library authors.
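To make the idea concrete without depending on Awkward internals, here is a minimal, self-contained sketch of the hook pattern in plain Python. The `Layout` and `Array` classes below are hypothetical stand-ins, not Awkward's real classes, and raising `TypeError` instead of using `assert` is just one possible choice for the validation itself:

```python
class Layout:
    """Hypothetical stand-in for an Awkward layout: a flat list of values."""

    def __init__(self, values):
        self.values = list(values)


class Array:
    """Hypothetical stand-in for the high-level array wrapper."""

    def __init__(self, values):
        self.layout = Layout(values)
        # Proposed hook: runs once, as soon as the layout is attached.
        self._on_layout_added()

    def _on_layout_added(self):
        # Default is a no-op, so plain arrays behave exactly as before.
        pass


class OnlyFloats(Array):
    def _on_layout_added(self):
        # Validation is supplied by the library author, not by the base class.
        for value in self.layout.values:
            if not isinstance(value, float):
                raise TypeError(f"expected only floats, found {value!r}")


plain = Array([1, 2, 3])          # base class: the hook does nothing
ok = OnlyFloats([1.0, 2.5])       # passes validation at construction

try:
    OnlyFloats([1, 2, 3])         # rejected immediately, not at a later method call
    failed_early = False
except TypeError:
    failed_early = True
```

In this sketch the base class only decides when the hook runs; what counts as valid is left entirely to the subclass, which is the split of responsibilities suggested above.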

The caveat here is that I don't think we want this validation step to do anything too magical; ideally, the mental model of behaviours as a mechanism to attach methods to classes still holds.

Labels: feature (New feature or request)
Status: Done