feat: named axis for ak.Array #3238
ak.Array
Progress
general
broadcasting
slicing
Unary and binary operations
high-level functions
New:
Can be used with named axis:
Independent of named axis: improvements / bugs found that are fixed by this PR as well:
|
|
And all the data types that can be passed into square brackets with |
…xing named axis propagation
…x named axis propagation in indexing for type tracers
…ak.mean; remove inplace addition of arrays from test
…tible with branched structures;fix regularize_axis in all highlevel ops
…ranch_depth using inner_shape property
…at depended on broadcasting to work)
|
The PR is ready for review @agoose77 and @jpivarski. If you have an idea why the Windows tests are failing, please let me know... (I thought I'm sorry that the PR got so huge, but the largest part is tests related to named axes and translating named axes to positional axes in each high-level function. |
|
@pfackeldey the |
|
I'd like to see the rendered documentation, but the Deploy Branch Preview jobs are getting skipped. I don't know why: they were running on PRs relatively recently. Meanwhile, I'm still going over the rest of the PR. Edit: It's because this is a fork, not a branch. That's okay; the markdown looks good; we'll see the rendered version after merging. |
jpivarski
left a comment
It's hard to review such a large PR, but we went over it at length during development. I think we iterated much more quickly because of that, and so I don't have much to say here, on the PR itself. The spot-check issues below are all minor. Still, we should check in again before the final merger, since everybody working on the codebase needs to be ready to incorporate the update when it comes.
|
Oh, and I should have mentioned that this is very high quality code (type annotations, docstrings, comments)! Thank you! |
|
Thank you very much for your review @jpivarski! I'll add your suggestions soon 👍 |
agoose77
left a comment
A small review -- I'll pop back tomorrow.
Proposal for named axis
This PR addresses #2596.
References for other named axis implementations:
Motivation
As argued at PyHEP.dev 2023 and by the Harvard NLP group in their "Tensor Considered Harmful" write-up, named axis can be a powerful tool to make code more readable and less error-prone.
Design
ak.Array with named axis
Named axis are implemented through a mapping from named axis to positional axis.
Named axis are hashables (currently restricted to strings), except for integers (and None), as they are reserved for positional axis.
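To make the mapping idea concrete, here is a minimal pure-Python sketch. It does not use awkward and is not the code in this PR; the helper names `normalize_named_axis` and `to_positional` are hypothetical and only illustrate how names given as a tuple or as a dict could be turned into a name-to-position mapping, and how an `axis` argument could then be resolved.

```python
from typing import Optional, Union

AxisName = str  # axis names are currently restricted to strings


def normalize_named_axis(named_axis: Union[tuple, dict]) -> dict:
    """Hypothetical helper: build a {name: position} mapping.

    A tuple assigns names to positions 0, 1, 2, ... in order;
    a dict is taken as an explicit name -> position mapping.
    """
    if isinstance(named_axis, tuple):
        mapping = {name: pos for pos, name in enumerate(named_axis)}
    else:
        mapping = dict(named_axis)
    for name, pos in mapping.items():
        # Integers and None are reserved for positional axis,
        # so they are rejected as names in this sketch.
        if not isinstance(name, str):
            raise TypeError(f"axis name must be a string, got {name!r}")
        if not isinstance(pos, int):
            raise TypeError(f"axis position must be an integer, got {pos!r}")
    return mapping


def to_positional(axis: Union[AxisName, int, None], mapping: dict) -> Optional[int]:
    """Hypothetical helper: resolve a named or positional axis to a position."""
    if axis is None or isinstance(axis, int):
        return axis  # already positional
    return mapping[axis]


mapping = normalize_named_axis(("events", "jets"))
print(mapping)                         # name -> position mapping
print(to_positional("jets", mapping))  # position of the "jets" axis
```

Because integers and None never act as names, an `axis` value can always be classified unambiguously as either named or positional.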
By default an ak.Array uses positional axis, but named axis can be added to the array in the following ways:
The named_axis argument of the constructor of an ak.Array is either a tuple of AxisName or a dict of AxisName to integers.
It is stored in the .attrs attribute of the array with a reserved key "__named_axis__" of type dict[AxisName, int].
The two types of axis can be accessed through the named_axis and positional_axis properties (always represented as a tuple):
Named axis in high-level functions
Named axis can be used by all high-level functions, e.g. ak.sum, ak.max, etc.:
There are different scenarios for how named axis are propagated to the resulting array:
ak.sum(array, axis="jets", keepdims=True) or array ** 2.
ak.sum(array, axis="jets").
ak.Array or broadcasting: Here, checks for matching named axis are performed, the rules are:
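Setting the matching rules for binary operations aside, the first two scenarios can be sketched in plain Python: an operation such as `ak.sum(array, axis="jets", keepdims=True)` or `array ** 2` keeps every dimension, while `ak.sum(array, axis="jets")` removes one. The helpers below are hypothetical and are not the functions added by this PR. They assume the mapping is a `{name: position}` dict with non-negative positions.

```python
def keep_named_axis(named_axis: dict) -> dict:
    """All dimensions survive (keepdims=True, unary ops): names are unchanged."""
    return dict(named_axis)


def remove_named_axis(named_axis: dict, axis: int) -> dict:
    """One dimension is reduced away.

    The name at the reduced position is dropped, and names at higher
    positions move down by one so they still point at the same data.
    Assumes non-negative positions (an assumption of this sketch).
    """
    result = {}
    for name, pos in named_axis.items():
        if pos == axis:
            continue  # this axis no longer exists in the output
        result[name] = pos - 1 if pos > axis else pos
    return result


named_axis = {"events": 0, "jets": 1}
print(keep_named_axis(named_axis))       # names unchanged
print(remove_named_axis(named_axis, 1))  # "jets" is dropped
print(remove_named_axis(named_axis, 0))  # "jets" moves to position 0
```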
Named axis in indexing
In addition, named axis can be used to select data:
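One way to picture what such a selection has to do internally is to translate names into an ordinary positional index. The sketch below assumes the selection is written as a dict from axis name to index item. The helper `to_positional_index` is hypothetical and is not taken from this PR.

```python
def to_positional_index(where: dict, named_axis: dict, ndim: int) -> tuple:
    """Translate {axis name: index item} into a positional index tuple.

    Dimensions that are not mentioned are left untouched, i.e. they
    get slice(None), which is what ':' means inside square brackets.
    """
    index = [slice(None)] * ndim
    for name, item in where.items():
        index[named_axis[name]] = item
    return tuple(index)


named_axis = {"events": 0, "jets": 1}

# Select the first jet in every event: equivalent to array[:, 0]
print(to_positional_index({"jets": 0}, named_axis, ndim=2))

# Select the first two events: equivalent to array[0:2, :]
print(to_positional_index({"events": slice(0, 2)}, named_axis, ndim=2))
```

The resulting tuple can be passed to positional indexing unchanged, so the named form reduces to the existing slicing machinery.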
For syntactic sugar use np.s_:
This PR has to touch a lot of code and needs to add custom named axis propagation to each high-level operation. Thus, this PR is currently in draft mode.
Looking forward to ideas, thoughts, feedback on this effort!