feat: add to_lists_of_records and to_record_of_lists#3582
feat: add to_lists_of_records and to_record_of_lists#3582nikoladze wants to merge 14 commits intoscikit-hep:mainfrom
to_lists_of_records and to_record_of_lists#3582Conversation
0c23e46 to
a804b37
Compare
a804b37 to
835ed59
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files
🚀 New features to boost your workflow:
|
ianna
left a comment
There was a problem hiding this comment.
@nikoladze - This looks great! Thanks for implementing this! The tests pass. Please merge it if you finished with the PR. Thanks.
| return _impl(array, depth_limit, highlevel, behavior, attrs, **kwargs) | ||
|
|
||
|
|
||
| def _impl(array, depth_limit, highlevel, behavior, attrs, **zip_kwargs): |
There was a problem hiding this comment.
Hi @nikoladze,
The implementation should follow the style of the other highlevel operations, i.e, not using other highlevel operations (instead use their _impl) and it should use the HighLevelContext, see eg: https://github.com/scikit-hep/awkward/blob/main/src/awkward/operations/ak_with_name.py#L48-L65
Same for the other highlevel operation.
There was a problem hiding this comment.
ok, in case we continue with this (which i'm not so convinced anymore now, see below) i can fix this. Out of interest: what is the reasoning for this? Is it performance (e.g. avoiding some overhead function calls)?
There was a problem hiding this comment.
@nikoladze - high-level operations in Awkward are typically just wrappers around lower-level _impl functions. They often add:
- Input validation
- Type checking
- Broadcasting
- Optional behaviors like
with_namepropagation
If you call a high-level function from another high-level function:
- You do redundant checks/validations
- You can create nested overhead and slower code
- It can introduce subtle inconsistencies in how arguments are handled
By calling the _impl function instead, you directly use the core logic without the extra layers, ensuring:
- Consistent behavior with other operations
- Minimal overhead
- Easier reasoning about what the function actually does
HighLevelContext is used to handle:
- Wrapping/unwrapping arrays between high-level (
ak.Array) and low-level (Content) representations - Preserving metadata like
with_nameorbehavior - Ensuring the operation works consistently across different array types
Example from the link @pfackeldey shared:
def with_name(array, name, highlevel=True, behavior=None):
...
with ak._util.OperationErrorContext(...):
out = _impl(array.layout, name, behavior)
return wrap(out, behavior=behavior, highlevel=highlevel)Here:
_impldoes the core work on the layoutwrap+HighLevelContextensures the high-level array is returned with proper behavior/metadata
|
Some thoughts/comments on both functions:
|
|
mhh - i like the idea of a
you're right - instead of ok, so given all these things maybe we really should just implement |
I would personally be in favor of this. Let's see what @ianna says. |
It sounds like a good idea! Thanks. |
|
@nikoladze - we have discussed this with @pfackeldey and decided to postpone integration of this PR until the next release. The reason for this is that the release is due now. |
|
closing in favor of #3655 |
This PR adds new functions that effectively behave like
ak.zipandak.unzip, but operate on/return arrays. See #1043 for a related discussion. Originally @jpivarski suggested to choose "to/from" as function names, but i decided in the end to name both functions as "to_XXX" since the "from_..." one confused me a lot and i felt i always had an extra step in my head translating this into what the result of the operation is.Both functions use
ak.transform.For
to_lists_of_recordsthis is used to find the record which is potentially already located inside lists. This leads to the behavior thatdepth_limitnow starts counting from the depth where the record is located. An alternative would have been to not useak.transform, sinceak.unzip(that is used internally) also works for records that are not at the top level. However, when trying to make the distinction here between tuples and records with named fields i found thatis_tupleseems to work for lists containing records, but notis_recordso i was not quite sure if it would be well behaved to use this.The recursion stops once a record is found. I thought about adding an option to run recursively through nested records (via
continuation(), but decided against it for now since it would take some more time to think about and test how it should behave (e.g. one could think about also applyingak.zipwith maximum possibledepth_limitfor the records in this case).For
to_record_of_listsak.transformis used to find the list at the specified depth (axisargument). An exception is raised if no list is found to help users that picked the wrong axis.