Skip to content

feat: add to_lists_of_records and to_record_of_lists#3582

Closed
nikoladze wants to merge 14 commits intoscikit-hep:mainfrom
nikoladze:nikoladze/add-ak-to-lists-of-records-and-to-record-of-lists
Closed

feat: add to_lists_of_records and to_record_of_lists#3582
nikoladze wants to merge 14 commits intoscikit-hep:mainfrom
nikoladze:nikoladze/add-ak-to-lists-of-records-and-to-record-of-lists

Conversation

@nikoladze
Copy link
Copy Markdown
Contributor

This PR adds new functions that effectively behave like ak.zip and ak.unzip, but operate on/return arrays. See #1043 for a related discussion. Originally @jpivarski suggested to choose "to/from" as function names, but i decided in the end to name both functions as "to_XXX" since the "from_..." one confused me a lot and i felt i always had an extra step in my head translating this into what the result of the operation is.

Both functions use ak.transform.

For to_lists_of_records this is used to find the record which is potentially already located inside lists. This leads to the behavior that depth_limit now starts counting from the depth where the record is located. An alternative would have been to not use ak.transform, since ak.unzip (that is used internally) also works for records that are not at the top level. However, when trying to make the distinction here between tuples and records with named fields i found that is_tuple seems to work for lists containing records, but not is_record so i was not quite sure if it would be well behaved to use this.
The recursion stops once a record is found. I thought about adding an option to run recursively through nested records (via continuation(), but decided against it for now since it would take some more time to think about and test how it should behave (e.g. one could think about also applying ak.zip with maximum possible depth_limit for the records in this case).

For to_record_of_lists ak.transform is used to find the list at the specified depth (axis argument). An exception is raised if no list is found to help users that picked the wrong axis.

@nikoladze nikoladze force-pushed the nikoladze/add-ak-to-lists-of-records-and-to-record-of-lists branch from 0c23e46 to a804b37 Compare July 20, 2025 09:42
@nikoladze nikoladze force-pushed the nikoladze/add-ak-to-lists-of-records-and-to-record-of-lists branch from a804b37 to 835ed59 Compare July 20, 2025 09:43
@codecov
Copy link
Copy Markdown

codecov bot commented Jul 20, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.52%. Comparing base (b749e49) to head (fd4a167).
⚠️ Report is 424 commits behind head on main.

Additional details and impacted files
Files with missing lines Coverage Δ
src/awkward/operations/__init__.py 100.00% <100.00%> (ø)
src/awkward/operations/ak_to_list_of_records.py 100.00% <100.00%> (ø)
src/awkward/operations/ak_to_record_of_lists.py 100.00% <100.00%> (ø)
src/awkward/operations/ak_unzip.py 96.15% <ø> (ø)
src/awkward/operations/ak_zip.py 96.00% <ø> (+0.44%) ⬆️

... and 192 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikoladze - This looks great! Thanks for implementing this! The tests pass. Please merge it if you finished with the PR. Thanks.

return _impl(array, depth_limit, highlevel, behavior, attrs, **kwargs)


def _impl(array, depth_limit, highlevel, behavior, attrs, **zip_kwargs):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nikoladze,
The implementation should follow the style of the other highlevel operations, i.e, not using other highlevel operations (instead use their _impl) and it should use the HighLevelContext, see eg: https://github.com/scikit-hep/awkward/blob/main/src/awkward/operations/ak_with_name.py#L48-L65

Same for the other highlevel operation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, in case we continue with this (which i'm not so convinced anymore now, see below) i can fix this. Out of interest: what is the reasoning for this? Is it performance (e.g. avoiding some overhead function calls)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikoladze - high-level operations in Awkward are typically just wrappers around lower-level _impl functions. They often add:

  • Input validation
  • Type checking
  • Broadcasting
  • Optional behaviors like with_name propagation

If you call a high-level function from another high-level function:

  • You do redundant checks/validations
  • You can create nested overhead and slower code
  • It can introduce subtle inconsistencies in how arguments are handled

By calling the _impl function instead, you directly use the core logic without the extra layers, ensuring:

  • Consistent behavior with other operations
  • Minimal overhead
  • Easier reasoning about what the function actually does

HighLevelContext is used to handle:

  • Wrapping/unwrapping arrays between high-level (ak.Array) and low-level (Content) representations
  • Preserving metadata like with_name or behavior
  • Ensuring the operation works consistently across different array types

Example from the link @pfackeldey shared:

def with_name(array, name, highlevel=True, behavior=None):
    ...
    with ak._util.OperationErrorContext(...):
        out = _impl(array.layout, name, behavior)
    return wrap(out, behavior=behavior, highlevel=highlevel)

Here:

  • _impl does the core work on the layout
  • wrap + HighLevelContext ensures the high-level array is returned with proper behavior/metadata

@pfackeldey
Copy link
Copy Markdown
Collaborator

Some thoughts/comments on both functions:

to_lists_of_records:
I understand that this operation is something that's often done. I'd naively do ak.zip(dict(zip(ak.fields(a), ak.unzip(a)))), which is not much more chars than ak.to_lists_of_records(a). I was thinking that maybe we should rather add a kwarg to ak.unzip, maybe similar to uproot's how= kwarg? That would be then: ak.zip(ak.unzip(a, how=dict)), which I'd personally prefer over introducing a new highlevel operation. (this would not allow to start depth_limit at the level of the record, but I also don't know when someone would want to use this?)

to_record_of_lists:
That's more niche I'd say. I'm not sure yet what the use-cases are, but given that it's more complex to implement it, I think it makes sense to have a dedicated highlevel operation for it. I think I'm just confused by the axis argument here, shouldn't that just be depth_limit?

@nikoladze
Copy link
Copy Markdown
Contributor Author

mhh - i like the idea of a how=dict argument to ak.unzip - in #1043 @jpivarski said it's to late to change ak.unzip now, but that would only be the addition of an optional keyword argument, so no breaking change. If you prefer this i think that would be a viable option. And maybe we don't really need to_record_of_lists - tbh i also can't really come up with a use-case for this ...

I think I'm just confused by the axis argument here, shouldn't that just be depth_limit?

you're right - instead of ak.unzip(...) at a certain depth (given by axis) i could also have done ak.zip(ak.unzip(...)) at the top level and give a depth_limit to ak.zip. Maybe that would have been easier. So if we go for the variant with having how=dict an alternative to to_record_of_lists would be actually the same as the alternative to to_lists_of_records - ak.zip(ak.unzip(array, how=dict)) - but passing a depth_limit to ak.zip

ok, so given all these things maybe we really should just implement ak.zip(..., how=dict) instead?

@pfackeldey
Copy link
Copy Markdown
Collaborator

ok, so given all these things maybe we really should just implement ak.zip(..., how=dict) instead?

I would personally be in favor of this. Let's see what @ianna says.

@ianna
Copy link
Copy Markdown
Member

ianna commented Jul 22, 2025

ok, so given all these things maybe we really should just implement ak.zip(..., how=dict) instead?

I would personally be in favor of this. Let's see what @ianna says.

It sounds like a good idea! Thanks.

@ianna ianna self-requested a review July 22, 2025 18:27
@ianna ianna added the pr-next-release Required for the next release label Jul 24, 2025
@ianna
Copy link
Copy Markdown
Member

ianna commented Jul 30, 2025

@nikoladze - we have discussed this with @pfackeldey and decided to postpone integration of this PR until the next release. The reason for this is that the release is due now.

@ianna ianna removed the pr-next-release Required for the next release label Jul 30, 2025
@ianna
Copy link
Copy Markdown
Member

ianna commented Sep 18, 2025

closing in favor of #3655

@ianna ianna closed this Sep 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants