Skip to content

fix: update string to search for in determing parquet column list separator#2670

Merged
jpivarski merged 1 commit intoscikit-hep:mainfrom
douglasdavis:fix-parquet-item-vs-element
Aug 24, 2023
Merged

fix: update string to search for in determing parquet column list separator#2670
jpivarski merged 1 commit intoscikit-hep:mainfrom
douglasdavis:fix-parquet-item-vs-element

Conversation

@douglasdavis
Copy link
Copy Markdown
Contributor

@douglasdavis douglasdavis commented Aug 24, 2023

With pyarrow 13 it appears that list.element is used over list.item in some datasets. Looks like our original check to look for list.element was too strict (the final .). We also need to make this change in dask-awkward!

more info at dask-contrib/dask-awkward#346

@jpivarski
Copy link
Copy Markdown
Member

This distinction between item and element was a difference in convention between Arrow and Spark (and Spark was more technically correct: there was some third party that said that element is what should be used). This could be a part of a transition toward conformity.

Anyway, these are the only two names that need to be supported.

@codecov
Copy link
Copy Markdown

codecov bot commented Aug 24, 2023

Codecov Report

Merging #2670 (aced13b) into main (12348fb) will not change coverage.
The diff coverage is n/a.

Additional details and impacted files
Files Changed Coverage Δ
src/awkward/operations/ak_from_parquet.py 91.17% <ø> (ø)

@agoose77
Copy link
Copy Markdown
Collaborator

I am happy to merge this, unless @jpivarski has objections, I'll do so this evening.

Copy link
Copy Markdown
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm back at a computer, rather than a phone, I noticed that this is a PR on Awkward, not dask-awkward.

The change appears to be all-inclusive. If ".list.element" in column_metadata.path is needed to match new Arrow arrays and ".list.element." in column_metadata.path was used before, the new predicate will match both. Also, it couldn't spuriously match something it's not intended to.

So I sign off on it as well, and since we're all in agreement, I'll merge it, too.

@jpivarski jpivarski merged commit 6de30f1 into scikit-hep:main Aug 24, 2023
@douglasdavis douglasdavis deleted the fix-parquet-item-vs-element branch August 24, 2023 21:07
jpivarski added a commit that referenced this pull request Oct 4, 2023
…rame (#2735)

* fix: backport numba old to new style error capturing to avoid warnings

* backporting #2617

* backporting #2670 and tests that depend on Parquet list separator

* style: pre-commit fixes

* backporting #2617 for the _v2 subdirectory

* RDataFrame handling is broken in _v2 subdirectory; don't bother fixing because it's correct in the real Awkward 2.x

* The next awkward 1.x backport will be version 1.10.5.

* Windows can't build ('x86' vs 'x86_64'). Maybe use the old VM?

* No, change that back.

* Remove all v2 tests.

* I was changing the wrong workflow file. Let's try new Windows.

* What I really need to test is the wheel-building.

* Try again.

* On second thought, don't upload to PyPI.

* Remove 32-bit Windows from build-test.yml and modernize wheels.yml.

---------

Co-authored-by: Jim Pivarski <jpivarski@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants