
Require PyArrow >=16 #12258

Merged
jrbourbeau merged 5 commits into dask:main from crusaderky:pyarrow16
Jan 29, 2026

@crusaderky
Collaborator

@crusaderky crusaderky commented Jan 29, 2026

PyArrow 14 and 15 are incompatible with Pandas 3 when it comes to distributed shuffle.
I must confess I was lazy, did not investigate the actual issue, and instead simply bumped up the minimum version. This also gave the opportunity to clean up a lot of special cases in the codebase.

An alternative option was to pin pandas<3 in the 3.11 environment and add a runtime check for the specific combination of the two dependencies. I discarded it for the sake of simplicity.
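For illustration only, the discarded runtime check could have looked roughly like the sketch below. The function names, the error message, and the simplification that every PyArrow release below 16 counts as affected are assumptions made for this example; none of it is taken from the dask codebase.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '16.1.0' -> (16, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_known_bad_combination(pandas_version: str, pyarrow_version: str) -> bool:
    """True for pandas >= 3 paired with PyArrow < 16.

    Assumption: everything below PyArrow 16 is treated as affected,
    although only 14 and 15 are named in this PR.
    """
    return (
        release_tuple(pandas_version) >= (3,)
        and release_tuple(pyarrow_version) < (16,)
    )


def check_shuffle_dependencies(pandas_version: str, pyarrow_version: str) -> None:
    """Hypothetical guard: raise if the given pair is the incompatible one."""
    if is_known_bad_combination(pandas_version, pyarrow_version):
        raise RuntimeError(
            f"pandas {pandas_version} with PyArrow {pyarrow_version} is not "
            "supported for distributed shuffle; upgrade PyArrow to >=16 "
            "or pin pandas<3"
        )
```

A caller would pass in `pandas.__version__` and `pyarrow.__version__`. Raising the minimum PyArrow version, as this PR does, makes such a guard unnecessary.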

This change is consistent with SPEC 0.

PyArrow 14 and 15 are incompatible with Pandas 3
@github-actions
Contributor

github-actions bot commented Jan 29, 2026

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
0 files   ±0   0 ❌ ±0 

Results for commit 3f597fc. ± Comparison against base commit ff0ca1a.

♻️ This comment has been updated with latest results.

- dask-ml=1.4.0
- fastavro=1.4.7
- h5py=3.4.0
- h5py=3.7.0
Collaborator Author

These very old versions of h5py, snappy and tiledb had resolution conflicts on zlib and libopenssl.

- matplotlib=3.5.0
- mimesis=5.3.0
- mmh3=3.0.0
- numba=0.59.1
Collaborator Author

@crusaderky crusaderky Jan 29, 2026

This matches the de facto pin that was in place before.
Odd that it wasn't pinned and doesn't appear in the documentation, as there are explicit tokenizers for it.

@crusaderky crusaderky marked this pull request as ready for review January 29, 2026 18:30
Member

@jrbourbeau jrbourbeau left a comment

Thanks @crusaderky -- LGTM if CI is happy

@jrbourbeau jrbourbeau merged commit a171210 into dask:main Jan 29, 2026
28 checks passed
@crusaderky crusaderky deleted the pyarrow16 branch January 30, 2026 09:35
weiji14 added a commit to regro-cf-autotick-bot/icepyx-feedstock that referenced this pull request Mar 16, 2026
Xref dask/dask#12258. Remove this workaround after pyarrow is a necessary dependency.
JessicaS11 pushed a commit to conda-forge/icepyx-feedstock that referenced this pull request Mar 17, 2026
* updated v2.0.2

* MNT: Re-rendered with conda-smithy 3.56.3 and conda-forge-pinning 2026.03.16.15.32.2

* Put quotes around context.python_min, set test.python_version properly

* Do manual import checks with dask<2026.1.2 to avoid pyarrow requirement

Xref dask/dask#12258. Remove this workaround after pyarrow is a necessary dependency.

* Pin to holoviews>=1.13.0 for python 3.11 compatibility

Xref conda-forge/holoviews-feedstock#120

---------

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Pandas 3 + PyArrow 14 breaks DataFrame shuffle: RuntimeError: P2P failed during barrier phase

4 participants