
chore: bump pyarrow to unlock python 3.14 support #707

Merged
mikeknep merged 2 commits into main from mike/673-python-314-support
May 27, 2026

@mikeknep

Contributor

📋 Summary

Bumps the pyarrow dep to >=22,<23 (previously pinned to 19.x). Pyarrow 22 is the first release with Python 3.14 wheels.

🔗 Related Issue

Closes #673

🔄 Changes

  • pyproject.toml files and uv.lock
  • CI version matrices

🧪 Testing

  • make test passes
  • Unit tests added/updated (Mike: sort of; CI now includes 3.14)
  • E2E tests added/updated (if applicable) (Mike: same as above)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable) N/A

Additional note

The original issue suggested a conservative approach where Python <3.14 would continue using pyarrow 19.x while only Python 3.14 would use pyarrow 22. This seemed annoying to maintain, so I had another agent assess whether it was necessary, and it concluded it was not; the full agent output is below.

Verdict: pyarrow 22 across the board is plausible

Test results:

  • Unit tests on Python 3.10, 3.11, 3.12, 3.13, 3.14: all 3766 passed, 1 skipped with pyarrow 22.0.0.
  • E2E tests on Python 3.10 and 3.14: all 6 passed, 2 skipped.
  • format-check and lint: clean.

What this confirms: The runtime/behavioral risk in pyarrow 20-22 doesn't manifest in DataDesigner's parquet I/O paths. The codebase uses stable APIs that didn't change.

What this does NOT eliminate: The packaging risk remains. Anyone on a pre-glibc-2.28 Linux distro (CentOS 7, RHEL 7, Ubuntu 18.04 and older — all EOL'd) who could install the previous version will now hit a wheel-not-found situation and either need to upgrade their distro or (more likely) fail to install. That's not something CI catches; it would surface as user reports.
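As a rough illustration of the packaging risk above, a user could check whether their interpreter reports glibc 2.28 or newer before upgrading. This is a minimal sketch built on the standard library's `platform.libc_ver()`; the helper name `has_manylinux_2_28_glibc` is hypothetical and not part of DataDesigner.

```python
import platform


def has_manylinux_2_28_glibc(libc_name: str, libc_version: str) -> bool:
    """Return True if the reported C library is glibc 2.28 or newer.

    Hypothetical helper for illustration; not part of DataDesigner.
    """
    if libc_name != "glibc" or not libc_version:
        # Not glibc (for example musl, macOS, or Windows): the check does not apply.
        return False
    # Only the first two components matter, e.g. "2.17" or "2.31".
    major, minor = (int(part) for part in libc_version.split(".")[:2])
    return (major, minor) >= (2, 28)


if __name__ == "__main__":
    name, version = platform.libc_ver()
    print(name or "(no glibc detected)", version, has_manylinux_2_28_glibc(name, version))
```

A `False` result on a non-glibc platform only means this particular check is not relevant there, not that installation will fail.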

My recommendation: Go with the simpler single-pin approach. The complexity savings are real (one dependency, one CI matrix exercising the same pyarrow everywhere, no maintenance of the version-marker line as we revisit pyarrow upper bounds), and the user-impact risk is bounded: anyone affected is on EOL'd Linux. If the project later gets a bug report from such a user, reverting to a split pin in a patch release is straightforward.
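For context, the difference between the split pin and the single pin can be sketched as a small lookup. The 19.x and 22.x specifiers come from this PR; the table layout and the names `SPLIT_PIN`, `SINGLE_PIN`, and `split_pin_for` are assumptions made for illustration, not the wording of the original issue.

```python
# Hypothetical model of the two options; not the project's actual configuration.
# Each entry is (minimum Python version, pyarrow requirement), checked in order.
SPLIT_PIN = [
    ((3, 14), "pyarrow>=22,<23"),     # Python 3.14 and newer
    ((0, 0), "pyarrow>=19.0.1,<20"),  # every older interpreter
]

# The approach this PR takes: one requirement for every supported interpreter.
SINGLE_PIN = "pyarrow>=22,<23"


def split_pin_for(python_version: tuple[int, int]) -> str:
    """Return the requirement the split pin would select for an interpreter."""
    for minimum, requirement in SPLIT_PIN:
        if python_version >= minimum:
            return requirement
    raise ValueError(f"no requirement matches {python_version}")


if __name__ == "__main__":
    for minor in range(10, 15):
        version = (3, minor)
        print(version, "split:", split_pin_for(version), "| single:", SINGLE_PIN)
```

With the split pin, CI would exercise two different pyarrow majors depending on the interpreter; with the single pin every matrix cell resolves the same requirement.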

mikeknep added 2 commits May 27, 2026 09:33
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
@mikeknep mikeknep requested a review from a team as a code owner May 27, 2026 14:41
@greptile-apps

greptile-apps Bot commented May 27, 2026

Contributor

Greptile Summary

This PR bumps pyarrow from >=19.0.1,<20 to >=22,<23 to unlock Python 3.14 wheel support, and widens the CI matrix and package classifiers accordingly. The author ran full unit and E2E test suites across all supported Python versions with pyarrow 22 and confirmed no behavioral regressions in DataDesigner's parquet I/O paths.

  • packages/data-designer-config/pyproject.toml and uv.lock carry the actual dependency change; the other two packages only add the Python 3.14 classifier.
  • .github/workflows/ci.yml adds "3.14" to every job's version matrix, and Makefile/SKILL.md docs are updated to remove the now-stale "pyarrow lacks 3.14 wheels" notes.

Confidence Score: 5/5

Safe to merge — the change is a well-scoped dependency bump with full test coverage across all target Python versions.

The pyarrow upgrade is backed by agent-run test results covering 3766 unit tests and 6 E2E tests on Python 3.10–3.14 with pyarrow 22. The lock file is consistent with the new specifier, all three packages have their classifiers updated, and the CI matrix correctly reflects the expanded support range. The only known downside (no wheels for pre-glibc-2.28 Linux) is explicitly acknowledged in the PR description and is bounded to EOL distros.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-config/pyproject.toml | pyarrow bumped from >=19.0.1,<20 to >=22,<23; Python 3.14 classifier added. Lock file matches. |
| .github/workflows/ci.yml | Python 3.14 added to all five job matrices; no other logic changed. |
| uv.lock | pyarrow entry updated from 19.0.1 to 22.0.0 with full wheel list including cp314 wheels; specifier matches pyproject.toml. |
| Makefile | Comment for DOCS_PYTHON_VERSION updated to remove stale pyarrow 3.14 rationale; default stays at 3.13. |
| .agents/skills/datadesigner-docs/SKILL.md | Two doc lines updated: notebook-deps comment and troubleshooting table row both de-reference the now-resolved 3.14/pyarrow constraint. |
| README.md | Python version badge updated from 3.10–3.13 to 3.10–3.14. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Install DataDesigner] --> B{Python version?}
    B -->|3.10 to 3.14| C[Resolve pyarrow 22.x]
    C --> D{Platform?}
    D -->|Linux glibc 2.28+ / macOS / Windows| E[Pre-built wheel available]
    D -->|Linux glibc less than 2.28 - EOL distros| F[No wheel - install fails]
    E --> G[CI: ubuntu-latest + macos-latest, Python 3.10 3.11 3.12 3.13 3.14]
    G --> H[3766 unit tests pass]

Reviews (1): Last reviewed commit: "Simplify pyarrow dep"

@github-actions

Contributor

Review: PR #707 — chore: bump pyarrow to unlock python 3.14 support

Summary

Bumps pyarrow from >=19.0.1,<20 to >=22,<23 in data-designer-config to unlock Python 3.14 support (pyarrow 22 is the first release to ship 3.14 wheels). Adds 3.14 to the CI matrices, the classifiers of all three packages, and the README badge. Updates the docs Makefile comment and the datadesigner-docs skill notes to reflect that pyarrow no longer blocks 3.14, although the docs build still defaults to 3.13. Net change: +69/−51, mostly uv.lock churn.

Findings

Correctness

  • Major-version skip (19 → 22) is acknowledged but bounded. The PR description documents an out-of-band validation pass (3,766 unit tests on Python 3.10–3.14, 6 E2E on 3.10/3.14, lint/format clean). Spot-checking pyarrow usage in the repo confirms the surface is stable: pd.read_parquet(..., dtype_backend="pyarrow") (packages/data-designer-config/src/data_designer/config/utils/io_helpers.py:131), pa.ListType / pa.StructType / pa.DataType introspection (packages/data-designer-engine/src/data_designer/engine/analysis/utils/column_statistics_calculations.py:163), and a single comment in artifact_storage.py. None of these call into the parts of pyarrow that changed across 20–22. Reasonable risk profile for a single-pin upgrade.
  • glibc-2.28 packaging risk. Pyarrow 22 wheels drop pre-glibc-2.28 manylinux variants (the lockfile now lists manylinux_2_28_* and musllinux_1_2_* only). Anyone on CentOS 7 / RHEL 7 / Ubuntu 18.04 (all EOL) will hit a wheel-not-found situation. The PR description owns this trade-off; worth surfacing in the changelog/release notes when this lands.
  • CI matrix is consistent. All five python-version matrices in .github/workflows/ci.yml (config, engine, interface, e2e, summary Test) gain "3.14". The summary job at line 271 is the gating one, and it's kept in sync — good. The single-python jobs (Coverage Check 3.11, health-checks 3.11, check-colab 3.11) are intentionally untouched.
  • Classifiers added to all three packages, in the right place (packages/data-designer-config/pyproject.toml:19, data-designer-engine/pyproject.toml:19, data-designer/pyproject.toml:20). README badge updated to match.
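The wheel-coverage point above can be checked mechanically by reading the tags out of wheel filenames. The sketch below is an assumption-laden illustration: the filenames are examples in the standard wheel naming scheme rather than rows copied from uv.lock, and both helper names are hypothetical.

```python
from typing import Optional


def parse_wheel_tags(filename: str) -> tuple[str, str, str]:
    """Split a wheel filename into (python tag, abi tag, platform tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform, so 5 or 6 parts.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def manylinux_glibc_floor(platform_tag: str) -> Optional[tuple[int, int]]:
    """Return the minimum glibc implied by a manylinux_X_Y_arch tag, else None."""
    if not platform_tag.startswith("manylinux_"):
        return None  # musllinux, macOS, Windows, or a legacy manylinux alias
    _, major, minor, *_arch = platform_tag.split("_")
    return int(major), int(minor)


if __name__ == "__main__":
    # Illustrative filenames only; not copied from the lockfile.
    examples = [
        "pyarrow-22.0.0-cp314-cp314-manylinux_2_28_x86_64.whl",
        "pyarrow-22.0.0-cp314-cp314-musllinux_1_2_x86_64.whl",
    ]
    for name in examples:
        python_tag, _, platform_tag = parse_wheel_tags(name)
        print(python_tag, platform_tag, manylinux_glibc_floor(platform_tag))
```

A floor of `(2, 28)` on every manylinux row is what makes installs fail on hosts with an older glibc.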

Style / conventions

  • Specifier formatting: pyarrow>=22,<23 is slightly looser than the patch-pinned form used elsewhere (e.g., pillow>=12.2.0,<13, pydantic[email]>=2.9.2,<3). The repo is already inconsistent here, so this is a nit, not a blocker — but >=22.0.0,<23 would match the dominant pattern.
  • Makefile comment rewrite (Makefile:467-470) is a clear improvement: reframes DOCS_PYTHON_VERSION ?= 3.13 as "match the published docs builds" rather than the now-obsolete "no 3.14 wheels" reasoning. The skill update at .agents/skills/datadesigner-docs/SKILL.md:378 mirrors it.
  • Troubleshooting line at .agents/skills/datadesigner-docs/SKILL.md:456 is rephrased to be version-agnostic ("an interpreter without prebuilt pyarrow wheels"), which is forward-compatible — good. With pyarrow 22 covering 3.14, this row only kicks in for 3.15+, but the wording stays correct.
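On the specifier nit above: under PEP 440, release segments are compared with trailing zeros padded, so `>=22` and `>=22.0.0` admit the same versions and the difference is purely stylistic. A minimal standard-library sketch of that padding rule follows; it handles only plain numeric releases (no pre-release or local segments), and the function names are hypothetical.

```python
from itertools import zip_longest


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '22.0.0' into integers."""
    return tuple(int(part) for part in version.split("."))


def compare_releases(left: str, right: str) -> int:
    """Return -1, 0, or 1, padding the shorter release with zeros."""
    pairs = zip_longest(release_tuple(left), release_tuple(right), fillvalue=0)
    for a, b in pairs:
        if a != b:
            return -1 if a < b else 1
    return 0


if __name__ == "__main__":
    print(compare_releases("22", "22.0.0"))      # the two lower bounds
    print(compare_releases("19.0.1", "22"))      # the old pin against the new floor
    print(compare_releases("22.0.0", "23"))      # the locked version against the cap
```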

Tests

  • No new tests added, which is appropriate for a dep bump. The functional safety net is the existing CI matrix now exercising 3.14 on every job. The author's pre-merge validation (3766 passed, 1 skipped × 5 Python versions) provides additional coverage signal.
  • Worth confirming on the actual CI run that the new 3.14 matrix cells go green before merge — the local validation referenced in the PR body is reassuring but doesn't substitute for the in-tree CI.

Security / supply chain

  • pyarrow 22.0.0 (released 2025-10-24) is the current upstream stable release; no known CVEs are left unaddressed by this bump.
  • No new transitive deps surface in the lockfile diff beyond pyarrow itself (the uv.lock change is wheel rows for the pyarrow package only).

Suggestions

  1. (Nit) Tighten the version specifier to pyarrow>=22.0.0,<23 to match neighboring deps in data-designer-config/pyproject.toml.
  2. (Optional) When this lands, mention the dropped pre-glibc-2.28 wheel coverage in the release notes / CHANGELOG so users on EOL distros have a heads-up.
  3. (Optional) Consider whether the docs build pin (DOCS_PYTHON_VERSION ?= 3.13) should bump to 3.14 in a follow-up now that the pyarrow blocker is gone, or stay on 3.13 for stability. The new comment leaves this implicit; an inline note about the policy would help future readers.

Verdict

Approve with minor suggestions. Clean, well-scoped dependency bump. The diff stays in dep/CI/docs files only, the import-direction and structural invariants from AGENTS.md are untouched, and the major-version skip is backed by a documented validation pass. The glibc-2.28 trade-off is real but bounded to EOL distros and explicitly accepted in the PR body. Suggestions above are non-blocking.

@johnnygreco

Contributor

Nice work on this one, @mikeknep: this is a tidy dependency unlock with the right CI surface expanded.

Summary

This PR bumps DataDesigner's parquet dependency to pyarrow>=22,<23, adds Python 3.14 classifiers, updates the CI unit/e2e matrices, and refreshes docs wording that previously assumed pyarrow lacked 3.14 wheels. The implementation matches the PR description, including the simpler single-pin approach across supported Python versions.

I also ran an extra isolated validation pass for the pyarrow 22 risk: Python 3.14.4 and 3.10.12 e2e envs both installed pyarrow 22.0.0; the repo e2e suite passed on both (6 passed, 2 skipped each, with API-key-gated tests skipped), and a parquet smoke test covering create(), load_dataset(), metadata row counts, parquet/csv/jsonl export, and parquet seed ingestion passed on both interpreters.

Findings

No findings.

What Looks Good

  • The dependency bump lives in data-designer-config, which keeps the existing package layering intact while engine/interface inherit it through their declared package dependencies.
  • Expanding both package tests and e2e tests to Python 3.14 gives us good coverage for the exact wheel/install problem this PR is solving.
  • The docs wording now explains why docs default to 3.13 without preserving the stale “pyarrow has no 3.14 wheels” claim.

Verdict

Ship it.


This review was generated by an AI assistant.

@johnnygreco johnnygreco left a comment

Contributor

Approved. The pyarrow 22 + Python 3.14 changes look good, and the extra isolated e2e/parquet smoke coverage passed.

@mikeknep mikeknep merged commit 87daeba into main May 27, 2026
62 checks passed
@mikeknep mikeknep deleted the mike/673-python-314-support branch May 27, 2026 15:47

Development

Successfully merging this pull request may close these issues.

Support Python 3.14 (via bumping or expanding pyarrow dep)

2 participants