
Conversation

@h-vetinari (Member) commented Sep 23, 2022

PPC builds in apache/arrow#14102 are failing.

Edit: a new version dropped, so I'm repurposing this PR, as it already had almost all the required bits.

@conda-forge-linter (Contributor)

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari (Member Author)

@conda-forge-admin, please rerender

@h-vetinari (Member Author) commented Oct 22, 2022

Looks like the PPC failures are gone! 🥳

Windows has some issues with a symlink, but 2ce51ad can be removed as soon as there's a release dist.

@h-vetinari force-pushed the ppc_head branch 3 times, most recently from 84f98d9 to 81a5e9f on October 23, 2022 at 07:36
@h-vetinari (Member Author)

@conda-forge/arrow-cpp @jakirkham @jaimergp @isuruf @kkraus14

TL;DR: Due to recent changes, we need to do something about the emulation builds here.

Proposal: for the aarch/ppc pyarrow builds, either revive the pyarrow-feedstock or use a separate branch here.

First off, some amazing news: arrow-cpp no longer depends on python! 🥳

  • Positives:
    • Far fewer CI jobs (one per arch, covering all Python versions)
    • Far fewer build artefacts for arrow-cpp
  • Negatives:
    • Consolidated builds are impossible in emulation

The aarch64/ppc64le builds are already timing out about 50% of the time in emulation, so adding more outputs to the same job is completely impossible. AFAICT there are four possible ways to fix this (sorted by preference, ascending):

  1. Artificially keep arrow-cpp depending on python for aarch/ppc (still running into ~50% timeouts; 16 CI jobs here instead of 2, more once PyPy arrives). This is the status quo, but it's really ugly and work-intensive, and it unnecessarily multiplies artefacts/traffic.
  2. Revive the pyarrow-feedstock, only for the aarch/ppc pyarrow builds
    • Keep the arrow-cpp build here in emulation (with the happy side effect of reduced timeout risk)
    • Synchronization through a submodule or manually
    • Faster iteration on this feedstock, and - once it's passing - doing the aarch/ppc side separately doesn't sound too bad as a workflow IMO (the aarch/ppc builds - especially for pyarrow only - rarely cause issues, but they blow up CI time).
  3. Do the same as 2., but without touching the old pyarrow-feedstock
    • have a branch (per version >=10...) in this feedstock that just has one commit...
    • ...which removes the skips for pyarrow on aarch/ppc that would have to be on main, and adds skips everywhere else
    • rebase that branch whenever a PR is merged to main (the aarch/ppc pyarrow builds get published from that branch)
    • same benefits as 2., but fewer synchronization hassles
    • as a downside, this would require force-pushing to a productive branch (after each rebase)
  4. Enable cross-compilation with CUDA (open question whether/how that's possible; see some notes from the core sync, though it wasn't discussed there)

Obviously 4. would be the nicest, but since I have no idea how long it will take for that to become possible, I'd like to proceed with 3. or 2.

PS: If someone has a better way than 1f17385 to keep conda-smithy from generating jobs for different numpy versions, I'm all ears!

@jaimergp (Member) commented Oct 23, 2022

Hm, and what about some conda_build_config.yaml magic plus Jinja logic to have one output per job in those troublesome archs? The idea is to have two jobs for ppc/aarch so they don't time out, instead of using two branches or feedstocks.
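Something along these lines, just as a sketch - arrow_variant is a made-up key, and the exact spelling of the Jinja guards would need checking:

```yaml
# recipe/conda_build_config.yaml (sketch) -- fan out into one job per output,
# but only on the emulated architectures
arrow_variant:
  - arrow-cpp   # [aarch64 or ppc64le]
  - pyarrow     # [aarch64 or ppc64le]

# meta.yaml (sketch) -- only render the matching output in each emulated job;
# on all other platforms the key is unset and both outputs are built as before
outputs:
  {% if arrow_variant is not defined or arrow_variant == "arrow-cpp" %}
  - name: arrow-cpp
    # ...
  {% endif %}
  {% if arrow_variant is not defined or arrow_variant == "pyarrow" %}
  - name: pyarrow
    # ...
  {% endif %}
```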

@h-vetinari (Member Author)

Hm, and what about some conda_build_config.yaml magic plus Jinja logic to have one output per job in those troublesome archs?

That would be a similar situation to what we have now - 16 jobs (once we include PPC+CUDA) for aarch/ppc, building pyarrow and arrow-cpp (the latter redundantly in 3 of 4 jobs per arch), and those jobs currently time out 50% of the time.

That means it would take 5-6(!) restarts of 6h jobs to eventually get a passing run, which is just a spectacular hassle that I'd like to avoid going forward.

@h-vetinari (Member Author)

[1. implies] 16 jobs (once we include PPC+CUDA) for aarch/ppc, building pyarrow and arrow-cpp

Actually, now that OpenSSL 3 has been unblocked, we'd have 32 jobs just for aarch/ppc (and that's not even counting another 16 if we get PyPy). As the person doing most of the maintenance here at the moment, that makes me object strongly to "1." above. Even though I try to do the restarts, sometimes the GH-Azure interaction won't let me restart anymore after 3-4 attempts, and since we'd realistically need 6-8 restarts to finish all 32-48 timeout-prone jobs, this would mean missing aarch/ppc pyarrow builds for every PR (with the attendant resolution problems). Comparatively, option 2. or 3. is way less hassle.

@h-vetinari (Member Author)

Hm, and what about some conda_build_config.yaml magic plus Jinja logic to have one output per job in those troublesome archs?

That would be a similar situation to what we have now - 16 jobs (once we include PPC+CUDA) for aarch/ppc, building pyarrow and arrow-cpp (the latter redundantly in 3 of 4 jobs per arch), and those jobs currently time out 50% of the time.

That means it would take 5-6(!) restarts of 6h jobs to eventually get a passing run, which is just a spectacular hassle that I'd like to avoid going forward.

AFAICT there are four possible ways to fix this (sorted by preference, ascending):

Actually, since upstream arrow runs the recipe as part of its CI (cf. apache/arrow#14102), splitting things up into separate feedstocks (option 2.) no longer sounds like a good idea to me. So my preference would now be 3.

Would be happy to have your input @xhochy @pitrou @kou

@jakirkham mentioned this pull request on Oct 27, 2022
@h-vetinari (Member Author)

@jakirkham @kkraus14 @isuruf @jaimergp
Any thoughts about the feasibility of cross-compiling cuda for aarch64/ppc64le (i.e. 4. above)? That would be by far the most elegant solution. If it'd be possible to get there within (say) a few months, it might even make sense to wait for that.

@kkraus14 (Contributor)

@jakirkham @kkraus14 @isuruf @jaimergp
Any thoughts about the feasibility of cross-compiling cuda for aarch64/ppc64le (i.e. 4. above)? That would be by far the most elegant solution. If it'd be possible to get there within (say) a few months, it might even make sense to wait for that.

My understanding is there's currently no way to get a libcuda.so stub library into a conda package or a docker image that we can distribute that is compliant with the CUDA EULA.

@h-vetinari (Member Author)

Thanks for the quick response @kkraus14!

CC @conda-forge/core: with upstream changes (plus openssl & pypy in the pipeline), and the inability to cross-compile for cuda, we need to split the builds here. More details further up.

Recapping our options, and adding two more:

| Option | Benefits | Downsides | Comment |
|--------|----------|-----------|---------|
| 1. Keep status quo | everything builds from one PR | build explosion; infeasible # of restarts; infeasible total CI runtime; resolver pitfalls | Infeasible (details) |
| 2. Reactivate pyarrow-feedstock | no time outs here (or there) | separate feedstock; harder to sync upstream | [see variants] |
| -> 2a. ... just for aarch/ppc | | | feasible, but meh |
| -> 2b. ... for everything | | build explosion there (python × openssl × arch × cuda) | feasible, but ugh |
| 3. Build pyarrow aarch/ppc in separate branch | no time outs here (or there); everything in one place | double the productive branches | [see variants] |
| -> 3a. ... carrying unskip-commit | | needs force-push to prod. branch after rebase | meh |
| -> 3b. ... carrying env var | | | Least bad option? 🤩 |
| 4. cross-compile cuda | everything builds from one PR | | best solution, but infeasible 🥲 |

To detail option 3b, I'm envisioning a setup that doesn't need rebases, as follows (see the sketch after this list):

  • create a branch <ver>.0.x_aarch_ppc, and add a single commit that sets an (environment) variable only for aarch/ppc in conda_build_config.yaml (or somewhere else), say CF_BUILD_ONLY_PYARROW=1
  • use selectors like # [CF_BUILD_ONLY_PYARROW == 1] to switch off the pyarrow aarch/ppc builds on the main branches and enable them on the _aarch_ppc branches, respectively
  • merge <ver>.0.x (rather than rebase) into <ver>.0.x_aarch_ppc after every PR to <ver>.0.x.
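Roughly, the kind of setup I have in mind looks like the sketch below - names as above, but whether the flag is best consumed via per-output skips or via a Jinja guard around the outputs still needs to be worked out:

```yaml
# recipe/conda_build_config.yaml (sketch):
# main branches carry "0"; the single commit on <ver>.0.x_aarch_ppc flips it to "1"
CF_BUILD_ONLY_PYARROW:
  - "0"

# meta.yaml (sketch) -- variant values from conda_build_config.yaml can be used
# in selectors, just like cuda_compiler_version already is today
outputs:
  - name: arrow-cpp
    build:
      skip: true    # [CF_BUILD_ONLY_PYARROW == "1"]
    # ...
  - name: pyarrow
    build:
      skip: true    # [(aarch64 or ppc64le) and CF_BUILD_ONLY_PYARROW == "0"]
    # ...
```

If per-output skips turn out not to be honoured by conda-build, wrapping the respective output in a Jinja {% if %} on the same flag would achieve the same effect.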

@kkraus14 (Contributor)

I'd propose a 5th option here:

  • Move the aarch64 / ppc64le non-CUDA builds to cross compilation (see the sketch after this list)
  • Disable the aarch64 / ppc64le CUDA builds until there are either native runners that we can get CUDA on, or a way to ship the necessary libraries to support cross compilation, at which point we can revisit producing CUDA builds
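For the non-CUDA part, that should essentially be a conda-forge.yml change along these lines (a sketch; dropping the CUDA variants for those platforms would be a separate change in the variant config and isn't shown here):

```yaml
# conda-forge.yml (sketch): build the emulated architectures on x86_64 instead
build_platform:
  linux_aarch64: linux_64
  linux_ppc64le: linux_64
```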

@h-vetinari (Member Author)

I'd propose a 5th option here:

Sure, if people are fine with not producing CUDA builds on aarch/ppc, that's even easier. I had just assumed that it would not be acceptable to remove a feature like that (which people spent a bunch of time on, judging by the old PRs that enabled it).

@xhochy (Member) commented Oct 27, 2022

Actually, since upstream arrow runs the recipe as part of its CI (cf. apache/arrow#14102), splitting things up into separate feedstocks (option 2.) no longer sounds like a good idea to me. So my preference would now be 3.

From my experience maintaining this feedstock over the past years, I would propose going with option 2. Especially since we test the conda recipes as part of the Arrow CI, we can be pretty sure that things work together. We are doing a similar thing with boost(-cpp), and it works nicely there, too.

@h-vetinari Can you explain how you came to the above conclusion?

@h-vetinari (Member Author)

@h-vetinari Can you explain how you came to the above conclusion?

What I meant was that the recipe will have to continue being synced back to arrow (assuming it should stay part of the CI there), and if we split the recipe into two feedstocks, I imagine it will be harder to sync that back in a way that upstream CI can do an integrated build & check of both arrow-cpp & pyarrow.

That wouldn't be the case with option 3., because then the recipe would stay contained in one feedstock, and syncing back would just mean undoing certain skips (and there are already a few manual adjustments to do anyway, as I learned in apache/arrow#14102).

Is that reasoning understandable? FWIW, I can also live very well with option 2a, if that's what you prefer. Option 1. is the only one I'm strongly against; the rest (2.-5.) I can deal with.

@xhochy (Member) commented Oct 27, 2022

What I meant was that the recipe will have to continue being synced back to arrow (assuming it should stay part of the CI there), and if we split the recipe into two feedstocks, I imagine it will be harder to sync that back in a way that upstream CI can do an integrated build & check of both arrow-cpp & pyarrow.

As we had this situation before: with the current manual sync, it didn't feel any different whether you had one recipe or two. For the syncing, I would propose adding the changes we need in the recipes to support building nightly versions to the feedstock here. This probably means we'll need some if version == statements in the recipe, but it should help a lot with syncing the pinning files. I will think a bit about this and write about a better syncing approach to the Arrow mailing list.

@h-vetinari (Member Author)

If upstream arrow is fine with synchronizing from two feedstocks, I don't mind. One problem that comes to mind (with any of the splits, actually) is that we could no longer easily do something like {{ pin_subpackage('arrow-cpp', exact=True) }}, which we currently do.
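For context, this is the kind of pin we'd lose - a sketch only; the looser pin in the comment is just one possible replacement, not something that exists yet:

```yaml
# meta.yaml, pyarrow output, as it works today within a single feedstock
requirements:
  run:
    - {{ pin_subpackage('arrow-cpp', exact=True) }}   # exact version + build-string pin

# in a split pyarrow-feedstock, arrow-cpp would no longer be a subpackage of the
# same recipe, so the closest equivalent would be an explicit version pin
# (necessarily looser than exact=True), e.g.:
#   - arrow-cpp =={{ version }}
```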

If we do push some stuff back to the pyarrow-feedstock, I'd still propose to only do the aarch/ppc builds there, because for all other combinations it actually works beautifully here (and we'd avoid a build matrix of soon-to-be 116¹ (!!) builds there).

Also, keeping the build scripts for pyarrow here up-to-date (due to building for non-aarch/ppc) will make it easier to switch back to doing everything here if/once we can cross-compile cuda.

I will think a bit about this and write about a better syncing approach to the Arrow mailing list.

Could you please let us know here when you've done so?

Footnotes

  1. 10 OS+GPU combinations (linux-{64, aarch64, ppc64le}-{cpu, cuda}, win-64-{cpu, cuda}, osx-{64, arm64}) × 6 python versions (CPython 3.8-3.11, PyPy 3.8 & 3.9) × 2 OpenSSL versions {1.1.1, 3} = 120, minus 4 builds for PyPy not being available on osx-arm64 = 116.

@isuruf (Member) commented Oct 31, 2022

My understanding is there's currently no way to get a libcuda.so stub library into a conda package or a docker image that we can distribute that is compliant with the CUDA EULA.

We can install the libcuda.so stub library from the NVIDIA RPM package at CI run time and not redistribute it.
You'll also need to install the cross-linux-sbsa compiler, which is the cross-compiler for aarch64.

However, NVIDIA doesn't provide a cross-linux-ppc64le. Not sure why.

@h-vetinari (Member Author)

We can install the libcuda.so stub library from the NVIDIA RPM package at CI run time and not redistribute it.

Awesome, thanks a lot! That's what I had pictured, but I thought I'd leave the EULA judgment to those who know Nvidia much better than I do.

Assuming this is possible, what would be a good way to experiment with it? Iterate in the build scripts here and then move it to the ci-setup once it's working?

However, NVIDIA doesn't provide a cross-linux-ppc64le. Not sure why.

I think having this even just for aarch would be a big win. Also, we're not yet building PPC+CUDA, so it wouldn't be a "regression" not to publish those.

@xhochy (Member) commented Oct 31, 2022

If we do push some stuff back to the pyarrow-feedstock, I'd still propose to only do the aarch/ppc builds there, because for all other combinations it actually works beautifully here (and we'd avoid a build matrix of soon-to-be 116¹ (!!) builds there).

Wouldn't that mean that we need to keep the build scripts for pyarrow in sync between the two repositories?

@h-vetinari (Member Author)

Wouldn't that mean that we need to keep the build scripts for pyarrow in sync between the two repositories?

Yes, but I'm willing to do that. It's far less of a hassle than restarting failing 6h jobs several times, or drip-feeding PRs because we shouldn't be blocking 100+ CI agents at once.

@h-vetinari (Member Author)

4. cross-compile cuda [...] best solution, but infeasible 🥲

For those subscribed: it looks like option 4 is back in play: conda-forge/conda-forge-ci-setup-feedstock#209 🥳

This would be amazing (and it seems it even supports cross-compiling PPC after all). 🤩

@h-vetinari (Member Author)

Cross-compilation support for CUDA is still baking, but in the meantime I'm continuing this PR in #875.
