WIP: arrow 10.0.0 #866
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR (

@conda-forge-admin, please rerender
LLVM headers needed by gandiva; clang needed because FindLLVMAlt.cmake depends on it to find LLVM correctly. Co-Authored-By: Sutou Kouhei <kou@clear-code.com>
Looks like the PPC failures are gone! 🥳 Windows has some issues with a symlink, but 2ce51ad can be removed as soon as there's a release dist.
@conda-forge/arrow-cpp @jakirkham @jaimergp @isuruf @kkraus14

TL;DR: Due to recent changes, we need to do something about emulation builds here. Proposal: for aarch/ppc pyarrow builds, revive the pyarrow-feedstock or use a separate branch here.

First off: Amazing news, arrow-cpp does not depend on python anymore 🥳
The aarch64/ppc64le builds are already timing out about 50% of the time under emulation, so adding more outputs to the same job is completely impossible. AFAICT there are four possible ways to fix this (sorted by preference, ascending):
Obviously 4. would be nicest, but because I have no idea how long that's going to take to become possible, I'd like to proceed with 3. or 2.

PS. If someone has a better way than 1f17385 to keep conda-smithy from generating jobs for different numpy versions, I'm all ears!
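For reference, the usual mechanism for this kind of thing (whether it's actually better than 1f17385, I can't say) is a local key override in `recipe/conda_build_config.yaml`; the version value below is a placeholder, not the feedstock's actual pin:

```yaml
# recipe/conda_build_config.yaml (sketch only; version value is a placeholder)
# A single numpy entry keeps conda-smithy from rendering one job per
# numpy version from the global pinning. Caveat: numpy is zip_key'ed
# with python in conda-forge's pinning, so this may need one entry per
# python version instead of a lone value.
numpy:
  - "1.23"
```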
Hm, and what about some
That would be a similar situation as until now: 16 jobs (once we include PPC+CUDA) for aarch/ppc. That means it would take 5-6(!) restarts of 6h jobs to eventually get a passing run, which is just a spectacular hassle that I'd like to avoid going forward.
Actually, now that OpenSSL 3 has been unblocked, we'd have 32 jobs just for aarch/ppc (and that's not even counting another 16 if we get PyPy). As the person doing most of the maintenance here at the moment, that makes me object strongly to "1." above: even though I try to do the restarts, sometimes the GH-Azure interaction will not let me do it anymore after 3-4, and since we'd need 6-8 restarts to realistically finish all 32-48 timeout-prone jobs, this would mean missing aarch/ppc pyarrow builds for every PR (and attendant resolution problems). Comparatively, it's way less hassle to do option 2. or 3.
Actually, since upstream arrow runs the recipe as part of its CI (cf. apache/arrow#14102), splitting things up into separate feedstocks (option 2.) does not sound like a good idea anymore. So my preference would now be 3.
@jakirkham @kkraus14 @isuruf @jaimergp |
My understanding is there's currently no way to get a
Thanks for the quick response @kkraus14! CC @conda-forge/core: with upstream changes (plus openssl & pypy in the pipeline), and the inability to cross-compile for cuda, we need to split the builds here. More details further up. Recapping our options, and adding two more:
To detail option 3b, I'm envisioning a setup that doesn't need rebases, as follows:
I'd propose a 5th option here:
Sure, if people are fine with not producing CUDA builds on aarch/ppc, that's even easier. I had just assumed that it would not be acceptable to remove a feature like that (which people spent a bunch of time on, from looking at the old PRs that enabled it). |
From my last years' experience maintaining this feedstock, I would propose to go with option 2. Especially as we test the conda recipes as part of the Arrow CI, we are pretty sure that things work together. We are doing a similar thing with

@h-vetinari Can you explain how you came to the above conclusion?
What I meant was that the recipe will have to continue being synced back to arrow (assuming it should stay part of the CI there), and if we split the recipe into two feedstocks, I imagine it will be harder to sync that back in a way that upstream CI can do an integrated build & check of both arrow-cpp & pyarrow. That wouldn't be the case with option 3., because then the recipe would stay contained in one feedstock, and syncing back would just mean undoing certain skips (and there are already a few manual adjustments to do anyway, as I learned in apache/arrow#14102). Is that reasoning understandable?

FWIW, I can live very well also with option 2a, if that's what you prefer. Just option 1. is something I'm strongly against; the rest (2.-5.) I can deal with.
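For concreteness, the "certain skips" I have in mind could look like a selector on the pyarrow output (a sketch only; the exact selector set would depend on which of options 2./3. we go with):

```yaml
# recipe/meta.yaml, pyarrow output (sketch; hypothetical placement)
# Skip the emulated architectures here; they'd be built from the split
# feedstock / separate branch instead. Syncing back upstream would then
# just mean undoing this skip to restore the integrated build.
build:
  skip: true  # [aarch64 or ppc64le]
```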
As we had this situation before: with the current manual sync, it didn't feel different whether you had one or two recipes. For the syncing, I would propose to add the changes we need in the recipes to support the builds of nightly versions to the feedstock here. This will mean that we probably need to have some
If upstream arrow is fine with synchronizing from two feedstocks, I don't mind. One problem that comes to mind (with any of the splits, actually) is that we cannot easily do something like

If we do push some stuff back to the pyarrow-feedstock, I'd still propose to only build the aarch/ppc builds there, because for all other combinations it actually works beautifully here (and we'd avoid a build matrix of soon-to-be 116!! builds there). Also, keeping the build scripts for pyarrow here up-to-date (due to building for non-aarch/ppc) will make it easier to switch back to doing everything here if/once we can cross-compile CUDA.
Could you please let us know here when you've done so?
We can install the libcuda.so stub library from the NVIDIA RPM package at CI run time and not redistribute it. However, NVIDIA doesn't provide a
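A rough sketch of what such a CI-time step might look like (the package name and the stubs/ path are assumptions on my part, not a tested recipe; `rpm2cpio | cpio` is simply the standard way to unpack an RPM without installing it):

```yaml
# hypothetical CI step; <cuda-driver-devel>.rpm stands in for whatever
# NVIDIA package actually ships the stub, and the stubs/ layout is assumed
- script: |
    rpm2cpio <cuda-driver-devel>.rpm | cpio -idm
    # link against the extracted stub at build time, but do not copy it
    # into the package, so nothing NVIDIA-licensed gets redistributed
    export LIBRARY_PATH="${PWD}/usr/local/cuda/lib64/stubs:${LIBRARY_PATH}"
  displayName: Fetch libcuda.so stub (not redistributed)
```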
Awesome, thanks a lot! That's what I had pictured, but I thought I'd leave the EULA judgment to those who know Nvidia much better than I do. Assuming that this is possible, what would be a good way to experiment on this? Iterate in the build scripts here and then move it to the ci-setup once working?
I think having this even just for aarch would be a big win. Also, we're not yet building PPC+CUDA, so it wouldn't be a "regression" not to publish those. |
Wouldn't that mean that we need to keep the build scripts for pyarrow in sync between the two repositories?
Yes, but I'm willing to do that. It's far less of a hassle than restarting failing 6h jobs several times, or drip-feeding PRs because we shouldn't be blocking 100+ CI agents at once.
For those subscribed: it looks like option 4 is back in play: conda-forge/conda-forge-ci-setup-feedstock#209 🥳 This would be amazing (and it seems it even supports cross-compiling PPC after all). 🤩 |
cross-compilation support for cuda is still baking, but in the meantime I'm continuing this PR in #875 |
PPC builds in apache/arrow#14102 are failing.
Edit: new version dropped, repurposing this PR as it had almost all the required bits already there.