Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
|
@ryanking13 it looks like we overrun the CI limit in 'build-packages-numpy-dependents' now, because pyarrow is a long build. |
|
Indeed we might need a pyarrow-specific job like we have for opencv. @ryanking13 is hoping to unvendor the packages and move the builds somewhere else eventually so we don't have to deal with these sorts of problems. But in the meantime, PyArrow is pretty popular so it would be nice to support it. |
This is a possible option. But I would prefer to use the way like Even after we unvendor the package, I like to proactively separate and build packages that take a long time to build, if possible. |
|
@ryanking13 There's a minor challenge here which is the patch file - I could obviously set up a separate repo that does the patch and pyodide build, but then there's an extra step for any release (pull arrow to my repo and build, then push the update to pyodide). The patch file didn't go upstream because arrow maintainers don't want to include dependencies on timezone handling stuff. |
|
I've run the test suite on this and it all works by the way. |
Thanks for the effort @joemarshall ! |
Well, we need to maintain the patch file in some place anyway, so I think there isn't much difference. It is worth noting that we support out-of-tree build of recipes, too. So you can copy the recipe, build it from separate repo, deploy the wheel, and use it in this repository. In that case, there will be two kind of recipes, the one you use in the separate repository that actually builds a wheel, and the one you use in this repository that just downloads the built wheel. |
|
The steps to update the PyArrow when the Emscripten or Python version is updated would be:
I agree that this is not ideal, but it is the only feasible way for now. There are some plans for improving this (#4918).
You don't need to track all the updates, but we will ping you when it breaks. |
|
@ryanking13 Is this PR blocked on anything? Would be great to get a version of pyarrow in the next release! |
@cpcloud As mentioned in my previous comment, |
|
@ryanking13 Right, just wondering what next steps are. Are we waiting on the wheel to exist somewhere, or something else? |
Yes, I was waiting for pyarrow folks or anyone who is interested in maintaining the pyarrow wheel to build the wheel out-of-tree and replace the URL in this PR. It's not a hard task; in fact, anyone can do it, as @joemarshall has already done a lot of work on it, and there is already a finished recipe in this PR. But I am not doing it myself because I believe that the pyarrow recipe should be maintained by someone else who is accountable. None of the Pyodide core developers are familiar with pyarrow, so it's not appropriate for us to maintain the package ourselves. |
|
@cpcloud I'm just getting to it. Was held up by kids being on holiday I'm afraid! |
for more information, see https://pre-commit.ci
|
@joemarshall No worries! Amazing work! |
Co-authored-by: Agriya Khetarpal <74401230+agriyakhetarpal@users.noreply.github.com>
agriyakhetarpal
left a comment
There was a problem hiding this comment.
Thanks, @joemarshall! I triggered the CircleCI build-libraries job because it seems to have timed out, but this should be safe to merge in general, since you already mentioned above that the test suite is passing.
|
build-libraries is timing out a lot lately... |
|
It seems to be intermittent. Moreover, it happens at the end of the job, i.e., at the "Persisting to workspace" stage. I have noticed that the libraries get built way earlier (in ~20 minutes), but this stage takes the most time and eventually times out. |
|
Thanks folks! Does this mean that pyarrow will be included as something we can install/import directly from Pyodide in the next Pyodide release? |
|
Yes, @Alex-Monahan. PyArrow will be available using |
|
And also with |
pyarrow package recipe
Also includes pyodide-unix-timezones, so that pyarrow can get correct timezones in time related functions.
The recipe is a bit fiddly - it first builds the arrow lib. It could be built in a separate package, but given it is only for pyarrow, I didn't see the point?
Fixes #2933