Plans for unvendoring package recipes
Unvendoring pyodide-build is almost ready (#4882), and the next step is to unvendor package recipes.
This issue is to discuss the plan for unvendoring package recipes. It proposes a roadmap that combines my own thoughts with @rth's idea in #3827 (comment). Please feel free to comment on this issue with any thoughts, concerns, or ideas.
Goals
- Unvendor package recipes from the pyodide/pyodide repository, so that building the Pyodide runtime and building package recipes are separated.
- After the unvendoring, the pyodide repository should contain only the core set of packages necessary to test the Pyodide runtime, which includes:
  - unvendored CPython modules
  - micropip and its dependencies
  - packages used to make the cross-build environment (e.g., numpy, scipy, etc.)
  - packages used for testing (e.g., pytest, fpcast-test, sharedlib-test, etc.)
- (optional) Remove test-only packages like fpcast-test and sharedlib-test from the distribution.
Current status
As of 2024/07/06, there are 267 recipes in the pyodide repository in total. Building all of them takes more than 2 hours in CI when nothing is cached. Building and testing every package exhausts the CI resources and has become a bottleneck for Pyodide development. It also does not scale: the cost keeps growing as the number of packages increases.
Building all packages in the pyodide repository was a good idea in the early stage of Pyodide development, when everything was changing rapidly. However, now that Pyodide changes less frequently and CPython is more stable, it is time to separate the package recipes from the pyodide repository.
Short-term plan: pyodide-recipes repository
For the short term, I propose creating a new repository called pyodide-recipes that contains all package recipes and their tests (i.e., the contents of the packages directory in the pyodide/pyodide repository).
In the pyodide-recipes repository, packages will be built and tested using:
- the latest release of pyodide-build
- the tip-of-tree cross-build environment
- the tip-of-tree Pyodide runtime (pyodide-core)
Keeping the recipes healthy
To keep the recipes healthy, the pyodide-recipes repository will work as follows:
- When a PR updates a recipe, the packages affected by the PR will be built and tested.
- After the PR is merged, all packages will be built and tested on the main branch.
- All packages will also be built and tested periodically by a cron job.
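For a PR, "the packages affected" are the changed recipes plus every recipe that depends on them, directly or transitively. The sketch below shows one way to compute that set. It assumes the dependency graph has already been read from each recipe's meta.yaml into a plain dict. The function name affected_packages is hypothetical and does not come from pyodide-build.

```python
from collections import deque


def affected_packages(changed: set[str], depends: dict[str, list[str]]) -> set[str]:
    """Return the changed recipes plus every recipe that depends on them,
    directly or transitively.

    `depends` maps each recipe name to the recipes it requires, e.g. as read
    from the requirements section of each meta.yaml (hypothetical input).
    """
    # Invert the graph: dependency name -> recipes that require it.
    dependents: dict[str, set[str]] = {}
    for name, reqs in depends.items():
        for req in reqs:
            dependents.setdefault(req, set()).add(name)

    # Breadth-first walk along the reverse edges, starting from the changes.
    result = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result


depends = {
    "numpy": [],
    "scipy": ["numpy"],
    "pandas": ["numpy"],
    "statsmodels": ["scipy", "pandas"],
    "micropip": [],
}
print(sorted(affected_packages({"scipy"}, depends)))  # prints ['scipy', 'statsmodels']
```

Walking the reverse edges means a change to a low-level recipe such as numpy also rebuilds scipy, pandas and statsmodels. A change to a leaf recipe rebuilds only that recipe.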
Releasing the recipe tarball
We will release a set of recipes as a tarball in the GitHub releases of the pyodide-recipes repository.
Maintainers of the pyodide-recipes repository can trigger a release manually by creating a new tag.
We can use calendar versioning for the tags, e.g., 2024.07.06.
The tarball contains all wheels, unvendored tests, shared libraries, and the pyodide-lock.json file. It is essentially the output of the pyodide build-recipes * --install command.
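The packaging half of a release could be scripted roughly as follows. This sketch assumes the output of pyodide build-recipes --install has been written to a single directory (passed in as dist_dir). The archive name pyodide-recipes-<tag>.tar.bz2 and both helper functions are hypothetical.

```python
import datetime
import tarfile
from pathlib import Path


def calver_tag(day: datetime.date) -> str:
    """Format a release date as a calendar-versioned tag, e.g. 2024.07.06."""
    return day.strftime("%Y.%m.%d")


def make_release_tarball(dist_dir: Path, out_dir: Path, tag: str) -> Path:
    """Pack the build output (wheels, unvendored tests, shared libraries and
    pyodide-lock.json) into one archive named after the release tag.

    out_dir must not be inside dist_dir, or the archive would include itself.
    """
    lockfile = dist_dir / "pyodide-lock.json"
    if not lockfile.is_file():
        raise FileNotFoundError(f"{lockfile} not found; was the build run with --install?")
    out_dir.mkdir(parents=True, exist_ok=True)
    tarball = out_dir / f"pyodide-recipes-{tag}.tar.bz2"
    with tarfile.open(tarball, "w:bz2") as tar:
        # Store paths relative to dist_dir so the archive unpacks without a prefix.
        for path in sorted(dist_dir.iterdir()):
            tar.add(path, arcname=path.name)
    return tarball
```

In a tag-triggered workflow, the tag would come from the pushed Git tag rather than from today's date. calver_tag is only a convenience for producing names like 2024.07.06.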
Optionally, we can also upload the wheels to a package index on Anaconda.org, so that users who want to use those packages before the next Pyodide release can download and install them. In this case, however, only the wheels are released, because a Python package index cannot distribute standalone shared libraries.
Current implementation status and migration plan
I have already started building all the recipes in ryanking13/pyodide-recipes-mirror. It is not fully stable yet, but I am confident that our out-of-tree build system works well enough to build all packages.
The migration plan is as follows:
1. After the pyodide-recipes-mirror repository is ready, transfer it to pyodide/pyodide-recipes.
2. Remove all recipes from the pyodide/pyodide repository, except for the core set of packages that I mentioned above.
3. Add the URL of the GitHub releases of the pyodide-recipes repository to the Makefile of the pyodide/pyodide repository.
4. Change the pyodide/pyodide CI pipeline as follows:
   4.1. Build and test the Pyodide runtime and the core set of packages first.
   4.2. After the runtime is built, construct the full Pyodide distribution by downloading the tarball from the URL in the Makefile.
   4.3. Run some very simple tests against the full distribution, for instance, import tests.
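Steps 4.2 and 4.3 could look roughly like the sketch below. It makes two assumptions: the tarball unpacks to a directory containing its own pyodide-lock.json, and the core packages built in step 4.1 should take precedence when both lockfiles list the same package. The helper names are hypothetical, and the sketch leaves out copying the wheels next to the runtime. The resulting import names would then be fed to whatever harness runs the runtime (for example, pytest-pyodide).

```python
import json
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch_recipes(url: str, dest: Path) -> Path:
    """Download the recipe tarball (the URL pinned in the Makefile) and unpack
    it into dest. Returns the path of the recipes' pyodide-lock.json."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "recipes.tar.bz2"
    with urllib.request.urlopen(url) as resp, open(archive, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    with tarfile.open(archive) as tar:
        # Use the safe "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    archive.unlink()
    return dest / "pyodide-lock.json"


def merge_lockfiles(core: dict, recipes: dict) -> dict:
    """Combine the runtime's lockfile with the recipes' lockfile. Entries for
    core packages win, since they were built against the current runtime."""
    packages = {**recipes["packages"], **core["packages"]}
    return {**core, "packages": packages}


def import_names(spec: dict) -> list[str]:
    """Collect every top-level import name listed in a lockfile, to drive a
    simple "does each package import?" smoke test (step 4.3)."""
    names: set[str] = set()
    for pkg in spec["packages"].values():
        names.update(pkg.get("imports", []))
    return sorted(names)
```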
Problems with this short-term plan:
This short-term plan is not perfect, and it has some problems.
Problem 1. Building recipes out-of-tree requires the tip-of-tree (ToT) Pyodide cross-build environment (xbuildenv), so updating the Pyodide runtime can, and likely will, break recipes.
For instance, if we want to update the Emscripten or CPython version, we must update the Pyodide runtime first, and then the recipes must follow.
Here is what updating the Emscripten version would look like:
- Update the Emscripten version in the Pyodide runtime, which breaks dynamic linking with packages built for the previous version.
- The latest release of pyodide-recipes will not work with the new Pyodide runtime, so we temporarily disable building the full distribution (CI will need to support this).
- After updating the Pyodide runtime and releasing the ToT xbuildenv, we try to update the recipes in the pyodide-recipes repository.
- Making all recipes work with the new Pyodide runtime may take a while. Meanwhile, we disable some recipes and release a new tarball.
- Update the pyodide/pyodide repository to use the new tarball and re-enable the full distribution in the CI pipeline.
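To disable the full distribution automatically during such a window, CI needs a way to detect that the released recipe tarball no longer matches the runtime. The guard below is one possibility. It assumes the info section of pyodide-lock.json records the Emscripten platform (e.g. emscripten_3_1_58) and the Python version. It compares the two lockfiles and reports whether the full-distribution step should run. All function names are hypothetical.

```python
import json
from pathlib import Path


def _python_minor(version: str) -> str:
    # "3.12.1" -> "3.12"; patch releases share an ABI, so ignore the patch number.
    return ".".join(version.split(".")[:2])


def is_compatible(runtime_info: dict, recipes_info: dict) -> bool:
    """True if a recipe tarball was built for the same Emscripten platform and
    Python minor version as the runtime."""
    platform = runtime_info.get("platform")
    if platform is None or platform != recipes_info.get("platform"):
        return False
    return _python_minor(runtime_info.get("python", "")) == _python_minor(
        recipes_info.get("python", "")
    )


def should_build_full_distribution(runtime_lock: Path, recipes_lock: Path) -> bool:
    """Read the "info" sections of both lockfiles and compare them."""
    runtime_info = json.loads(runtime_lock.read_text())["info"]
    recipes_info = json.loads(recipes_lock.read_text())["info"]
    return is_compatible(runtime_info, recipes_info)
```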
So, there will be a period when the full distribution is not available. This is not ideal, but it is the best we can do for now IMO.
Problem 2. It is still not scalable enough
The short-term plan is a step forward from the current situation, but it is not fully scalable.
We will still build and maintain all recipes in a single repository, which will be a bottleneck when the number of recipes increases.
Long-term plan: pyodide-recipes organization
Note: This part is less concrete than the short-term plan, and I am open to any ideas and suggestions.
The long-term plan is to have a separate organization, pyodide-recipes, which will be a community-driven organization (much like conda-forge). The pyodide-recipes organization will contain multiple repositories, each responsible for building and testing a single package. Each repository will have its own maintainers.
As each repository is responsible for a single package, it is easier to maintain and scale the package recipes. However, as a tradeoff, Pyodide will no longer fully guarantee the compatibility of the packages in the pyodide-recipes organization. I believe it is a reasonable tradeoff, as it is obvious that Pyodide maintainers cannot maintain all packages in the long term.
How packages are built and tested
As each repository is responsible for a single package, we need to provide a consistent way to build and test the packages.
I propose the following workflow:
- Create an organization on Anaconda.org called pyodide, to which built packages are uploaded (through the CI pipeline).
- Each repository in the pyodide-recipes organization contains a GitHub Actions workflow that builds and tests the package.
- We provide a way to install the package's dependencies from Anaconda.org when running the tests.
  - A combination of pip install --extra-index-url and pytest-pyodide may work, but I don't have a concrete idea yet.
Changes in Pyodide distribution/runtime
In the pyodide/pyodide repository, we will only maintain a list of packages we want to vendor in the Pyodide distribution.
We will add a new feature to pyodide-lock that creates a pyodide-lock.json file from scratch. We pass it the list of packages that should be vendored in the Pyodide distribution. It then downloads those packages from the package index (the pyodide organization on Anaconda.org) and generates pyodide-lock.json from them.
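Such a command would need a per-wheel step, sketched below under the assumption that the wheels have already been downloaded. The field names follow the existing pyodide-lock.json package entries (name, version, file_name, install_dir, sha256, package_type, imports, depends). The function itself is hypothetical and is not pyodide-lock's API. It skips environment markers, extras, and the unvendored_tests and shared_library fields.

```python
import hashlib
import re
import zipfile
from email.parser import Parser
from pathlib import Path


def _normalize(name: str) -> str:
    # PEP 503 name normalization, e.g. "Typing_Extensions" -> "typing-extensions".
    return re.sub(r"[-_.]+", "-", name).lower()


def lock_entry(wheel: Path) -> dict:
    """Build a pyodide-lock.json-style package entry for one downloaded wheel."""
    with zipfile.ZipFile(wheel) as zf:
        members = zf.namelist()
        metadata_path = next(
            n for n in members if n.endswith(".dist-info/METADATA") and n.count("/") == 1
        )
        metadata = Parser().parsestr(zf.read(metadata_path).decode("utf-8"))

    # Only unconditional requirements; extras and environment markers are skipped.
    depends = []
    for req in metadata.get_all("Requires-Dist") or []:
        if ";" in req:
            continue
        match = re.match(r"[A-Za-z0-9._-]+", req.strip())
        if match:
            depends.append(_normalize(match.group(0)))

    # Top-level importable names: package directories and single-file modules.
    imports = set()
    for member in members:
        top = member.split("/")[0]
        if top.endswith((".dist-info", ".data")):
            continue
        if "/" in member:
            imports.add(top)
        elif top.endswith(".py"):
            imports.add(top[:-3])

    return {
        "name": metadata["Name"],
        "version": metadata["Version"],
        "file_name": wheel.name,
        "install_dir": "site",
        "sha256": hashlib.sha256(wheel.read_bytes()).hexdigest(),
        "package_type": "package",
        "imports": sorted(imports),
        "depends": depends,
    }
```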
Open questions with the long-term plan
- Besides Python packages, there are other package types, such as shared libraries, that a Python package index does not cover. How can we handle these?
  - idea: Build and release libraries as wheels? (e.g., scipy-openblas)
- How do we unvendor tests or other files?
  - idea: Release wheels without unvendoring tests, and add a new feature to pyodide-lock that unvendors files when creating a pyodide-lock.json (pyodide-lock#30: Move test file unvendoring functionality from pyodide-build to pyodide-lock).
- How do we handle dependency resolution? Sometimes recipes have different dependencies than the original package.
  - idea: I don't have a good idea yet. Maybe we can create a metadata file that contains the package's dependency information.
- We still need to release the Pyodide xbuildenv first so that it is available for building packages. How can we make this process smooth?
  - idea1: Similar to the CPython release model, we can make an alpha release of the Pyodide runtime and give package maintainers some time to update their recipes.
  - idea2: Separate the Pyodide cross-build environment release from the Pyodide runtime release. Is that possible?
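Moving test unvendoring into pyodide-lock starts with deciding which files in a wheel count as tests. The sketch below uses a simple heuristic assumed here for illustration: directories named test or tests, plus test_*.py, *_test.py and conftest.py files. It is not necessarily the exact rule pyodide-build applies today, and the function names are hypothetical.

```python
import zipfile
from fnmatch import fnmatchcase
from pathlib import Path, PurePosixPath

# Heuristic assumed for illustration: test directories and common test file names.
TEST_DIRS = {"test", "tests"}
TEST_FILE_PATTERNS = ("test_*.py", "*_test.py", "conftest.py")


def is_test_file(member: str) -> bool:
    """True if a wheel member path looks like part of a test suite."""
    parts = PurePosixPath(member).parts
    if any(part in TEST_DIRS for part in parts[:-1]):
        return True
    return any(fnmatchcase(parts[-1], pattern) for pattern in TEST_FILE_PATTERNS)


def partition_wheel(wheel: Path) -> tuple[list[str], list[str]]:
    """Split a wheel's file entries into (kept, tests).

    Only classifies members: writing a slimmed-down wheel would also require
    regenerating its dist-info RECORD file, which this sketch does not do.
    """
    with zipfile.ZipFile(wheel) as zf:
        files = [name for name in zf.namelist() if not name.endswith("/")]
    kept = [name for name in files if not is_test_file(name)]
    tests = [name for name in files if is_test_file(name)]
    return kept, tests
```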