python: Use meson-python instead of setuptools#644
Conversation
python/src/nanoarrow/meson.build
Outdated
| sources: [cyf], | ||
| cython_args: cython_args, | ||
| include_directories: nanoarrow_src_dir, | ||
| dependencies: [ |
There was a problem hiding this comment.
This is a little too broad to use every dependency for every Cython file; can refine so that only IPC files need the IPC dep, Device files need the device dep, etc...
7245021 to
7c0e44a
Compare
| nanoarrow_ipc_dep = declare_dependency(include_directories: [incdir], | ||
| link_with: nanoarrow_ipc_lib, | ||
| dependencies: [nanoarrow_dep, flatcc_dep]) | ||
| dependencies: [nanoarrow_dep]) |
There was a problem hiding this comment.
I think it was a mistake to include the flatcc_dep transitively here; I think its only required to build the nanoarrow_ipc lib itself, but shouldn't be required by libraries that want to use nanoarrow_ipc (?)
There was a problem hiding this comment.
I'm unclear on the difference but yes, the flatcc headers are deliberately not included in nanoarrow's public headers such that a library using nanoarrow does not need flatcc/* files in the include path.
There was a problem hiding this comment.
Yea this removal here essentially removes the flatcc headers from being included by downstream targets that use the nanoarrow_ipc_dep; this is akin to changing the flatcc target inclusion from PUBLIC to PRIVATE in CMake
|
I know you're still working here, but just making sure you know about https://github.com/apache/arrow-nanoarrow/blob/main/ci/scripts/test-python-wheels.sh Will this work with CUDA? I imagine it's possible to add I do think we have to support building from sdist since I believe that's how the conda-forge setup currently works. (In general I think meson is way better than setuptools provided it has the same level of Python version and platform support...thank you for working on this!) |
| # Replaced by version-bumping scripts at release time | ||
| version = "0.6.0.dev0" | ||
| [wrap-file] | ||
| directory = arrow-nanoarrow-b78c0395a6fe74e776daf07933583f06567fb99c |
There was a problem hiding this comment.
This is just pointing to a single commit here that includes some of the changes made to the top level meson.build configuration. In a follow up ideally point to main or even a wrapdb release
There was a problem hiding this comment.
Maintaining the commit-level integration is the real trick of this particular Python package. It is probably difficult, but solving that is essential if this going to be the approach we use (i.e., if we are going to update the Python build system, we can't do it in such a way that it removes features or increases complexity).
Does the approach you use in the examples work?
arrow-nanoarrow/examples/meson-minimal/subprojects/nanoarrow.wrap
Lines 18 to 24 in 4168119
...and maybe if building a source distribution you could copy arrow-nanoarrow/src into python/vendor/arrow-nanoarrow/arc and update the subprojects file?
There was a problem hiding this comment.
The commit level integration is managed through the symlink to the subprojects folder. A wrap file doesn't work like that - instead it just copies the library at a point in time to subprojects, and you would have to manually run meson wrap update constantly to get that
There was a problem hiding this comment.
To try and bring that all together...I wanted to have a subprojects folder that looked like:
python/
subprojects/
arrow-nanoarrow/ -- this is a symlink to the project root
arrow-nanoarrow.wrap
Ideally, if meson finds the arrow-nanoarrow subproject it should not need the wrap file at all, and just happily use the symlink for a live integration. When it comes to sdists where the symlink will not exist, the wrap file should be responsible for resolving and unpacking a particular arrow-nanoarrow distribution to the subprojects folder (probably from the wrap db, but in this PR I have it pointing to this branch to get the config updates)
Due to the "bug" I linked elsewhere in this PR, Meson confusingly still downloads the contents of arrow-nanoarrow.wrap even when the arrow-nanoarrow symlink exists, so to work around that the local directory structure instead looks like:
python/
subprojects/
arrow-nanoarrow-dev/ -- this is a symlink to the project root
arrow-nanoarrow.wrap
And the configuration looks like:
nanoarrow_proj = subproject('arrow-nanoarrow-dev', required: false)
if not nanoarrow_proj.found()
nanoarrow_proj = subproject('arrow-nanoarrow')
endifWhen developing locally, the arrow-nanoarrow-dev symlink will always be prioritized. Of course, when we create source distributions we end up removing symlinks, so the wrap file comes into play in those instances.
Hope that makes sense but let me know if anything is unclear
There was a problem hiding this comment.
I think I get it, but I would like the sdist to have a full copy of the nanoarrow C library rather than download something. I can experiment a bit to see if I can get that to work (as well as fire up my Windows and CUDA test machines locally to ensure those both work too).
There was a problem hiding this comment.
Ah OK. In that case we can probably get rid of the wrap file then; just need to figure out the right way/hook to install a copy into subprojects when generating the sdist
There was a problem hiding this comment.
Maybe in bootstrap.py? ADBC sets an environment variable when building a source distribution and we probably could too:
There was a problem hiding this comment.
Meson-python and meson both actually have a dist option to include subprojects, which I think is where our best bet is
https://meson-python.readthedocs.io/en/stable/how-to-guides/meson-args.html
Of course, it doesn't seem to work right now (I think the symlink behavior tricks it into failure) but let me research that a bit more. Would rather let the build system take care of this if it can
There was a problem hiding this comment.
meson.add_dist_script may be useful here? It will run when building an sdist, and you can invoke a Python script to copy exactly the files/dirs you want under subprojects/.
There was a problem hiding this comment.
Ah OK I think that's a great idea. The --include-subprojects flag I was looking at seems to just copy the symlink, but not deeply copy the subproject. So a custom script is probably the safest bet
0943ede to
c0d3bc5
Compare
Definitely not an area of expertise for me, but I don't think Meson provides that many abstractions over CUDA at the moment
Yep should still be a part of this - when cloning the git repo if the arrow-nanoarrow subproject is there, meson should reference that. For source distributions, they should include the |
c0d3bc5 to
30914ec
Compare
.github/workflows/python-wheels.yaml
Outdated
| if: ${{ matrix.config.label == "windows" }} | ||
| with: | ||
| python-version: "3.12" | ||
| architecture: x64 |
There was a problem hiding this comment.
Seems like setup-python still installs a 32 bit Python by default on the windows runner, while cibuildwheel is expected 64 bit on that host
There was a problem hiding this comment.
Do we still get 32-bit wheels?
There was a problem hiding this comment.
I believe so...I think cibuildwheel controls that, and since we don't override anything in the pyproject.toml I think it should still generate the same types of wheels as specified here:
https://cibuildwheel.pypa.io/en/stable/options/#build-skip
Admittedly my knowledge of Windows and how it builds for different architectures is limited, so definitely something to verify when wheels actually get build
There was a problem hiding this comment.
You might have to update the python-wheels.yaml workflow to fire on python/meson.build instead of python/setup.py to get it to run on this PR (in which case we can inspect the results and find out!)
|
Putting back in draft mode for now as there might be an upstream bug in meson we need to resolve mesonbuild/meson#13746 ...or my understanding is wrong. Either way, I think the way things stand now are a very close approximation of what this solution could look like |
|
Another outlet of this (or something in addition to this) could be something in |
paleolimbot
left a comment
There was a problem hiding this comment.
Just a few thoughts! If we can solve the commit-level integration issue I think we should merge just after the release 🙂
| # Replaced by version-bumping scripts at release time | ||
| version = "0.6.0.dev0" | ||
| [wrap-file] | ||
| directory = arrow-nanoarrow-b78c0395a6fe74e776daf07933583f06567fb99c |
There was a problem hiding this comment.
Maintaining the commit-level integration is the real trick of this particular Python package. It is probably difficult, but solving that is essential if this going to be the approach we use (i.e., if we are going to update the Python build system, we can't do it in such a way that it removes features or increases complexity).
Does the approach you use in the examples work?
arrow-nanoarrow/examples/meson-minimal/subprojects/nanoarrow.wrap
Lines 18 to 24 in 4168119
...and maybe if building a source distribution you could copy arrow-nanoarrow/src into python/vendor/arrow-nanoarrow/arc and update the subprojects file?
.github/workflows/python-wheels.yaml
Outdated
| if: ${{ matrix.config.label == "windows" }} | ||
| with: | ||
| python-version: "3.12" | ||
| architecture: x64 |
There was a problem hiding this comment.
Do we still get 32-bit wheels?
|
I don't really have a stake in this, but just to mention it: for pyarrow we are considering to switch from setuptools to scikit-build-core, and for maintenance overhead there might be some value in using the same build backend for the various python packages in the project (but in the end, we should also use the best tool in each case, and here there is less existing cmake compared to pyarrow, so using meson-python might be the simpler option) |
|
Given the dependency-freeness, probably either would work, and probably it would not be too hard to transition to one or the other if we need to for some reason! (We definitely need something better than setuptools, though, since the day is coming soon when we need zstd and lz4 to support IPC compression). |
|
I spent some time working with this to see if there was any easy way out of the "meson python doesn't make it easy to build a Python package in a subdirectory" problem, and I wonder if a slightly lower barrier to entry would be to keep the existing |
|
Yea I think that or what @rgommers suggested (run a script during Meson's dist hook to vendor what we need) are the two options. I'll take another look at this next week |
9bae7f3 to
ae750f0
Compare
python/generate_dist.py
Outdated
| target_src_dir = subproj_dir / "src" | ||
| shutil.copytree(src_dir / "src", target_src_dir) | ||
|
|
||
| # this files are only needed by bootstrap.py |
There was a problem hiding this comment.
The amount of scripting here is unfortunate...would still be nice to replace bootstrap.py and/or bundle.py, but not sure I want to tackle that in this PR
There was a problem hiding this comment.
For this PR I think you will have the most success just renaming bootstrap.py to generate_dist.py and including a meson lib definition for nanoarrow in python/meson.build (i.e., don't try to build nanoarrow C using a subproject).
There was a problem hiding this comment.
What do you mean by "including a meson lib definition?"
There was a problem hiding this comment.
Running bootstrap.py will get you python/vendor/nanoarrow.c and friends, and you should be able to put the appropriate nanoarrow_c_lib = library(...) in python/meson.build?
There was a problem hiding this comment.
Ah OK - so you want to run bootstrap.py continuously, not just when the distribution is being created?
The current form uses the subproject symlink for live integration with the C project and only vendors when the distribution is created
There was a problem hiding this comment.
If you can get all the tests passing and all the wheels building with that go for it! I think you will have more success tackling those two battles separately but the time is yours 🙂
There was a problem hiding this comment.
OK I understand your point of view looking at this some more. The bootstrapping mechanism is pretty tightly engrained as is, and would be an effort in and of itself to replace.
Still looking at the right balance here - might take me a bit
2b35a72 to
254202c
Compare
| py_sources = [ | ||
| '__init__.py', | ||
| 'array.py', | ||
| 'array_stream.py', | ||
| 'c_array.py', | ||
| 'c_array_stream.py', | ||
| 'c_buffer.py', | ||
| 'c_schema.py', | ||
| 'device.py', | ||
| 'ipc.py', | ||
| 'iterator.py', | ||
| '_repr_utils.py', | ||
| 'schema.py', | ||
| 'visitor.py', | ||
| ] |
There was a problem hiding this comment.
Is there any way to make this a glob like "*.py" so that a new contributor doesn't have to remember to update this list?
There was a problem hiding this comment.
I don't think so. Here's some documentation in Meson as to why:
https://mesonbuild.com/FAQ.html#why-cant-i-specify-target-files-with-a-wildcard
python/generate_dist.py
Outdated
| target_src_dir = subproj_dir / "src" | ||
| shutil.copytree(src_dir / "src", target_src_dir) | ||
|
|
||
| # this files are only needed by bootstrap.py |
There was a problem hiding this comment.
For this PR I think you will have the most success just renaming bootstrap.py to generate_dist.py and including a meson lib definition for nanoarrow in python/meson.build (i.e., don't try to build nanoarrow C using a subproject).
python/src/nanoarrow/__init__.py
Outdated
| from nanoarrow.visitor import nulls_as_sentinel, nulls_forbid, nulls_separate | ||
| from nanoarrow._version import __version__ # noqa: F401 | ||
|
|
||
| __version__ = importlib.metadata.version("nanoarrow") |
There was a problem hiding this comment.
@jorisvandenbossche is there any issues with versioning in this way? Is there any cost to importing importlib.metadata by default?
(I am only skeptical because I haven't seen a Python package do this before)
There was a problem hiding this comment.
I think I accidentally removed the comment, but the downside to this approach versus a dynamic version generator like miniver is that the git hash is not included as part of the project metadata.
In this case the project definition from pyproject.toml just imports whatever the build-backend provides, which is currently 0.7.0-SNAPSHOT
There was a problem hiding this comment.
I think it'll have to be 0.7.0.dev[0-9]+ to work with the nightly wheels!
There was a problem hiding this comment.
As far as whether or not this approach is viable, here is the Python packaging documentation:
https://packaging.python.org/en/latest/discussions/single-source-version/#single-source-version
The date on that is pretty recent and I believe importlib.metadata was provisional up until 3.10, so a lot of established packages probably had to come up with their own solution
There was a problem hiding this comment.
is there any issues with versioning in this way? Is there any cost to importing
importlib.metadataby default?
I would personally advise to not use importlib.metadata.version for creating the __version__ attribute, because it will add some avoidable overhead (without testing I don't know how big this would be, but in general we want to keep the import as lean as possible, and this seems unnecessary overhead to scan installed packages, when the __version__ attribute can (in the built packages, i.e. injected during build-time) simply be a static string).
In addition, when using editable installs, this might also not always work as expected (the metadata version does not update automatically).
As far as whether or not this approach is viable, here is the Python packaging documentation:
https://packaging.python.org/en/latest/discussions/single-source-version/#single-source-version
Note that that page does not actually mention to use importlib.metadata.version to populate __version__, just that there is a desire that those return the same thing.
There has been a long discussion on this the last months at https://discuss.python.org/t/please-make-package-version-go-away/58501, that resulted in the updated page linked above (it's long, don't read it all ;), but I do see that it does actually have some discussion on the overhead, estimated there to be 20-40ms (although it might also depend on how big your environment is))
python/src/nanoarrow/meson.build
Outdated
| # Try to resolve the symlink to a subproject first and fall back | ||
| # to the wrap entry if that is unsuccessful. Ideally Meson would | ||
| # take care of this for us, but there is a bug upstream | ||
| # https://github.com/mesonbuild/meson/issues/13746#issuecomment-2392510954 | ||
| nanoarrow_proj = subproject('arrow-nanoarrow') | ||
| nanoarrow_dep = nanoarrow_proj.get_variable('nanoarrow_dep') | ||
| nanoarrow_ipc_dep = nanoarrow_proj.get_variable('nanoarrow_ipc_dep') | ||
| nanoarrow_device_dep = nanoarrow_proj.get_variable('nanoarrow_device_dep') |
There was a problem hiding this comment.
Again, I think you want to vendor the files using bootstrap.py or whatever facility meson provides for that and not attempt to get the nanoarrow-as-a-subproject bit working (orthogonal battle to using meson-python vs. setuptools).
| [build-system] | ||
| requires = [ | ||
| "setuptools >= 61.0.0", | ||
| "meson-python", |
There was a problem hiding this comment.
We probably need a version constraint here?
| nanoarrow_ipc_dep = declare_dependency(include_directories: [incdir], | ||
| link_with: nanoarrow_ipc_lib, | ||
| dependencies: [nanoarrow_dep, flatcc_dep]) | ||
| dependencies: [nanoarrow_dep]) |
There was a problem hiding this comment.
I'm unclear on the difference but yes, the flatcc headers are deliberately not included in nanoarrow's public headers such that a library using nanoarrow does not need flatcc/* files in the include path.
e7c117b to
c88e104
Compare
b90d55c to
75298de
Compare
|
@rgommers maybe you have an idea on the Windows failure? Tried looking through upstream discussions but its a rather broad topic: https://github.com/apache/arrow-nanoarrow/actions/runs/11559159131/job/32173258300?pr=644#step:6:184 AFAIU the container which cibuildwheel installs a 32 bit Python but the compiler being used is 64 bit? |
Yes indeed. The tl;dr is that if you want to build wheels for 32-bit Python on 64-bit Windows, Meson doesn't automatically configure MSVC in 32-bit mode for you (rationale: there could be non-Python library or executable targets, and why would those be 32-bit because somewhere else in To fix it, explicitly configure 32-bit MSVC in your |
61d4deb to
b59fce4
Compare
b59fce4 to
1c50f70
Compare
16f8652 to
0fcf0c3
Compare
ffd415b to
b13064e
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Truly heroic effort getting all of this to pass!
The only issue I see is that flatcc sources aren't vendored/not contained in the sdist. I think that is OK for now (solvable if somebody asks for it) and probably only directly affects somebody trying to do Python development on an airplane 🙂 .
The latest commit should add that functionality now, relying on Meson's own functionality to bundle subprojects as part of dist creation. It's a bit wonky that flatcc is bundled that way and the rest of the files are created by our custom script reading from a CMake file...but I'm not sure that can be resolved in this PR. Definitely would like to tackle that in follow up(s) |
4ccf15e to
d3a7490
Compare
|
Thanks @paleolimbot for the review and @rgommers for all of the guidance! |
This PR originally started as a nanobind POC to see if we could convert from Cython over to that, but I scaled that back just to focus on the possiblity of porting the build system to meson (you can see some nanobind components in the history, if you cared to look)
The build system port may have some merits on its own. It is more tightly integrated with the build system of the C code, and has the advantage of being built in parallel.
Technically it links to the main project by symlinking it as a subproject in thesubprojectsdirectory; if we cared to support source Python builds we could replace that symlink with a wrap file on distribution, or could even detect within the configuration if the symlink resolves. In the case it does not, we can fallback to a github releaseIf the user wanted to develop locally, they could simply symlink the project root into the subprojects folder of Python (the git repo already does this). For source distributions, the wrap system can provide a fallback project to download and reference
If we choose to try and move off of Cython incrementally and use something like nanobind or even C extensions, I think Meson can handle those more capably than the setup.py script