I just saw that pandas is considering making pyarrow a hard dependency. This would substantially blow up the footprint of all pandas installs (which is pandas' choice to make), but even more so on conda-forge, because we package more bindings than the wheels do¹.
This is not the first time this topic has come up, cf. below:
@h-vetinari: That said, there's a larger theme here that arrow keeps growing non-trivial dependencies. I guess we could introduce a separate output for a "minimal" arrow (libarrow-core?)
@jorisvandenbossche: That's indeed something we have to look at long term, but probably good for a separate issue? (it seems to me that regardless of that, cudatoolkit should never be a dependency for the CPU version?)
libarrow itself actually already consists of multiple shared libraries (libarrow, libarrow_dataset, libarrow_flight, etc.). Splitting along those lines into multiple packages could be a simpler first step (e.g. libarrow_flight has some extra dependencies that are not needed for libarrow itself, such as UCX). On the Arrow C++ side, we are also working on further splitting the core libarrow into more shared libraries.
Originally posted by @jorisvandenbossche in #962 (comment)
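To illustrate why splitting outputs along the shared-library boundaries would shrink what a pandas install pulls in, here is a minimal Python sketch that models packages as a dependency graph and computes the transitive closure of an install. The package names `libarrow-core` and `libarrow-flight`, and every edge in both graphs, are hypothetical assumptions for illustration only, not the actual contents of this feedstock's recipe; the one relationship taken from the discussion above is that Flight needs UCX.

```python
# Hypothetical dependency graphs: package names and edges are illustrative
# assumptions, NOT the real conda-forge recipe contents.
from collections import deque


def install_closure(graph: dict[str, list[str]], root: str) -> set[str]:
    """Return every package pulled in by installing `root` (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Monolithic: one libarrow package also ships Flight, and thus needs UCX.
monolithic = {
    "pandas": ["pyarrow"],
    "pyarrow": ["libarrow"],
    "libarrow": ["ucx"],
}

# Split (hypothetical outputs): the core library no longer drags in UCX;
# only users who install the Flight output pay for it.
split = {
    "pandas": ["pyarrow"],
    "pyarrow": ["libarrow-core"],
    "libarrow-flight": ["libarrow-core", "ucx"],
}

print(sorted(install_closure(monolithic, "pandas")))
# ['libarrow', 'pandas', 'pyarrow', 'ucx']
print(sorted(install_closure(split, "pandas")))
# ['libarrow-core', 'pandas', 'pyarrow']
```

In the split graph, UCX only enters an environment when something explicitly depends on the Flight output, which is the footprint reduction a "minimal" arrow output would aim for.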
Footnotes
¹ Given the difficulty of building Arrow, these bindings are effectively uninstallable otherwise, hence the maximalist approach to packaging on this feedstock (and in conda-forge in general).