Package some form of "minimal" libarrow? #1035

@h-vetinari

I just saw that pandas is considering making pyarrow a hard dependency. This would substantially blow up the footprint of all pandas installs (which is pandas' choice to make), but even more so on conda-forge, because we package more bindings than the wheels do [1].

This is not the first time that this topic has come up, cf. below:

@h-vetinari: That said, there's a larger theme here that arrow keeps growing non-trivial dependencies. I guess we could introduce a separate output for a "minimal" arrow (libarrow-core?)

@jorisvandenbossche: That's indeed something we have to look at long term, but probably good for a separate issue? (it seems to me that regardless of that, cudatoolkit should never be a dependency for the CPU version?)
libarrow itself actually already consists of multiple shared libraries (libarrow, libarrow_dataset, libarrow_flight, etc). So that could be a first way to split it into multiple packages that should be simpler (e.g. libarrow_flight has some extra dependencies that are not needed for libarrow itself, such as UCX). On the Arrow C++ side itself, we are also working on further splitting the core libarrow into more shared libraries.

Originally posted by @jorisvandenbossche in #962 (comment)
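
As a rough way to gauge what a split along the existing shared libraries might save, here is a minimal sketch that tallies the on-disk size of each `libarrow*` library in a directory. The function name `arrow_library_sizes`, the filename pattern, and the assumption that the libraries live in `<prefix>/lib` (a Linux/macOS conda layout) are all illustrative and not taken from the feedstock.

```python
import re
from pathlib import Path

# Hypothetical filename pattern: matches e.g. libarrow.so.1300.0.0,
# libarrow_flight.so, or libarrow_dataset.1300.dylib.
_LIB_PATTERN = re.compile(
    r"^(libarrow(?:_[a-z_]+)?)(?:\.[\d.]+)?\.(?:so|dylib)(?:\.[\d.]+)?$"
)


def arrow_library_sizes(lib_dir):
    """Return {library name: size in bytes} for libarrow* shared libraries."""
    lib_dir = Path(lib_dir)
    sizes = {}
    if not lib_dir.is_dir():
        return sizes
    for path in lib_dir.iterdir():
        match = _LIB_PATTERN.match(path.name)
        # Skip unrelated files and the versioned symlinks that point at the
        # real library, so each library is only counted once.
        if match is None or path.is_symlink() or not path.is_file():
            continue
        name = match.group(1)
        sizes[name] = max(sizes.get(name, 0), path.stat().st_size)
    return sizes
```

Calling `arrow_library_sizes(Path(sys.prefix) / "lib")` inside an environment with libarrow installed would give a per-library breakdown. Note that this only measures the libraries themselves, not the transitive dependencies each one pulls in, which is where most of the footprint discussed above comes from.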

Footnotes

  1. Given the difficulty of building arrow, these bindings are effectively uninstallable otherwise, hence the maximalist approach to packaging on this feedstock and in conda-forge in general.
