Skip to content

[Python] Add low-level pyarrow bindings for Acero #33976

@jorisvandenbossche

Description

@jorisvandenbossche

The Acero query engine is one of the few parts of the Arrow C++ libraries that is not directly exposed in the pyarrow bindings. While you can already run queries using Acero through a substrait plan (pyarrow.substrait.run_query; but that requires first building that plan, and limits to what Acero supports of Substrait), and Acero is already used under the hood in some Dataset methods (pyarrow.dataset.Dataset.join/sort_by, but that doesn't currently allow building up a full query), there are some benefits of having direct, low-level bindings:

  • It would be useful for direct users (or downstream packages) of pyarrow that don't want / need to go through substrait (yet another thing to learn / implement)
  • It would make it easier for developers to test new Acero functionality from python
  • You can use features of Acero that are not (yet) available in substrait
  • It could (potentially) give an interface for people developing custom exec nodes
  • It would be useful for people that want to do things "right now" (e.g. 12.0.0) instead of waiting for Substrait (tooling) to mature

The idea would be to add a new pyarrow.acero module, with low level bindings for the Acero execution engine. The goal would not be to build a complete user-friendly query interface (like ibis or dplyr), but provide rather direct access to build up a query with the Declaration interface. Minimum abilities would be to construct a plan, inspect plans created through other means (eg from substrait), and execute a plan (to_table, to_reader, write). An initial goal would be to be able to use this interface ourselves in the places that currently have custom cython usage of the exec plan such as Dataset.join (i.e. refactoring _exec_plan.pyx).

Non-exhaustive list of subtasks:

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions