
[Python] avoid using pandas internals #35081

@jbrockmendel

Describe the bug, including details regarding any error messages, version, and platform.

At the moment, pyarrow passes a BlockManager to the pd.DataFrame constructor, but in doing so it accesses pandas functions/classes that are not public and that some pandas maintainers (me) would like to wean arrow off of (xref pandas-dev/pandas#52419).

@jorisvandenbossche tells me the current usage is performance-motivated, particularly (but not exclusively?) to avoid the performance hit associated with pandas silently consolidating blocks, which pandas no longer does as of 2.0.

Let's see if we can find an alternative using pandas' public API. Starting bid: is pd.DataFrame.from_arrays a viable alternative?
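For concreteness, here is a minimal sketch of what a public-API-only route could look like, assuming pyarrow has already converted each Arrow column to a 1-D NumPy array. `frame_from_columns` is a hypothetical helper (not a pandas or pyarrow function), and whether this path matches the performance of handing over a BlockManager is exactly the open question here.

```python
import numpy as np
import pandas as pd


def frame_from_columns(names, arrays):
    """Build a DataFrame from parallel lists of column names and 1-D arrays.

    Hypothetical helper: it uses only the public pd.DataFrame constructor,
    never a BlockManager or other pandas internals.
    """
    if len(names) != len(arrays):
        raise ValueError("names and arrays must have the same length")
    # A dict preserves insertion order, so column order follows `names`.
    # copy=False asks pandas not to copy the input arrays where it can avoid it.
    return pd.DataFrame(dict(zip(names, arrays)), copy=False)


# Stand-ins for the per-column NumPy arrays pyarrow would produce.
df = frame_from_columns(
    ["ints", "floats"],
    [np.arange(3, dtype="int64"), np.linspace(0.0, 1.0, 3)],
)
print(df.dtypes)
```

One caveat of going through a dict: Arrow tables may contain duplicate column names, which a dict would silently collapse, so a real replacement would need to handle that case separately.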

Component(s)

Python
