[Python] avoid using pandas internals #35081
Closed
Labels
Component: Python, Priority: Blocker (marks a blocker for the release), Type: bug
Milestone
Description
- [Python] pandas internals: avoid using DatetimeTZBlock #38341
- [Python] pandas internals: Passing a manager object to DataFrame() constructor is deprecated #38342
Describe the bug, including details regarding any error messages, version, and platform.
ATM pyarrow passes a BlockManager to the pd.DataFrame constructor, and in doing so it accesses pandas functions/classes that are not public. Some pandas maintainers (me) would like to wean arrow off of these (xref pandas-dev/pandas#52419).
@jorisvandenbossche tells me the current usage is performance-motivated, particularly (but not exclusively?) by the performance hit associated with pandas silently consolidating, which it no longer does in 2.0.
Let's see if we can find an alternative using pandas' public API. Starting bid: is pd.DataFrame.from_arrays a viable alternative?
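To make the question concrete, here is a minimal sketch of what a public-API-only construction path might look like. It is not pyarrow's actual conversion code: `frame_from_columns` is a hypothetical helper, the NumPy arrays and tz-aware `DatetimeIndex` stand in for already-converted Arrow columns, and it assumes column names are unique (a dict cannot represent duplicates). `copy=False` only asks pandas not to copy; whether a copy is actually avoided depends on the pandas version and dtype.

```python
import numpy as np
import pandas as pd


def frame_from_columns(names, arrays, index=None):
    """Hypothetical helper: build a DataFrame from per-column arrays
    using only the public pd.DataFrame constructor (no BlockManager)."""
    if len(names) != len(arrays):
        raise ValueError("names and arrays must have the same length")
    if len(set(names)) != len(names):
        # Assumption of this sketch: a dict cannot hold duplicate column names.
        raise ValueError("duplicate column names are not supported here")
    # dicts preserve insertion order, so column order follows `names`.
    data = dict(zip(names, arrays))
    # copy=False requests that pandas avoid copying the input arrays.
    return pd.DataFrame(data, index=index, copy=False)


# Stand-ins for columns already converted from Arrow.
names = ["a", "b", "ts"]
arrays = [
    np.arange(3, dtype="int64"),
    np.array([0.5, 1.5, 2.5]),
    pd.date_range("2023-01-01", periods=3, tz="UTC"),  # tz-aware column
]

df = frame_from_columns(names, arrays)
print(df.dtypes)
```

Whether this is fast enough to replace the current BlockManager path is exactly the open question; the sketch only shows that tz-aware and plain NumPy columns can go through the public constructor.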
Component(s)
Python