Getting string[pyarrow] from experimental to production ready #8842

@MrPowers

Pandas 1.3 includes an experimental string[pyarrow] dtype that's much more memory efficient than object columns.

Rocklin made a video on this new dtype showing that string[pyarrow] greatly reduces memory usage and query times; his small notebook example gets a better than 2x query speedup. In my own experiments, converting an object column to string[pyarrow] reduced its memory usage by 6.7x.

These are huge improvements and the progress is really exciting.

The purpose of this issue is to brainstorm the work still needed to get string[pyarrow] out of "experimental" and make it generally available. @mrocklin - can you share your thoughts on the challenges and next steps?

Labels: discussion, needs attention
