Description
Pandas 1.3 includes an experimental string[pyarrow] dtype that's much more memory efficient than object columns.
Rocklin made a video on this new dtype that shows how string[pyarrow] greatly reduces memory usage and query times. His notebook example shows a better than 2x improvement in query speed with string[pyarrow]. I ran some experiments of my own and found that converting an object column to string[pyarrow] reduced its memory usage by 6.7x.
These are huge improvements and the progress is really exciting.
The purpose of this issue is to brainstorm the additional work that needs to be performed to get string[pyarrow] out of "experimental" and make it generally available. @mrocklin - can you share your thoughts on the challenges and the next steps?