How to best optimize reading from S3? #278
Closed
Description
Describe the usage question you have. Please include as many useful details as possible.
Hi!
I have a use case of reading certain row groups from S3.
I see that there is an option BufferedStreamEnabled.
When I set BufferedStreamEnabled to false, the reader seems to read all of a column's data for a row group at once, which unfortunately results in OOM for us.
When I set BufferedStreamEnabled to true, the library seems to read the row group page by page, which is not optimal for cloud storage because each small read turns into a separate request.
How can I improve this? I imagine the best approach would be to read multiple pages in a single read() call?
Component(s)
Parquet