Closed
Labels: enhancement, good first issue
Description
Is your feature request related to a problem or challenge?
This issue tracks the remaining tasks from the initial parallel CSV scan PR #6801.
The remaining tasks:
- Use `get_opts()` for range reads on the local FS.
  `get_opts()` is the `ObjectStore` interface for streaming range reads (local FS / cloud storage). Range reads on the local FS are not supported yet: https://github.com/apache/arrow-rs/blob/0d4e6a727f113f42d58650d2dbecab89b22d4e28/object_store/src/lib.rs#L355
  Once this is implemented in `arrow-rs`, the parallel CSV scan can use it and possibly gain some performance (the current implementation copies the whole CSV file range into memory at once instead of reading it in a streaming fashion).
- Use only 1 get operation from `ObjectStore` per partition instead of 3 (see the original PR discussion).

The second task is easier to do after the first is done (it can then be tested on the local filesystem).
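To make the two tasks concrete, here is a minimal std-only sketch of a streamed range read over a local file that resolves line boundaries in the same pass. It does not use the `object_store` API, and `read_partition_lines` is a hypothetical helper, not code from DataFusion or `arrow-rs`. It assumes the extra get operations are spent locating line boundaries (this issue does not spell that out) and that newlines never appear inside quoted CSV fields.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical helper: returns the lines whose first byte lies in the byte
/// range `start..end`. The file is opened and scanned once, and only one line
/// is buffered at a time.
fn read_partition_lines(path: &Path, start: u64, end: u64) -> io::Result<Vec<String>> {
    let mut file = File::open(path)?;
    // Begin one byte early so we can tell whether `start` sits on a line boundary.
    let mut pos = start.saturating_sub(1);
    file.seek(SeekFrom::Start(pos))?;
    let mut reader = BufReader::new(file);
    let mut buf = Vec::new();

    if start > 0 {
        // Discard everything up to and including the first newline. This is
        // either the previous line's terminator or the tail of a line that
        // belongs to the previous partition.
        pos += reader.read_until(b'\n', &mut buf)? as u64;
    }

    let mut lines = Vec::new();
    // A line that starts before `end` is read in full, even if it runs past `end`.
    while pos < end {
        buf.clear();
        let n = reader.read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // end of file
        }
        pos += n as u64;
        let text = String::from_utf8_lossy(&buf);
        lines.push(text.trim_end_matches(&['\r', '\n'][..]).to_string());
    }
    Ok(lines)
}
```

Calling this helper once per partition with adjacent ranges (for example `0..6` and `6..16` on a 16-byte file) should yield every line exactly once, because each line is owned by the partition that contains its first byte.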
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response