Closed
Labels
Performance 🚀: Performance related issues and pull requests.
new feature/request 💬: Requests and pull requests for new features
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 19.04
- Modin installed from (source or binary): binary, pip install modin
- Modin version: 0.5.0
- Python version: 3.7.3
- Exact command to reproduce: Use filters in read_parquet
Describe the problem
One of the most important features of Parquet is its support for predicate pushdown, which can cut I/O significantly. pyarrow supports it, but not through read_pandas(), so filters passed to read_parquet have no effect. If you replace the existing call to read_pandas() in ray/pandas_on_ray/io.py with the following segment, predicate pushdown works automatically:
df = pq.ParquetDataset(path, **kwargs) \
    .read(columns=columns) \
    .to_pandas()
# df = pq.read_pandas(path, columns=columns, **kwargs).to_pandas()
# Append the length of the index here to build it externally
I've left the original read_pandas() call in place, commented out, as an anchor for where the change goes.