Closed
Description
Split from #626
read_parquet is not supported for a partitioned data set.
System information
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
$ conda --version
conda 4.6.14
$ python --version
Python 3.7.3
$ pip --version
pip 19.1 from /home/dlweber/miniconda3/envs/gis-dataprocessing/lib/python3.7/site-packages/pip (python 3.7)
$ pip freeze | grep modin
modin==0.5.0
$ pip freeze | grep pandas
pandas==0.24.2
$ pip freeze | grep numpy
numpy==1.16.3
miniconda3 was used to install most of the SciPy stack, with a pip clause to add modin, e.g.
# environment.yaml
channels:
- conda-forge
- defaults
dependencies:
- python>=3.7
- affine
- configobj
- dask
- numpy
- pandas
- pyarrow
- rasterio
- s3fs
- scikit-learn
- scipy
- shapely
- xarray
- pip
- pip:
- modin
Describe the problem
https://modin.readthedocs.io/en/latest/pandas_supported.html says read_parquet is supported, but maybe not for partitioned data.
Error traceback:
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/backends/pandas/query_compiler.py", line 871, in _full_reduce
    mapped_parts = self.data.map_across_blocks(map_func)
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/engines/base/frame/partition_manager.py", line 209, in map_across_blocks
    preprocessed_map_func = self.preprocess_func(map_func)
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/engines/base/frame/partition_manager.py", line 100, in preprocess_func
    return self._partition_class.preprocess_func(map_func)
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/modin/engines/ray/pandas_on_ray/frame/partition.py", line 108, in preprocess_func
    return ray.put(func)
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/worker.py", line 2216, in put
    worker.put_object(object_id, value)
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/worker.py", line 375, in put_object
    self.store_and_register(object_id, value)
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/worker.py", line 309, in store_and_register
    self.task_driver_id))
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/utils.py", line 475, in _wrapper
    return orig_attr(*args, **kwargs)
  File "pyarrow/_plasma.pyx", line 496, in pyarrow._plasma.PlasmaClient.put
  File "pyarrow/serialization.pxi", line 355, in pyarrow.lib.serialize
  File "pyarrow/serialization.pxi", line 150, in pyarrow.lib.SerializationContext._serialize_callback
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle.py", line 952, in dumps
    cp.dump(obj)
  File "/home/joe/miniconda3/envs/project/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle.py", line 271, in dump
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.