Closed
Labels: P0 (highest priority tasks requiring immediate fix), documentation 📜 (updates and issues with the documentation), question ❓ (questions about Modin)
Description
System information
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
$ conda --version
conda 4.6.14
$ python --version
Python 3.7.3
$ pip --version
pip 19.1 from /home/dlweber/miniconda3/envs/gis-dataprocessing/lib/python3.7/site-packages/pip (python 3.7)
$ pip freeze | grep modin
modin==0.5.0
$ pip freeze | grep pandas
pandas==0.24.2
$ pip freeze | grep numpy
numpy==1.16.3
Miniconda3 was used to install most of the SciPy stack, with a pip section in the environment file to add Modin, e.g.
# environment.yaml
channels:
  - conda-forge
  - defaults
dependencies:
  - python>=3.7
  - affine
  - configobj
  - dask
  - numpy
  - pandas
  - pyarrow
  - rasterio
  - s3fs
  - scikit-learn
  - scipy
  - shapely
  - xarray
  - pip
  - pip:
    - modin
Describe the problem
https://modin.readthedocs.io/en/latest/pandas_supported.html lists to_parquet as supported, but calling it falls back to the pandas implementation:
import numpy as np
import modin.pandas as pd
size = (1, 10 * 10)
column_ij = ["%04d_%04d" % (i, j) for i in range(10) for j in range(10)]
data = np.random.randint(0, 10000, size=size, dtype="uint16")
df = pd.DataFrame(data, columns=column_ij)
df.to_parquet('/tmp/tmp.parquet')
UserWarning: `DataFrame.to_parquet` defaulting to pandas implementation.
More details:
2019-05-21 16:03:46,207 WARNING worker.py:1337 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2019-05-21 16:03:46,207 INFO node.py:469 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-05-21_16-03-46_18437/logs.
2019-05-21 16:03:46,310 INFO services.py:407 -- Waiting for redis server at 127.0.0.1:55558 to respond...
2019-05-21 16:03:46,418 INFO services.py:407 -- Waiting for redis server at 127.0.0.1:41726 to respond...
2019-05-21 16:03:46,420 INFO services.py:804 -- Starting Redis shard with 2.1 GB max memory.
2019-05-21 16:03:46,426 INFO node.py:483 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-05-21_16-03-46_18437/logs.
2019-05-21 16:03:46,427 WARNING services.py:1304 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 5238738944 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.
2019-05-21 16:03:46,427 INFO services.py:1427 -- Starting the Plasma object store with 6.0 GB memory using /tmp.
UserWarning: Distributing <class 'list'> object. This may take some time.
UserWarning: `DataFrame.to_parquet` defaulting to pandas implementation.
To request implementation, send an email to feature_requests@modin.org.
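To check programmatically whether a given call falls back to pandas, one option is to record warnings around the call and look for the "defaulting to pandas" text. This is only a sketch: the helper name `pandas_fallback_warnings` and the stand-in `fake_to_parquet` are hypothetical and not part of Modin. It assumes the message is emitted through Python's `warnings` module as a `UserWarning`, which the output above suggests.

```python
import warnings


def pandas_fallback_warnings(func, *args, **kwargs):
    """Call func and return (result, messages), where messages are the
    UserWarning texts that mention a fallback to pandas."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that were already shown once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, UserWarning)
        and "defaulting to pandas" in str(w.message)
    ]
    return result, messages


# Hypothetical stand-in that mimics the warning text quoted above,
# so the sketch runs without Modin installed.
def fake_to_parquet(path):
    warnings.warn(
        "`DataFrame.to_parquet` defaulting to pandas implementation.",
        UserWarning,
    )
    return path


_, msgs = pandas_fallback_warnings(fake_to_parquet, "/tmp/tmp.parquet")
print(msgs)
```

With Modin installed, the same helper could presumably be called as `pandas_fallback_warnings(df.to_parquet, '/tmp/tmp.parquet')` on the DataFrame from the example above; a non-empty list would indicate the fallback.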
Maybe modin could be added to conda-forge so that conda can help with resolving version dependencies?