Closed
Description
When using dask.dataframe.DataFrame.to_csv, I noticed that lines end with \r\r\n rather than simply \r\n. I'm guessing this is a Windows issue, although I haven't had the opportunity to test on another system.
The following example shows the bytes produced when pandas and dask write the same data to CSV. The pandas output is what I expected from both, so comparing the two should make it clear what's going on.
Code
import sys
import dask, dask.dataframe as dd
import pandas as pd
df = pd.DataFrame({'a': [0]})
ddf = dd.from_pandas(df, npartitions=1)
df.to_csv('temp-pandas.csv')
pandas_bytes = open('temp-pandas.csv', 'rb').read()
ddf.to_csv('temp-dask_*.csv')
dask_bytes = open('temp-dask_0.csv', 'rb').read()
print('pandas result:', pandas_bytes)
print('dask result :', dask_bytes)
print('---')
print('dask version:', dask.__version__)
print('Python version:', sys.version)
Output
pandas result: b',a\r\n0,0\r\n'
dask result : b',a\r\r\n0,0\r\r\n'
---
dask version: 1.2.2
Python version: 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)]
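A guess at the mechanism, as an unverified sketch: if pandas emits os.linesep ('\r\n' on Windows) as its line terminator, and dask writes each partition through a file opened in text mode with default newline handling, then Windows text mode translates the '\n' of every '\r\n' into '\r\n' a second time, giving '\r\r\n'. Both halves of that are assumptions about the internals, not confirmed from the dask source. The snippet below reproduces only the translation step using the standard library. It passes newline="\r\n" explicitly so the Windows behaviour shows up on any OS; write_and_read is a hypothetical helper for illustration.

```python
import os
import tempfile

# Assumed pandas output on Windows: '\r\n' line terminators (os.linesep),
# hard-coded here so the sketch behaves the same on every OS.
csv_text = ",a\r\n0,0\r\n"


def write_and_read(text, newline):
    """Hypothetical helper: write `text` through a text-mode file, return the raw bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# newline="\r\n" mimics Windows text mode with default newline handling:
# every '\n' written is turned into '\r\n', so '\r\n' becomes '\r\r\n'.
translated = write_and_read(csv_text, newline="\r\n")
# newline="" disables translation, so the '\r\n' terminators reach disk unchanged.
untranslated = write_and_read(csv_text, newline="")

print("translated  :", translated)    # b',a\r\r\n0,0\r\r\n'  (matches the dask result)
print("untranslated:", untranslated)  # b',a\r\n0,0\r\n'      (matches the pandas result)
```

If this is the cause, opening the partition files with newline="" (or having pandas emit plain '\n' and letting text mode translate it once) would avoid the doubled carriage return.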
Thanks for the work on an awesome package!