Fatal error with read_csv for large file on OS X on both  #954

Description

@user32000

I'm attempting to read a 4.5 GB CSV file with dask. read_csv reproducibly crashes Python, as shown below. This seems different from Fatal error when running read_csv #841?

Python 2.7.11 |Continuum Analytics, Inc.| (default, Dec  6 2015, 18:57:58) 
Type "copyright", "credits" or "license" for more information.

IPython 4.0.3 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: pdb
Automatic pdb calling has been turned ON

In [2]: import dask.dataframe as dd

In [3]: from dask.diagnostics import ProgressBar

In [4]: df = dd.read_csv('bigcsv_utf_dot_comma.csv')

In [5]: ProgressBar().register()

In [6]: df.compute()
[                                        ] | 0% Completed |  3.4s
Fatal Python error: GC object already tracked
/Applications/anaconda/envs/python2/bin/python.app: line 3:  5977 Abort trap: 6           /Applications/anaconda/envs/python2/python.app/Contents/MacOS/python "$@"
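Not part of the original report, but one way to narrow this down is to stream the same file with the stdlib csv module, which bypasses the pandas C parser that dask's read_csv uses under the hood. If the sketch below completes while dd.compute() aborts, the failure is in the pandas/dask parsing path rather than in the file itself. The function name and chunk size are illustrative, not from the issue:

```python
import csv

def count_rows(path, chunk_rows=100_000):
    """Stream a CSV with the stdlib parser, counting data rows.

    Processes the file in chunks of `chunk_rows` rows so memory stays
    bounded even for a multi-gigabyte file. Assumes a single header row
    and UTF-8 encoding, matching the filename in the transcript.
    """
    total = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # consume the assumed header row
        chunk = 0
        for _row in reader:
            chunk += 1
            if chunk == chunk_rows:
                total += chunk
                chunk = 0
        total += chunk  # remainder of the final partial chunk
    return header, total
```

Separately, the "GC object already tracked" abort has reportedly been associated with concurrent use of pandas' C parser, so forcing dask onto a single-threaded scheduler for one run may also help localize the crash (the exact scheduler API depends on the dask version in use).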
