read_parquet graph transfer grows more than linearly with number of partitions #5321

@birdsarah

Description

dask 2.3.0

My calls to to_parquet were never starting: after waiting a long time I still didn't see any tasks appear in the "progress" bar, which, to my untrained eye, means the graph is taking a long time to build.

I copy-pasted the to_parquet code from dask/dataframe/io/parquet/core.py to build my own parts:

https://github.com/dask/dask/blob/master/dask/dataframe/io/parquet/core.py#L458-L481

    # write parts
    dwrite = delayed(engine.write_partition)
    parts = [
        dwrite(
            d,
            path,
            fs,
            filename,
            partition_on,
            write_metadata_file,
            fmd=meta,
            index_cols=index_cols,
            **kwargs_pass
        )
        for d, filename in zip(df.to_delayed(), filenames)
    ]

    # single task to complete
    out = delayed(lambda x: None)(parts)
    if write_metadata_file:
        out = delayed(engine.write_metadata)(
            parts, meta, fs, path, append=append, compression=compression
        )

I then manually submitted these in chunks via client.compute.
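A minimal sketch of that chunked submission. The `chunks` helper and the batch size of 100 are my own assumptions, not from the issue; `parts` is the list of delayed writes built above.

```python
from itertools import islice

def chunks(seq, size):
    """Yield successive batches of at most `size` items from seq."""
    it = iter(seq)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage with a dask.distributed Client:
# from distributed import Client
# client = Client()
# futures = [client.compute(batch) for batch in chunks(parts, 100)]
```

Submitting in batches keeps each graph handed to the scheduler small, which is what makes the per-batch graph-construction time observable in the first place.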

If I do 10 parts, my tasks appear after 27 s. If I do 100 parts, they appear after 4.5 minutes (270 s). My dataset has 4000 partitions, so at this rate it will be 180 minutes before my task graph even appears and starts executing.
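The 180-minute figure is a straight linear extrapolation from the numbers reported above:

```python
# Observed: building the graph for 100 parts takes ~270 s,
# i.e. ~2.7 s of graph-construction time per partition.
secs_per_part = 270 / 100

# Linear extrapolation to all 4000 partitions, in minutes:
eta_min = 4000 * 270 / 100 / 60
print(eta_min)  # -> 180.0
```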

For comparison, the full execution takes ~2 hours on dask 0.2.1.
