Dask parquet metadata w/ ~2k files very slow #5272

@talebzeghmi

Description

To reproduce:

  • Given an S3 directory of ~2,000 parquet files (~20 MB each) with no _metadata summary file; in our case the files were written by Spark.
  • Read it:

from dask import dataframe

dask_df = dataframe.read_parquet("s3://path/*parquet")

Result

The read_metadata step alone takes several minutes.

Suggested fix

The read_parquet() documentation for gather_statistics reads:

gather_statistics : bool or None (default).
    Gather the statistics for each dataset partition. By default,
    this will only be done if the _metadata file is available. Otherwise,
    statistics will only be gathered if True, because the footer of
    every file will be parsed (which is very slow on some systems).

Despite this, both the Arrow and FastParquet engines read the footer of every file when gather_statistics is None and the _metadata file does not exist.

The proposed fix is to change occurrences of gather_statistics is not False to gather_statistics is True, so the code follows the documented intent: search of occurrences
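The intended behavior can be sketched as a small decision helper. This is an illustrative sketch only: should_gather_statistics is a hypothetical name, not the actual dask source.

```python
# Hypothetical helper mirroring the documented intent of
# read_parquet's gather_statistics option (not the real dask code).
def should_gather_statistics(gather_statistics, metadata_file_exists):
    if metadata_file_exists:
        # A _metadata summary file makes statistics cheap to gather,
        # so the default (None) opts in; only an explicit False opts out.
        return gather_statistics is not False
    # Without _metadata, the current check (`is not False`) also lets
    # None fall through to parsing every file footer, which is slow.
    # The proposed check gathers statistics only when explicitly asked:
    return gather_statistics is True
```

Under this sketch, reading a Spark-written directory with no _metadata file and the default gather_statistics=None would skip footer parsing entirely, while users who want statistics can still pass gather_statistics=True.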
