Errors in quantile calculation

I generated simple data: numbers 1-100, split into 10 shards, so that the first shard contains numbers 1-10, the second 11-20, and so on. I read the data with
`df = dd.read_csv(os.path.join('/home/grzegorz/shards', 'shard_*.csv'))`
and then I try to summarize it:
`df.describe().compute()`
The result is:
```
                a
count  100.000000
mean    50.500000
std     29.011492
min      1.000000
25%     33.250000
50%     65.500000
75%     97.750000
max    100.000000
```
Clearly the quantiles are far from the correct values: for the integers 1-100 the exact quartiles (with linear interpolation) are 25.75, 50.5, and 75.25, yet the reported ones are 33.25, 65.5, and 97.75. I understand that computing quantiles on a distributed data set is hard and that obtaining exact values may even be impossible, but this seems like a relatively large error for such a simple data set. It calls into question whether the results are reliable, for instance, for data with a trend.
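For anyone who wants to reproduce this, here is a minimal sketch that recreates the shards and computes reference quartiles without dask. The file names `shard_0.csv` ... `shard_9.csv`, the temporary directory, and the helper names are my assumptions for illustration; only the `shard_*.csv` pattern and the column name `a` come from the report above. Note that `statistics.quantiles` needs Python 3.8 or newer, so this does not run on the 3.5.2 interpreter listed below.

```python
import csv
import glob
import os
import statistics
import tempfile


def write_shards(directory, n_shards=10, shard_size=10):
    """Write shard_0.csv, shard_1.csv, ... each holding consecutive integers in column `a`."""
    for i in range(n_shards):
        path = os.path.join(directory, "shard_{}.csv".format(i))
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["a"])  # header matching the column in the describe() output
            start = i * shard_size + 1
            for value in range(start, start + shard_size):
                writer.writerow([value])


def exact_quartiles(directory):
    """Read every shard into memory and return the 25%, 50% and 75% quantiles."""
    values = []
    for path in sorted(glob.glob(os.path.join(directory, "shard_*.csv"))):
        with open(path, newline="") as f:
            values.extend(int(row["a"]) for row in csv.DictReader(f))
    # method="inclusive" interpolates linearly between the min and max of the data
    return statistics.quantiles(values, n=4, method="inclusive")


shard_dir = tempfile.mkdtemp()  # hypothetical location; the report uses a home directory
write_shards(shard_dir)
quartiles = exact_quartiles(shard_dir)
print(quartiles)  # [25.75, 50.5, 75.25]
```

Pointing the `dd.read_csv(...)` call from above at `shard_dir` should then let the `describe()` output be compared against these reference values.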

Please let me know whether the quantile computation is going to be improved.

Python 3.5.2
dask 0.16.0


