Closed
Description
Possibly related to #731. I ran into this while trying to build a test case to demonstrate some other behavior.
import pandas as pd
import dask.dataframe as dd
import dask
dask.set_options(get=dask.async.get_sync)
dsk = {
('x', 0): pd.DataFrame([[1975, 1., 1., 1., 2.],
[1975, 1, 2, 1, 2],
[1975, 1, 3, 1, 2],
[1975, 1, 1, 1, 2],
[1975, 1, 2, 1, 2],
[1975, 2, 1, 1, 3],
[1975, 2, 2, 1, 3],
[1975, 2, 3, 1, 3],
[1975, 2, 1, 1, 3],
[1975, 2, 3, 1, 3],
],
columns=['year', 'id', 'item', 'value', 'denom']),
('x', 1): pd.DataFrame([[1976, 1, 1, 1, 3],
[1976, 1, 2, 1, 3],
[1976, 1, 3, 1, 3],
[1976, 1, 1, 1, 3],
[1976, 1, 2, 1, 3],
[1976, 2, 1, 1, 3],
[1976, 2, 2, 1, 3],
[1976, 2, 3, 1, 3],
[1976, 2, 1, 1, 3],
[1976, 2, 3, 1, 3],
],
columns=['year', 'id', 'item', 'value', 'denom'])
}
df = dd.DataFrame(dsk, 'x', ['year', 'id', 'item', 'value', 'denom'],
                  divisions=[None, None, None])
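The output of quantile depends on how you call it: the 0.5 quantile of year is 1975.0 when requested together with 0 and 1, but 1976.0 when requested on its own.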
In [14]: df.year.quantile([0, .5, 1]).compute()
Out[14]:
0.0 1975.0
0.5 1975.0
1.0 1976.0
Name: year, dtype: float64
In [15]: df.year.quantile([1]).compute()
Out[15]:
1 1976.0
Name: year, dtype: float64
In [16]: df.year.quantile([0]).compute()
Out[16]:
0 1975.0
Name: year, dtype: float64
In [17]: df.year.quantile([0.5]).compute()
Out[17]:
0.5 1976.0
Name: year, dtype: float64
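For comparison, here is a minimal sketch of the same check in plain pandas. The `years` series is a hypothetical stand-in for the year column of the two partitions above (ten rows of 1975, ten rows of 1976), not something taken from dask itself. pandas computes exact quantiles with linear interpolation by default, so the value need not match dask's approximate result, but it does show the property I would expect to hold: the 0.5 quantile should not change depending on which other quantiles are requested alongside it.

```python
import pandas as pd

# Hypothetical stand-in for the 'year' column of the two partitions above:
# ten rows of 1975 followed by ten rows of 1976.
years = pd.Series([1975] * 10 + [1976] * 10, name="year")

# Request the median together with the min and max, then on its own.
together = years.quantile([0, 0.5, 1])
alone = years.quantile([0.5])

# In pandas the 0.5 quantile is the same either way.
consistent = together.loc[0.5] == alone.loc[0.5]

print(together)
print(alone)
print(consistent)
```

With this data pandas reports 1975.5 for the 0.5 quantile in both calls, since the median falls between the last 1975 row and the first 1976 row.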
Because of the output of the first call, setting the index on year produces bad partitions: division 0 comes back empty.
In [18]: df.set_index('year').get_division(0).head()
Out[18]:
Empty DataFrame
Columns: [id, item, value, denom]
Index: []
In [19]: df.set_index('year').get_division(1).head()
Out[19]:
id item value denom
year
1976 1.0 1.0 1.0 3.0
1976 1.0 2.0 1.0 3.0
1976 1.0 3.0 1.0 3.0
1976 1.0 1.0 1.0 3.0
1976 1.0 2.0 1.0 3.0