I have a few points of feedback after working across different dask collections for benchmarks and examples.
Currently, each dask collection takes a partitioning argument (which one depends on the particular method being used) and exposes an attribute for querying the number of partitions:
| Collection | Specify partitions | Query number of partitions |
|---|---|---|
| `dask.array` | `chunks` | `numblocks` |
| `dask.bag` | `npartitions` / `chunkbytes` / `partition_size` | `npartitions` |
| `dask.dataframe` | `npartitions` / `chunksize` | `npartitions` |
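As the table shows, `dask.array` is the odd one out: to get a specific number of partitions you have to translate that count into chunk sizes yourself. A minimal sketch of that inversion (a hypothetical `chunks_for_npartitions` helper, not dask API, assuming only the first axis is split):

```python
import math

def chunks_for_npartitions(shape, npartitions):
    """Invert a target partition count into a dask.array-style ``chunks``
    tuple. Hypothetical helper for illustration: splits only the first
    axis, whereas dask itself may split any axis."""
    n = min(npartitions, shape[0])
    step = math.ceil(shape[0] / n)
    first_axis = tuple(
        min(step, shape[0] - i) for i in range(0, shape[0], step)
    )
    # Remaining axes are left as single chunks.
    return (first_axis,) + tuple((s,) for s in shape[1:])

# A (100,)-shaped array in 10 partitions -> ten chunks of 10 along axis 0.
print(chunks_for_npartitions((100,), 10))
```

A consistent `npartitions=` argument would let the collection do this bookkeeping instead of the user.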
1. When moving between the various `from_*` methods in arrays and bags/dataframes, I often pause to remember which argument specifies the number of partitions or chunk sizes. Thoughts on making the number of partitions a consistent argument (ideally `partitions=X` or `npartitions=X`) across all collections? For arrays, I almost always find myself inverting chunk sizes to arrive at a particular number of partitions (especially when I'm using the distributed scheduler). It would be comforting to know that I can create a collection and always use a `partitions` or `npartitions` argument whether I'm using `dask.bag.from_filenames` or `dask.array`; any reason this might not be feasible for all collections?
2. Assuming #1 is a valid point, thoughts on making the argument that specifies partitions the same as the attribute for querying the number of partitions? For `dask.array`, I need to remember `chunks` vs. `numblocks`.
3. It might be handy to show the number of partitions in the repr for each collection, e.g.:

   ```python
   >>> x
   dask.array<arange-..., shape=(100,), dtype=int64, chunksize=(2,), partitions=10>
   >>> y
   dd.Series<from-dask-arraycdefdd873cdf06131bcc8da67a0e65fb, divisions=(0, 2, 4, ..., 98, 99), partitions=10>
   >>> b
   <dask.bag.core.Bag at 0x11312e0f0, partitions=10>
   ```
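For the array repr, the proposed `partitions` count is already implicit in the existing metadata: it is the product of the per-axis block counts (what `numblocks` reports today). A hypothetical helper, for illustration only:

```python
from functools import reduce
from operator import mul

def npartitions_from_chunks(chunks):
    """Total block count for a dask.array-style ``chunks`` tuple: the
    number of chunks along each axis, multiplied together. This is the
    value a uniform ``partitions=`` field in the repr could display."""
    return reduce(mul, (len(axis_chunks) for axis_chunks in chunks), 1)

print(npartitions_from_chunks(((10,) * 10,)))       # 1-D: 10 blocks
print(npartitions_from_chunks(((50, 50), (5, 5))))  # 2-D: 2 * 2 = 4 blocks
```

Bags and dataframes already store `npartitions` directly, so surfacing it in their reprs would be a matter of formatting rather than new bookkeeping.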