Dask dataframe - divisions don't seem to be consistent

Hello, 

I wanted to point out that the definition of a "division" is not consistent across the different input functions.

Divisions appear to be a tuple in one of three styles:
- Style A
  (first index in first partition, last index in first partition, last index in second partition, ..., last index in (n-1)th partition, last index in nth partition)
- Style B
  (first index in first partition, first index in second partition, ..., first index in nth partition, last index in nth partition)
- Style C
  All divisions are None.
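
To make the three styles concrete, here is a minimal sketch that builds each tuple from the same three partitions. It assumes each partition's index is a sorted, non-empty Python list; `style_a`, `style_b` and `style_c` are hypothetical helpers for illustration, not dask functions.

```python
# Hypothetical helpers (not part of dask). Each partition's index is
# represented as a sorted, non-empty Python list.


def style_a(parts):
    """First index of the first partition, then the last index of every partition."""
    return (parts[0][0],) + tuple(p[-1] for p in parts)


def style_b(parts):
    """First index of every partition, then the last index of the last partition."""
    return tuple(p[0] for p in parts) + (parts[-1][-1],)


def style_c(parts):
    """Unknown divisions: one None per boundary."""
    return (None,) * (len(parts) + 1)


parts = [[0, 1, 2], [3, 4, 5], [6, 7]]
print(style_a(parts))  # (0, 2, 5, 7)
print(style_b(parts))  # (0, 3, 6, 7)
print(style_c(parts))  # (None, None, None, None)
```

Styles A and B both have n + 1 entries and agree on the two endpoints; they differ only in whether the interior boundaries are each partition's last or first index, which makes the mismatch easy to miss.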

As far as I can tell, this is how things are currently set up:
- read_csv uses Style C
- from_array uses Style B (except that it skips the last division when the index is already the last record; based on the logic, I'm not sure this can ever happen, actually. See [here](https://github.com/dask/dask/blob/master/dask/dataframe/io.py#L314).)
- from_pandas uses Style B
- from_bcolz uses Style B
- from_dask_array uses Style B
- from_castra uses Style A
- read_hdf uses Style C
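
The from_array edge case can be reasoned about with a small sketch of Style B for a default integer index split into fixed-size chunks. This is a hypothetical helper, not dask's from_array code; it assumes rows are numbered 0 to nrows - 1.

```python
def chunked_range_divisions(nrows, chunksize):
    """Style B divisions for a 0..nrows-1 index cut into fixed-size chunks.

    Hypothetical helper for illustration; not dask's from_array implementation.
    """
    if nrows <= 0 or chunksize <= 0:
        raise ValueError("nrows and chunksize must be positive")
    starts = tuple(range(0, nrows, chunksize))  # first index of each chunk
    return starts + (nrows - 1,)  # close with the index of the very last row


print(chunked_range_divisions(8, 3))  # (0, 3, 6, 7)
print(chunked_range_divisions(7, 3))  # (0, 3, 6, 6)
```

With 7 rows and a chunksize of 3, the last chunk holds only row 6, so its first and last index coincide and the final value repeats. In this sketch that happens whenever the last chunk contains a single row; skipping the repeated value would leave only n entries for n partitions instead of n + 1.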

So which one is the most "correct"? 

Essentially, I'm trying to use a `series.where` statement: if a condition is met, take the value from one column; otherwise, take it from another (the other is actually a sequence created with `from_pandas`). I think the slightly different divisions are what's breaking this.
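
One plausible way the mismatch breaks an operation like `where`: combining two collections means locating each index value's partition from the divisions. The sketch below reads a tuple under the Style B rule (partition i covers divisions[i] <= key < divisions[i + 1], and the last partition also includes the final value), the convention most of the functions listed above produce, and applies it to Style A and Style B tuples for the same data. `partition_of` is hypothetical and is not dask's alignment code.

```python
from bisect import bisect_right


def partition_of(divisions, key):
    """Return the partition number for `key`, reading `divisions` as Style B.

    Hypothetical helper for illustration, not dask code.
    """
    if not divisions[0] <= key <= divisions[-1]:
        raise KeyError(key)
    nparts = len(divisions) - 1
    # bisect_right finds the first boundary strictly greater than key;
    # clamp so that key == divisions[-1] maps to the last partition.
    return min(bisect_right(divisions, key) - 1, nparts - 1)


parts = [[0, 1, 2], [3, 4, 5], [6, 7]]
divs_a = (0, 2, 5, 7)  # Style A: last index of each partition
divs_b = (0, 3, 6, 7)  # Style B: first index of each partition

for key in (2, 5, 6):
    actual = next(i for i, p in enumerate(parts) if key in p)
    print(key, actual, partition_of(divs_b, key), partition_of(divs_a, key))
# 2 0 0 1
# 5 1 1 2
# 6 2 2 2
```

Under the Style B rule, the Style A tuple sends the boundary keys 2 and 5 one partition too far, while the other keys happen to land correctly.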

Happy to write a fix. It looks straightforward for everything except from_castra and read_hdf.
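
If a fix goes in, a check like the following sketch could help test that each constructor's output agrees with Style B. It assumes the partitions' indexes are available as sorted, non-empty lists; `consistent_with_style_b` is a hypothetical test helper, and it reports Style C tuples as not verifiable rather than consistent.

```python
def consistent_with_style_b(divisions, parts):
    """Return True if every partition's index fits the Style B boundaries.

    `parts` is a list of sorted, non-empty index lists (a hypothetical input
    format). Partition i must satisfy divisions[i] <= min(index) and
    max(index) < divisions[i + 1]; the last partition's max may equal
    divisions[-1].
    """
    if len(divisions) != len(parts) + 1:
        return False  # n partitions need exactly n + 1 boundaries
    if any(d is None for d in divisions):
        return False  # unknown (Style C) divisions cannot be verified
    last = len(parts) - 1
    for i, p in enumerate(parts):
        lower, upper = divisions[i], divisions[i + 1]
        if p[0] < lower:
            return False
        if i < last and not p[-1] < upper:
            return False
        if i == last and not p[-1] <= upper:
            return False
    return True


parts = [[0, 1, 2], [3, 4, 5], [6, 7]]
print(consistent_with_style_b((0, 3, 6, 7), parts))  # True  (Style B)
print(consistent_with_style_b((0, 2, 5, 7), parts))  # False (Style A)
```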

Thanks.

---

Update: I started watching March Madness too early and messed up some of these.


Dask dataframe - divisions don't seem to be consistent #1057
