Closed
Description
Hello,
I wanted to point out that the definition of a "division" is not consistent between the different input formats.
`divisions` appears to be a tuple in one of three styles:
- Style A: (first index in first division, last index in first division, last index in second division, ... , last index in n-1 division, last index in n division)
- Style B: (first index in first division, first index in second division, ... , first index in n division, last index in n division)
- Style C: all divisions are labeled as None.
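To make the three styles concrete, here is a minimal sketch in plain Python. It is not dask code: the helper names `style_a`, `style_b`, and `style_c` are hypothetical, and it assumes each partition is given as a sorted, non-empty list of its index values.

```python
def style_a(partitions):
    """First index of the first partition, then the last index of every partition."""
    return (partitions[0][0],) + tuple(p[-1] for p in partitions)


def style_b(partitions):
    """First index of every partition, then the last index of the last partition."""
    return tuple(p[0] for p in partitions) + (partitions[-1][-1],)


def style_c(partitions):
    """Unknown divisions: one None per boundary (n partitions -> n + 1 entries)."""
    return (None,) * (len(partitions) + 1)


# Hypothetical index values for three partitions.
partitions = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

print(style_a(partitions))  # (0, 2, 5, 8)
print(style_b(partitions))  # (0, 3, 6, 8)
print(style_c(partitions))  # (None, None, None, None)
```

All three tuples have the same length, so the mismatch only shows up when comparing the interior values of two collections built by different readers.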
I think this is how things are currently set up...
- read_csv uses Style C
- from_array uses Style B (except that it skips the last division when the index is already the last record; based on the logic, I'm not sure this can ever happen, actually. See here.)
- from_pandas uses Style B
- from_bcolz uses Style B
- from_dask_array uses Style B
- from_castra uses Style A
- read_hdf uses Style C
So which one is the most "correct"?
Essentially, I'm trying to use a series.where statement where, if a condition is met, you use one column; otherwise, you use another (the other is actually a sequence loaded via from_pandas). I think the fact that the divisions are slightly different is breaking this.
Happy to program a fix. It looks straightforward enough for all but the from_castra and read_hdf.
Thanks.
Update: I started watching March Madness too early and messed up some of these.