-
Notifications
You must be signed in to change notification settings - Fork 21
ADR: Restrict Dataset to items with matching dimensionality
#3185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst
Outdated
Show resolved
Hide resolved
docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst
Outdated
Show resolved
Hide resolved
docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst
Outdated
Show resolved
Hide resolved
docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst
Outdated
Show resolved
Hide resolved
docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst
Show resolved
Hide resolved
| - ``Dataset`` will no longer be able to represent certain types of data. | ||
| Users will need to resort to ``DataGroup`` instead, which has other limitations, such as requiring to duplicate coordinates. | ||
| Another option would be to replicate data values of the lower-dimensional items to match the dimensionality of the higher-dimensional items. | ||
| This would reuqire more memory, but would force the users to be explicit about the meaning of data they want to represent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be done with a broadcast which would only increase memory use when you do an operation with it.
Co-authored-by: Jan-Lukas Wynen <jan-lukas.wynen@ess.eu>
| Recently we have introduced ``DataGroup``, which drops the restriction of compatible dimension extents but, unlike ``Dataset``, does in turn not provide support for joint coordinates. | ||
| The addition of ``DataGroup`` was triggered by a long series of cases where we realized that ``Dataset`` is not useful and flexbile enough in real applications. | ||
| This is not to say that ``Dataset`` is entirely useless, but it is not as useful as we had hoped. | ||
| One key area that is not covered by ``DataGroup`` is the representation of table-like data (or multi-dimensional generalizations thereof), in a manner similar to ``pandas.DataFrame``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the name Dataset was inspired by the Xarray Dataset which can have members with different dimensionality, should we consider renaming our Dataset to DataFrame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually had a paragraph with that consideration in an earlier draft, and then removed it :D
I now think that DataFrame is too loaded by Pandas knowledge:
- Always 1-D
- Indices, no coordinates
- No dimension labels
- Big well known API that we do not plan to adopt
| ~~~~~~~~ | ||
|
|
||
| There are two possible ways of reasoning about ``Dataset``. | ||
| Firstly, we may argue that while technically complex, the work has already been done, and the problems detailed below are encountered only in edge cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we finally going to refactor the C++ implementation of DataArray to not use Dataset but its own dedicated implementation? (at the moment, it almost feels like we already have legacy code in the C++ that no one wants to touch because it 'works')
(and maybe the same with the 'buckets' which should really be called 'bins'?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already did that several years ago.
No description provided.