ADR: Restrict `Dataset` to items with matching dimensionality #3185

SimonHeybrock · 2023-07-04T05:13:08Z

No description provided.

docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst

jl-wynen · 2023-07-04T06:01:03Z

docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst

+- ``Dataset`` will no longer be able to represent certain types of data.
+  Users will need to resort to ``DataGroup`` instead, which has other limitations, such as requiring coordinates to be duplicated.
+  Another option would be to replicate data values of the lower-dimensional items to match the dimensionality of the higher-dimensional items.
+  This would require more memory, but would force the users to be explicit about the meaning of data they want to represent.


This could be done with a broadcast, which would only increase memory use once you perform an operation on the result.
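
To illustrate the point about memory, here is a minimal sketch using NumPy as a stand-in (an assumption for illustration only; it does not use the scipp API, and the names `low_dim` and `target_shape` are hypothetical). It shows that a broadcast can be a zero-copy view, and that a full-size buffer only appears once an operation produces a new array:

```python
import numpy as np

# Hypothetical data: a 1-D item and the 2-D shape it should match.
low_dim = np.array([1.0, 2.0, 3.0])   # think of dims ("x",)
target_shape = (4, 3)                 # think of dims ("y", "x")

# broadcast_to returns a read-only view of the original buffer;
# no values are copied at this point.
view = np.broadcast_to(low_dim, target_shape)
shares_buffer = np.shares_memory(view, low_dim)
view_strides = view.strides           # stride 0 along the broadcast axis

# An arithmetic operation allocates a new, full-size result.
result = view + np.zeros(target_shape)
```

Whether scipp's own broadcast behaves the same way in every case is not confirmed here; the sketch only demonstrates the general idea behind the comment.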

Co-authored-by: Jan-Lukas Wynen <jan-lukas.wynen@ess.eu>

nvaytet · 2023-07-04T20:34:38Z

docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst

+Recently we have introduced ``DataGroup``, which drops the restriction of compatible dimension extents but, unlike ``Dataset``, in turn does not provide support for joint coordinates.
+The addition of ``DataGroup`` was triggered by a long series of cases where we realized that ``Dataset`` is not useful and flexible enough in real applications.
+This is not to say that ``Dataset`` is entirely useless, but it is not as useful as we had hoped.
+One key area that is not covered by ``DataGroup`` is the representation of table-like data (or multi-dimensional generalizations thereof), in a manner similar to ``pandas.DataFrame``.


Since the name Dataset was inspired by the Xarray Dataset, which can have members with different dimensionality, should we consider renaming our Dataset to DataFrame?

I actually had a paragraph with that consideration in an earlier draft, and then removed it :D

I now think that DataFrame is too loaded by Pandas knowledge:

- Always 1-D
- Indices, no coordinates
- No dimension labels
- Big, well-known API that we do not plan to adopt

nvaytet · 2023-07-04T20:37:14Z

docs/reference/developer/adr/0017-restrict-dataset-to-items-with-matching-dimensionality.rst

+~~~~~~~~
+
+There are two possible ways of reasoning about ``Dataset``.
+Firstly, we may argue that while technically complex, the work has already been done, and the problems detailed below are encountered only in edge cases.


Are we finally going to refactor the C++ implementation of DataArray to not use Dataset but its own dedicated implementation? (at the moment, it almost feels like we already have legacy code in the C++ that no one wants to touch because it 'works')
(and maybe the same with the 'buckets' which should really be called 'bins'?)

We already did that several years ago.


SimonHeybrock added 3 commits June 30, 2023 12:11

Begin drafting ADR on removing Dataset support for mixed dimensionality

b8d8168

Add a lot of info and brainstorming

d38b3e6

Cleanup

0833b49

SimonHeybrock requested review from YooSunYoung, jl-wynen and nvaytet July 4, 2023 05:13

SimonHeybrock mentioned this pull request Jul 4, 2023

A possible roadmap #3114

Closed

jl-wynen reviewed Jul 4, 2023

SimonHeybrock and others added 2 commits July 4, 2023 08:07

Apply suggestions from code review

543532f

Co-authored-by: Jan-Lukas Wynen <jan-lukas.wynen@ess.eu>

Comment on other reasons for Dataset

13511ea

jl-wynen approved these changes Jul 4, 2023

nvaytet reviewed Jul 4, 2023

SimonHeybrock commented Jul 6, 2023

YooSunYoung approved these changes Jul 6, 2023

Change status to "accepted"

8eab201

SimonHeybrock merged commit adec01d into main Jul 6, 2023

SimonHeybrock deleted the adr-dataset-item-dimensionality branch July 6, 2023 07:52

SimonHeybrock mentioned this pull request Jul 6, 2023

Restrict Dataset to items with matching dimensionality #3189

Closed

5 participants
