-
Notifications
You must be signed in to change notification settings - Fork 21
Propose ADR-0018: Add dim argument to bin and hist
#3498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Sunyoung Yoo <luysunyoung9@gmail.com>
|
|
||
| ### Proposed solution | ||
|
|
||
| 1. Add a `dim` argument to all `bin` and `hist` functions to specify the dimension(s) to be replaced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the name dim is explicit enough in what it is doing.
I don't know if we can pick another name like remove or something better?
I can see that in our reduction operations, using the argument dim often means that this dimension will disappear in the output, but we do have some other operations like concat where the dim will appear in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dim is best, since I feel it is consistent. concat is probably the only case where it is used for something else? But note that sc.concat and da.bins.concat both use dim for different things, for a function with the same name!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think that da.hist(z=50, dim='y') is not so clear what it is doing.
In a reduction operation, I think it is clearer, as it is like the numpy axis argument: da.min(dim='y') is pretty clear that we apply the min along the y dim.
In make_binned we have erase as an argument. Would erase be clearer here?
That said, if we expect this to be used only by 'advanced' users, maybe the naming is not crucial?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look at NumPy, or the Array API standard. Arguments like this are always named axis. In Scipp we are (hopefully?) consistently naming them dim. I have always imagined that this would be least confusing, since no matter what function you use it is always dim, with no need to read the documentation (provided that you know or guess the function has such an argument).
| arg_dict: dict[str, int | Variable] | None = None, | ||
| /, | ||
| *, | ||
| dim: tuple[str, ...] | str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ok with the name clash? If someone needs to have the dim dim in their output, they can use the dict syntax, but it is weird to have such a speacial case. Is it uncommon enough to allow this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I have never used a dimension named dim. I think the regular use where a keyword-arg is useful outnumber these hypothetical clashes (which can simply use a dict) 100:1. I believe we should make the 99% convenient and not worry about the 1% (given that there is a really simple alternative syntax).
Proposal as discussed on Slack. Please comment and/or approve. See also #3435.