Skip to content

Conversation

@SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented Jul 12, 2024

Proposal as discussed on Slack. Please comment and/or approve. See also #3435.

@SimonHeybrock SimonHeybrock marked this pull request as ready for review July 12, 2024 11:10
Co-authored-by: Sunyoung Yoo <luysunyoung9@gmail.com>

### Proposed solution

1. Add a `dim` argument to all `bin` and `hist` functions to specify the dimension(s) to be replaced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the name dim is explicit enough in what it is doing.
I don't know if we can pick another name like remove or something better?

I can see that in our reduction operations, using the argument dim often means that this dimension will disappear in the output, but we do have some other operations like concat where the dim will appear in the output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dim is best, since I feel it is consistent. concat is probably the only case where it is used for something else? But note that sc.concat and da.bins.concat both use dim for different things, for a function with the same name!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that da.hist(z=50, dim='y') is not so clear what it is doing.
In a reduction operation, I think it is clearer, as it is like the numpy axis argument: da.min(dim='y') is pretty clear that we apply the min along the y dim.

In make_binned we have erase as an argument. Would erase be clearer here?

That said, if we expect this to be used only by 'advanced' users, maybe the naming is not crucial?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at NumPy, or the Array API standard. Arguments like this are always named axis. In Scipp we are (hopefully?) consistently naming them dim. I have always imagined that this would be least confusing, since no matter what function you use it is always dim, with no need to read the documentation (provided that you know or guess the function has such an argument).

arg_dict: dict[str, int | Variable] | None = None,
/,
*,
dim: tuple[str, ...] | str | None = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we ok with the name clash? If someone needs to have the dim dim in their output, they can use the dict syntax, but it is weird to have such a speacial case. Is it uncommon enough to allow this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I have never used a dimension named dim. I think the regular use where a keyword-arg is useful outnumber these hypothetical clashes (which can simply use a dict) 100:1. I believe we should make the 99% convenient and not worry about the 1% (given that there is a really simple alternative syntax).

@SimonHeybrock SimonHeybrock merged commit c169fcb into main Aug 2, 2024
@SimonHeybrock SimonHeybrock deleted the adr-hist-bin-api branch August 2, 2024 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants