Skip to content

ENH: make apply_function aware of channel index#12206

Merged
larsoner merged 19 commits intomne-tools:mainfrom
dominikwelke:issue-10034
Feb 6, 2024
Merged

ENH: make apply_function aware of channel index#12206
larsoner merged 19 commits intomne-tools:mainfrom
dominikwelke:issue-10034

Conversation

@dominikwelke
Copy link
Copy Markdown
Contributor

Fixes #10034.

this PR makes the index of currently processed channel available within custom functions applied to data.
this is needed for processing channels differently during parallel processing, as requested in #10034 .

@dominikwelke
Copy link
Copy Markdown
Contributor Author

dominikwelke commented Nov 14, 2023

uses inspect.getfullargspec to check the custom function for input arguments, as suggested by @larsoner .
currently I only allow a specific variable name (ch_idx) to be detected - please suggest an alternative naming, if appropriate.
alternatively, one could just check for number of requested variables, but i think this would make things messier.

initial commit only for Base.raw. if you are happy with the solution i will copy to epochs and evoked

@dominikwelke
Copy link
Copy Markdown
Contributor Author

I can also extend the code to support ch_name.

Co-authored-by: Mathieu Scheltienne <mathieu.scheltienne@gmail.com>
@dominikwelke
Copy link
Copy Markdown
Contributor Author

quick question for you @drammock :
when porting the functionality to evoked arrays I noticed that Evoked.apply_function did not currently have the option to process the whole array in place (-> channel_wise=False).

discussing with @wmvanvliet and @britta-wstnr we couldnt think of a good reason why this should be intentional, so I implemented this functionality analogous to Rawand Epochs.
I did so in a separate commit (f88ccd0), so no problem to revert if there was a good reason..

@hoechenberger
Copy link
Copy Markdown
Member

I like the functionality you're adding here!

What I found a little surprising, though, is that we allow functions that by themselves accept parameters passed by position, but then we insist on specific parameter names. I have never seen this anywhere in any other Python project I've worked on. If we want to enforce a parameter name, we should probably insist on a keyword-only argument

@hoechenberger
Copy link
Copy Markdown
Member

hoechenberger commented Nov 17, 2023

I mean seriously, if we have to use inspection just to find out how to call a function in an API that we define, doesn't it sound like a hack to you immediately? To me it does...

@mscheltienne
Copy link
Copy Markdown
Member

you allow for a function signature that has positional parameters, but you force users to adopt a specific name for their positional parameters. This is highly unusual

Would you be OK if we force the signature to be one of:

f(x)
f(x, *, ch_name)
f(x, *, ch_id)
f(x, *, ch_name, ch_id)
f(x, *, ch_id, ch_name)

using the same function signature but with a different name of the parameter changes function behavior. This, again, is something I have never seen done anywhere and it would cause a great deal of confusion to me if I couldn't name my positional parameters the way I want to

I'm not sure I follow this one, anyway the function signature can not be different from the one mentionned above. f(x, foo) would raise, I think?

@cbrnr
Copy link
Copy Markdown
Contributor

cbrnr commented Nov 17, 2023

Because I find clearer from a user perspective to provide a function with a signature f(x, ch_name) rather than f(x) which will have its signature silently modified to f(x, ch_name) because of this extra argument + now f(x) will have some undefined ch_name variable in its scope.

I'm not sure we're talking about the same thing. I'm not proposing to silently modify the function signature.

@hoechenberger
Copy link
Copy Markdown
Member

f(x)
f(x, *, ch_name)
f(x, *, ch_id)
f(x, *, ch_name, ch_id)
f(x, *, ch_id, ch_name)

Yes, this would work if we want to go with kw-args

Personally I'd prefer to pass by position but the above would be acceptable for me too

@hoechenberger
Copy link
Copy Markdown
Member

hoechenberger commented Nov 17, 2023

... would still require inspection though ... but I think it keeps it to minimum at least

I'd prefer to have a solution that can do without any introspection ideally

I can see why you don't like @cbrnr's proposal so much though

OTOH this is how we've been doing things in MNE-Python in many places for many years

So I don't understand why this is immediately off the table now?

I'd love to hear @agramfort weigh in if he has time

@hoechenberger
Copy link
Copy Markdown
Member

hoechenberger commented Nov 17, 2023

Also I'd like to say: thanks for everyone contributing here, be it code or by sharing their opinion in this discussion! 🤗

@drammock
Copy link
Copy Markdown
Member

drammock commented Nov 17, 2023

After a discussion with @britta-wstnr, @larsoner and @dominikwelke we are all in favor of the current implementation (as is I think @wmvanvliet and @mscheltienne also are?). Here is the rationale as I see it:

  • currently, we impose quite minimal constraints on the user-passed function: it must take 1 argument (can be called anything: x, foo, whatever) that is a NumPy array, and return a NumPy array of the same dimensions.
  • if we were only adding 1 more supported parameter, we could say "interpreted positionally, call it whatever you want". But because we're adding 2 more supported parameters, we must either:
    • force an order on the new params which means that if ch_idx comes before ch_name, then ch_idx must always be in the signature even if the user only wants/needs to use ch_name. That is slightly burdensome on the user.
    • allow free order on the new params which means that we must either:
      • force the user to make them kw-only, which means explaining what the * in the function signature means, and forcing them to use it. This is also slightly burdensome on the user.
      • allow the user to make the params technically positional and in either order, and we inspect the signature to see what params are there, and pass in (by name) one or both of the two extra params that we're offering to supply. This boils down to (as @hoechenberger said):

        allowing for positional parameters but determine their functionality based on their name.

I agree that this is unusual. But to me the question from the UX perspective is: do we compel the user to write their one-off channel-modifying function to follow best-practices around APIs? Or do we say "its first arg must be the data, and it can have 1 or 2 others (ch_name and/or ch_idx) if you want/need them." To me this is a perfect use case for inspect --- it allows us to make the user's job slightly easier, reduces the amount of explanation we have to give around how to craft these functions, and as long as they follow the instructions about param naming they don't need to think/worry about param order.

@cbrnr
Copy link
Copy Markdown
Contributor

cbrnr commented Nov 17, 2023

I am still not in favor, but whatever.

@hoechenberger
Copy link
Copy Markdown
Member

hoechenberger commented Nov 17, 2023

I see your arguments but I disagree that this makes it easier for users. Like you said, it's unusual, hence likely to cause confusion or at least additional cognitive load. It sets MNE's behavior apart from the rest of the Python ecosystem by introducing some kind of meta language feature, and thereby adds inconsistencies without any need to do so.

My 2 ct

@hoechenberger
Copy link
Copy Markdown
Member

This is not a veto. But I want it to be noted that I think this implementation is a bad idea.

@drammock
Copy link
Copy Markdown
Member

Like you said, it's unusual, hence likely to cause confusion or at least additional cognitive load.

For transparency, my perspective rests on the assumptions that

  1. it seems unusual only because we have intermediate/advanced knowledge of how Python works
  2. many/most of our users are not Python experts, and will not even notice that there is anything unusual about this at all, because they are not calling their own function (we are calling it behind the scenes).

Personally I had been coding in Python a few years before even seeing an * in a function signature, and then I had to look up what it meant. Should we require users have that bit of background knowledge, if we can safely do what needs doing without requiring it?

@hoechenberger
Copy link
Copy Markdown
Member

hoechenberger commented Nov 17, 2023

Like you said, it's unusual, hence likely to cause confusion or at least additional cognitive load.

For transparency, my perspective rests on the assumptions that

  1. it seems unusual only because we have intermediate/advanced knowledge of how Python works

Exactly. That's why it should be avoided, because as soon as users start gaining experience and using other libraries like scikit-learn, they'll experience confusion due to this inconsistency.

Also, we're being inconsistent within MNE itself. We don't do this stuff anywhere else even within MNE AFAIK, which IMHO is reason enough to seriously reconsider this decision.

  1. many/most of our users are not Python experts, and will not even notice that there is anything unusual about this at all, because they are not calling their own function (we are calling it behind the scenes).

We're asking users to define their own function that does NumPy data manipulation but don't trust they'll be able to write a function signature or know any Python basics? This doesn't make sense, in my opinion – it's a contradiction in itself.

Personally I had been coding in Python a few years before even seeing an * in a function signature, and then I had to look up what it meant.

kw-only arguments are relatively new and advanced stuff. Hence I'm not a big fan of having them here – I only brought them up because they'd be the clean solution to the idea of passing parameters by name. If I could choose, I'd define a fixed signature for the callback, and pass things by position. This is what happens with Matplotlib callbacks, with scikit-learn scorers etc.

Should we require users have that bit of background knowledge, if we can safely do what needs doing without requiring it?

I'm just saying by trying to make things easier for absolute beginners, you're adding confusion to anyone wiling to advance or already having an advanced level of Python proficiency. And adding inconsistent behavior inside MNE-Python itself.

We're adopting SPECs and Black coding styles and whatnot, but when it comes to functions, we start our very own special thing because MNE-Python is so special. Sorry, err… no. Just no. I mean do it if you want to, but I don't like it. At all. I do agree that Python is too difficult for beginners, though.

@hoechenberger
Copy link
Copy Markdown
Member

Thanks for the meeting summaries and explanations, @drammock

I know it's difficult to keep everyone involved & informed here while we're working and discussing so asynchronously. I appreciate your effort.

@larsoner
Copy link
Copy Markdown
Member

@dominikwelke I think you should be okay to proceed with the inspect-based implementation

@drammock
Copy link
Copy Markdown
Member

as soon as users start gaining experience and using other libraries like scikit-learn, they'll experience confusion due to this inconsistency.

I think this is an assumption without evidence. They might not notice or remember the difference, they might be confused, or they might think (with curiosity or even admiration) "huh, I wonder how MNE-Python pulled that off?"

We don't do this stuff anywhere else even within MNE

Depends on what you mean by "this stuff". We do use inspect elsewhere. I don't think we take this specific approach to callables elsewhere, but we've also AFAIK never needed to (i.e., never had a motivation to be flexible about the call signature of a passed-in callable). Now we have that motivation. FWIW I don't think we should change our whole codebase to do this for all callables, it wouldn't make sense.

We're asking users to define their own function that does NumPy data manipulation but don't trust they'll be able to write a function signature or know any Python basics?

to me this isn't about trusting the user, it's about making life easier for them. We do this all over the place by allowing list/tuple/array, for example, or by allowing them to pass "mean" instead of passing the actual function np.mean. Here, we're saving them the trouble of writing the extra *, in their callable, or even knowing about how to specify a keyword-only param.

kw-only arguments are relatively new and advanced stuff.

Wait, are they a "Python basic" or are they "advanced stuff"?

but when it comes to functions, we start our very own special thing because MNE-Python is so special.

Let me give an analogy. I want to use a function that someone else wrote. I see it has a signature def divide(foo, bar):. Of course I read the docstring, because I don't know if I'm supposed to pass dividend, divisor or the other way around. If instead the signature was def divide(*, foo, bar): that doesn't help me at all because I still need to know what to pass in as foo and what to pass in as bar. Now instead imagine the signature was def divide(dividend, divisor):. Great! Even if there's no docstring, I feel fairly safe writing assert divide(6, 2) == 3 (positional) or assert divide(divisor=2, dividend=6) == 3 (keyword).

Now turn the perspective around. The "user" in this case is not me, it is MNE-Python's code. It needs to call a function that someone else wrote. It cannot read that function's docstring, and it cannot (easily) read that function's source code to figure out how it works. But what can it do? It can look at the param names, and say "aha! I have a dividend and a divisor here, and I see param names dividend and divisor in the function signature. I feel confident that I can pass dividend=dividend, divisor=divisor in either order and expect to get the right result regardless of the order they appeared in the function signature and regardless of whether the function would have accepted them as positional!"

To me that is a case of our code handling realistic, real-world user input (callables) flexibly, smartly and gracefully. It is not us "starting our very own special thing".

With that, I'm going to sign off of this discussion. My goal here is not necessarily to change your vote @hoechenberger, but hopefully lead you to feel less vehemently opposed to the will of the majority. You're welcome to have the last word for posterity, if you want it.

@hoechenberger
Copy link
Copy Markdown
Member

You're entirely ignoring the possibility of passing parameters by position, or @cbrnr's proposal of adding a switch to control the behavior, like we do elsewhere. I don't know why you're providing an example of good vs bad parameter naming, it just seems out of place.

@wmvanvliet
Copy link
Copy Markdown
Contributor

Given that this is a pretty niche bit of functionality, could we perhaps YAGNI a bit and only give the channel index, not its name? So either f(x) or f(x, ch_index). Given that the purpose of the function is to modify the data (as opposed to say plotting it) I think ch_index will be used a lot more than ch_name. Perhaps it's not too much to ask for the user to do a inst.ch_names[ch_index] to look up the channel name if they need to. Or is there some scenario in which it would be harder to look up the channel name? (dropped bad channels come to mind).

@dominikwelke
Copy link
Copy Markdown
Contributor Author

dominikwelke commented Nov 21, 2023

just applied @mscheltienne 's suggestions. let me know what to do next

could we perhaps YAGNI a bit and only give the channel index, not its name? (...) Or is there some scenario in which it would be harder to look up the channel name? (dropped bad channels come to mind).

this would work, but yes - for the user it would be more dificult if they wanted to use ch_names. they would have to pull/create a list or dict with the channel names before or while setting up their def that must correspond to insts ch_names when eventually processed. this seems doable but quite error prone and i think is incompatible with rejecting channels between def declaration and calling apply_function.
in the end we would require users to keep track of something that is readily available within apply_function's scope (->inst.info)

if this was to be the desired route, i would rather consider only offering ch_name - this way ppl could directly access relevant information (e.g. scalp location or hemispere in case of eeg - if 'O' in ch_name: ...) and define more transparent/human readable dictionaries if they needed an int.

@wmvanvliet
Copy link
Copy Markdown
Contributor

I understand the decision has already been made, but for the record, I would also be fine with just having either 1 or 3 positional parameters. So either f(x) or f(x, ch_idx, ch_name). The inconvenience to the user to have a possibly useless parameter passed to their function is so small it doesn't warrant forcing through an apparently controversial design decision.

Copy link
Copy Markdown
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks pretty good! a couple of suggested simplifications/cleanups below. Reviewing this makes me realize how much code duplication there is in these three methods; maybe after this one is merged you could try (if you want to) to refactor some of that duplication into a helper function that all three methods could use?

mne/epochs.py Outdated
if dtype is not None and dtype != self._data.dtype:
self._data = self._data.astype(dtype)

args = getfullargspec(fun)[0] + getfullargspec(fun)[4]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's take advantage of the fact that getfullargspec returns a named tuple, to make this line a bit more transparent about what we're doing:

Suggested change
args = getfullargspec(fun)[0] + getfullargspec(fun)[4]
args = getfullargspec(fun).args + getfullargspec(fun).kwonlyargs

Comment on lines +265 to +267
channel_wise=True,
verbose=None,
**kwargs,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dominikwelke do this too please

@dominikwelke
Copy link
Copy Markdown
Contributor Author

hi @drammock and all
apologies it took me forever to look into this again..
i applied the requested changes

Copy link
Copy Markdown
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with it @dominikwelke, looks good and implements what we (eventually!) converged on, so in it goes!

@larsoner larsoner merged commit 87df00d into mne-tools:main Feb 6, 2024
snwnde pushed a commit to snwnde/mne-python that referenced this pull request Mar 20, 2024
Co-authored-by: Mathieu Scheltienne <mathieu.scheltienne@gmail.com>
@dominikwelke dominikwelke deleted the issue-10034 branch April 14, 2025 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Access the current channel name within apply_function()

7 participants