SLEP009: keyword only arguments#19
Conversation
|
Awesome, thanks for opening this! I think you should also address what which arguments we want to make keyword only, or at least raise the question. |
|
Also added a scope section. |
|
anything else left here before we can put it out for a vote? |
amueller
left a comment
There was a problem hiding this comment.
I made some suggestions on how to restructure this document. Right now it doesn't seem very clear to me.
Let me know what you think.
Happy to do another pass later.
|
@amueller I think this is in a much better state now. |
|
@adrinjalali thanks! You work to fast, I can't keep up ;) I'll try to read your feature names first... |
|
|
||
|
|
||
| Once the deprecation period is over, we'd remove the decorator and calling | ||
| the function/method with the positional arguments after `*` would fail. |
There was a problem hiding this comment.
I think the only thing missing from the discussion right now is what happens with IDEs/IPython when we do the decorator, and what happens with sphinx when we do the decorator.
There was a problem hiding this comment.
@thomasjpfan do you have the answer to these?
| The change can also be a gradual one in the span of two or three releases, | ||
| *i.e.* we can start by changing the ``__init__``s, and continue later with the | ||
| other ones. But that may cause more confusion, and changing all the above | ||
| categories together may be a better option. |
There was a problem hiding this comment.
I think the slep should argue for a particular way forward so we can vote on it, and give arguments for and against.
I think the definition of commonly used is still vague here.
Also, maybe discuss our current use of *args and **kwargs? That's only in make_column_transformer, make_pipeline and train_tests_split and these already require the other arguments to be passed by keyword, and they would remain unchanged, right?
What is the proposal for cross_validate and cross_val_score? How about GridSearchCV and accuracy_score, euclidean_distances?
I feel maybe estimator, X, y could be positional and the rest keyword only?
There was a problem hiding this comment.
I would appreciate an empirical analysis of usage of positional arguments in existing example code (and perhaps test code). Would you be able to get those stats from an abstract syntax tree, @adrinjalali?
| The change can also be a gradual one in the span of two or three releases, | ||
| *i.e.* we can start by changing the ``__init__``s, and continue later with the | ||
| other ones. But that may cause more confusion, and changing all the above | ||
| categories together may be a better option. |
There was a problem hiding this comment.
I would appreciate an empirical analysis of usage of positional arguments in existing example code (and perhaps test code). Would you be able to get those stats from an abstract syntax tree, @adrinjalali?
|
Thanks for putting this proposal together!
I think one major risk with this that needs to be evaluated is the effort required of user to fix the warnings as compared to the perceived benefit (which connects to the idea in #19 (comment) that we need better analysis of the current use of positional arguments). For passing numerous parameters the motivation is clear. When only the first or second parameter is passed prepositionally, it is less so. For instance, I can see myself using DetailsThere is indeed a risk that position of the Maybe we could adds some of these concerns under the "Consideration" section? |
|
@rth I wouldn't mind allowing that to be positional. The question is how to formulate a clear rule. There was the Odyssey project which might help with code analysis: https://odyssey.readthedocs.io/en/latest/tutorial.html Also interesting might be https://github.com/Quansight-Labs/python-api-inspect Someone that's better at SQL than me might be able to write the correct query example |
|
Lgtm.com should also be possible for querying use within this library
|
|
If we agree that the decision can be made case by case, then we don't have to make those decisions now. I tried to formulate that in a better and a bit more deterministic way, let me know how it looks now. |
jnothman
left a comment
There was a problem hiding this comment.
I agree that it can be decided ad-hoc.
|
So the next step is an email on the mailing list, and call for a vote? |
|
I'm not sure if doing a case-by-case is good. I feel like that would lead to a lot of bike-shedding, slow progress, and inconsistencies. |
|
I'm happy with separating the exact details of scoping it from the question of doing it, though. |
|
I'm just trying to find something that it has a chance of passing the vote. Otherwise I'd propose everything to be keyword only except |
|
Well I don't think that's good for |
Before the vote I think we might want to get more feedback from users and other core devs? How is it expected to work: we merge this and then discuss on the mailing list, or announce on the mailing list then wait for feedback here then vote on the mailing list? The https://scikit-learn.org/stable/governance.html#enhancement-proposals-sleps page is not very verbose about it, and the figure with all the possible CPython PEP statuses kind of lost me :) |
|
Well I think the default procedure is not a vote but consensus seeking, right? So I would say going to the mailing list asking for feedback would be a good idea. We don't need a vote unless anyone disagrees. The vote was meant to put a deadline on discussions. If that's what you want to do then you could ask for a vote now. Having the SLEP be merged might be nice for visibility and readability, but I feel it's easier to comment on it in a PR. |
Sounds good! I agree it's easier to keep the PR open to comment on it. |
|
I agree that it's beneficial, but seems that this will need lots of work and I'm wondering whether it's worthwhile (Maybe we have many more important things to do). |
qinhanmin2014
left a comment
There was a problem hiding this comment.
I tend to regard this one as an issue for scikit-learn 1.0.
| For a smooth transition, we need an easy deprecation path. Similar to the | ||
| decorators developed in ``matplotlib``, a proposed solution is available at | ||
| [#13311](https://github.com/scikit-learn/scikit-learn/pull/13311), which | ||
| deprecates the usage of positional arguments on selected functions and methods. |
There was a problem hiding this comment.
first introduce it on selected functions and methods, will this make users surprised? I'm not sure.
qinhanmin2014
left a comment
There was a problem hiding this comment.
I see the approval from Joel and Andreas, but I'm still unable to understand the importance of keyword only args. I admit that it is less error-prone, but I tend to believe that such error rarely occurs.
| should be the same throughout the library in order to keep a consistent | ||
| interface to the user. | ||
|
|
||
| This proposal suggests making only *most commonly* used parameters positional. |
There was a problem hiding this comment.
Is is always the case? e.g., estimators is important for VotingRegressor, but I think it's acceptable to use VotingRegressor([('XXX', est1), ('XXX', est2)]) (actually this code snippet is extracted from our example).
There was a problem hiding this comment.
I don't understand your argument here @qinhanmin2014. estimators is VotingRegressor's most commonly used argument... So it would remain acceptable as a positional argument.
In general I'd say that all arguments that are essentially required will be accepted as positional. So in GridSearchCV for example: estimator and param_grid would be positional, cv maybe, scoring not.
There was a problem hiding this comment.
Sorry I misunderstood this part.
I think it's difficult to define most commonly used parameters here. Different people might have different opinions and we might make users surprised.
There was a problem hiding this comment.
In general I'd say that all arguments that are essentially required will be accepted as positional. So in GridSearchCV for example: estimator and param_grid would be positional, cv maybe, scoring not.
I think currently most people pass cv as keyword argument because there're so many other arguments before it?
There was a problem hiding this comment.
If people pass cv as keyword argument, then they won't have any issue after this change.
And for the other ones, they will see a DeprecationWarning for a while, and they know how to fix it. I don't understand the argument regarding the users being surprised. This change doesn't break any code immediately.
There was a problem hiding this comment.
I mean it's difficult to define most commonly used parameters here. Different people might have different opinions and we might make users surprised.
I'm not opposed to it, but this seems time-consuming and I can't see how users will benefit from it (I admit that it is less error-prone, but I tend to believe that such error rarely occurs.)
There was a problem hiding this comment.
I see the benefit here to be more about forcing improved readability than reducing the risk or consequence of user error. I fear that lots of uses not inculcated with writing readable code (including former physicists and statisticians, PhD students, etc) are prone to passing things positionally, along with C and Java coders, etc. Python forces other readable conventions on them, like whitespace. Let's improve scikit-learn code readability.
There was a problem hiding this comment.
I fear that lots of uses not inculcated with writing readable code (including former physicists and statisticians, PhD students, etc) are prone to passing things positionally, along with C and Java coders, etc.
I'm still a student so you and Andy definitely know more about how others write code :)
Actually we already force readability in many classes/functions (e.g., when an important parameter is not the first parameter.)
As I said above, I'm not opposed to it (though slightly worried about how do define the term "most commonly used parameters"), but I feel that it's time-consuming and trivial, so if someone has time, maybe he/she can focus on other things.
There was a problem hiding this comment.
You don't have to spend time on this @qinhanmin2014 if you don't want to. I'm happy to spend time on this as I see it as a very significant improvement on the API, and the solution is also not that time consuming as proposed in the PR.
|
@amueller how do we proceed from here? I think we're making this harder than it has to be. |
qinhanmin2014
left a comment
There was a problem hiding this comment.
Hmm, I can't merge PR here? Maybe someone can help me?
|
we were thinking of merging this PR once the conversation is completely over, that's why it hasn't been merged yet. I'll see if I can fix your permissions here (not sure why you don't have them) |
|
I think it's okay to merge with a couple of acceptances and then use
another pr to propose changing its status to accepted, and to admit further
tweaks.
|
|
Ok, then if there's not -1 votes, we can merge and announce on the mailing list? I think we only need to vote if there's at least one person objecting the change, otherwise we can go ahead with @thomasjpfan 's plan |
|
@adrinjalali I think there should be at least a sentence on how we will decide which parameters will be positional only. Will there be another slep? Will it be ad-hoc decisions (I don't like this). Will it be based on data? Will there be some discussion for a "rule" without writing another SLEP? I don't think we're making this harder as it has to be. I think a SLEP should provide a clear path forward. If this is accepted, I have no idea what would happen next. |
| The *most commonly* used parameters are defined per method or function, to be | ||
| defined as either of the following two ways: | ||
|
|
||
| - The set defined and agreed upon by the core developers, which should cover |
There was a problem hiding this comment.
ah ok didn't see that. Yeah that's fine.
|
ok happy to go ahead. Should we merge to facilitate viewing or not merge to facilitate commenting? ;) I think I'm +.5 for merge. |
|
Let's do it then :) |
This is to formalize our discussion around introducing keyword only arguments.
It follows scikit-learn/scikit-learn#12805 and scikit-learn/scikit-learn#13311
It should be an easy one, since the solution is already there, and it seems more like a matter of making a decision.
@scikit-learn/core-devs