SLEP009: keyword only arguments by adrinjalali · Pull Request #19 · scikit-learn/enhancement_proposals

adrinjalali · 2019-07-13T12:10:05Z

This is to formalize our discussion around introducing keyword only arguments.

It follows scikit-learn/scikit-learn#12805 and scikit-learn/scikit-learn#13311

It should be an easy one, since the solution is already there, and it seems more like a matter of making a decision.

@scikit-learn/core-devs

amueller · 2019-07-13T14:29:33Z

Awesome, thanks for opening this! I think you should also address what which arguments we want to make keyword only, or at least raise the question.
Is this going to be for all parameters in estimators, or some? For only estimators, or also for functions? Which functions? For methods? (fit params?)

adrinjalali · 2019-07-15T11:56:40Z

Also added a scope section.

adrinjalali · 2019-07-18T12:02:17Z

anything else left here before we can put it out for a vote?

amueller

I made some suggestions on how to restructure this document. Right now it doesn't seem very clear to me.
Let me know what you think.
Happy to do another pass later.

adrinjalali · 2019-07-23T14:38:40Z

@amueller I think this is in a much better state now.

amueller · 2019-07-23T15:40:24Z

@adrinjalali thanks! You work to fast, I can't keep up ;) I'll try to read your feature names first...

amueller · 2019-07-23T22:28:47Z

+
+
+Once the deprecation period is over, we'd remove the decorator and calling
+the function/method with the positional arguments after `*` would fail.


I think the only thing missing from the discussion right now is what happens with IDEs/IPython when we do the decorator, and what happens with sphinx when we do the decorator.

@thomasjpfan do you have the answer to these?

amueller · 2019-07-23T22:37:11Z

+The change can also be a gradual one in the span of two or three releases,
+*i.e.* we can start by changing the ``__init__``s, and continue later with the
+other ones. But that may cause more confusion, and changing all the above
+categories together may be a better option.


I think the slep should argue for a particular way forward so we can vote on it, and give arguments for and against.

I think the definition of commonly used is still vague here.
Also, maybe discuss our current use of *args and **kwargs? That's only in make_column_transformer, make_pipeline and train_tests_split and these already require the other arguments to be passed by keyword, and they would remain unchanged, right?

What is the proposal for cross_validate and cross_val_score? How about GridSearchCV and accuracy_score, euclidean_distances?

I feel maybe estimator, X, y could be positional and the rest keyword only?

I would appreciate an empirical analysis of usage of positional arguments in existing example code (and perhaps test code). Would you be able to get those stats from an abstract syntax tree, @adrinjalali?

jnothman · 2019-07-23T23:29:14Z

+The change can also be a gradual one in the span of two or three releases,
+*i.e.* we can start by changing the ``__init__``s, and continue later with the
+other ones. But that may cause more confusion, and changing all the above
+categories together may be a better option.


I would appreciate an empirical analysis of usage of positional arguments in existing example code (and perhaps test code). Would you be able to get those stats from an abstract syntax tree, @adrinjalali?

rth · 2019-07-24T14:59:22Z

Thanks for putting this proposal together!

I feel maybe estimator, X, y could be positional and the rest keyword only?

I think one major risk with this that needs to be evaluated is the effort required of user to fix the warnings as compared to the perceived benefit (which connects to the idea in #19 (comment) that we need better analysis of the current use of positional arguments). For passing numerous parameters the motivation is clear. When only the first or second parameter is passed prepositionally, it is less so.

For instance, I can see myself using NearestNeighbours().radius_neighbors(X, radius) (to take an arbitrary example) by passing radius positionally . For that matter it's often used that way in scikit-learn code base,

Details

$ rg -A1 "radius_neighbors\(" sklearn --iglob '!test_*'
sklearn/cluster/mean_shift_.py
94:        i_nbrs = nbrs.radius_neighbors([my_mean], bandwidth,
95-                                       return_distance=False)[0]
--
230:            neighbor_idxs = nbrs.radius_neighbors([center],
231-                                                  return_distance=False)[0]

sklearn/cluster/optics_.py
502:    indices = nbrs.radius_neighbors(P, radius=max_eps,
503-                                    return_distance=False)[0]

sklearn/feature_selection/mutual_info_.py
65:    ind = nn.radius_neighbors(radius=radius, return_distance=False)
66-    nx = np.array([i.size for i in ind])
--
69:    ind = nn.radius_neighbors(radius=radius, return_distance=False)
70-    ny = np.array([i.size for i in ind])
--
139:    ind = nn.radius_neighbors(radius=radius, return_distance=False)
140-    m_all = np.array([i.size for i in ind])

sklearn/neighbors/base.py
622:    def radius_neighbors(self, X=None, radius=None, return_distance=True):
623-        """Finds the neighbors within a given radius of a point or points.
--
670:        >>> rng = neigh.radius_neighbors([[1., 1., 1.]])
671-        >>> print(np.asarray(rng[0][0]))
--
835:            A_ind = self.radius_neighbors(X, radius,
836-                                          return_distance=False)
--
839:            dist, A_ind = self.radius_neighbors(X, radius,
840-                                                return_distance=True)

sklearn/neighbors/classification.py
356:        neigh_dist, neigh_ind = self.radius_neighbors(X)
357-        inliers = [i for i, nind in enumerate(neigh_ind) if len(nind) != 0]

sklearn/neighbors/unsupervised.py
96:      >>> nbrs = neigh.radius_neighbors([[0, 0, 1.3]], 0.4, return_distance=False)
97-      >>> np.asarray(nbrs[0][0])

sklearn/neighbors/regression.py
303:        neigh_dist, neigh_ind = self.radius_neighbors(X)
304-

sklearn/cluster/dbscan_.py
174:        neighborhoods = neighbors_model.radius_neighbors(X, eps,
175-                                                         return_distance=False)

There is indeed a risk that position of the radius argument changes, but in practice it's quite unlikely.

Maybe we could adds some of these concerns under the "Consideration" section?

amueller · 2019-07-24T15:06:50Z

@rth I wouldn't mind allowing that to be positional. The question is how to formulate a clear rule.

There was the Odyssey project which might help with code analysis:

https://odyssey.readthedocs.io/en/latest/tutorial.html

Also interesting might be

https://github.com/Quansight-Labs/python-api-inspect

Someone that's better at SQL than me might be able to write the correct query example

jnothman · 2019-07-24T22:04:36Z

Lgtm.com should also be possible for querying use within this library

adrinjalali · 2019-07-25T11:50:37Z

If we agree that the decision can be made case by case, then we don't have to make those decisions now.

I tried to formulate that in a better and a bit more deterministic way, let me know how it looks now.

jnothman

I agree that it can be decided ad-hoc.

adrinjalali · 2019-07-25T13:20:43Z

So the next step is an email on the mailing list, and call for a vote?

amueller · 2019-07-25T14:47:28Z

I'm not sure if doing a case-by-case is good. I feel like that would lead to a lot of bike-shedding, slow progress, and inconsistencies.

amueller · 2019-07-25T14:48:07Z

I'm happy with separating the exact details of scoping it from the question of doing it, though.

adrinjalali · 2019-07-25T14:53:00Z

I'm just trying to find something that it has a chance of passing the vote. Otherwise I'd propose everything to be keyword only except X and y!

amueller · 2019-07-25T14:55:11Z

Well I don't think that's good for GridSearchCV or cross_validate. I think estimator should be possible positionally.

rth · 2019-07-25T15:08:17Z

So the next step is an email on the mailing list, and call for a vote?

Before the vote I think we might want to get more feedback from users and other core devs? How is it expected to work: we merge this and then discuss on the mailing list, or announce on the mailing list then wait for feedback here then vote on the mailing list?

The https://scikit-learn.org/stable/governance.html#enhancement-proposals-sleps page is not very verbose about it, and the figure with all the possible CPython PEP statuses kind of lost me :)

amueller · 2019-07-25T15:20:28Z

Well I think the default procedure is not a vote but consensus seeking, right? So I would say going to the mailing list asking for feedback would be a good idea. We don't need a vote unless anyone disagrees. The vote was meant to put a deadline on discussions. If that's what you want to do then you could ask for a vote now.

Having the SLEP be merged might be nice for visibility and readability, but I feel it's easier to comment on it in a PR.
I don't think we formalized the process of this yet and I think we can see what works best. Right now I feel keeping it a PR might be more conducive for discussion.
How would someone comment on a SLEP after it's merged? We could have an issue for discussion but that makes it harder to comment on specific sections or to suggest changes.

rth · 2019-07-25T15:24:17Z

So I would say going to the mailing list asking for feedback would be a good idea. We don't need a vote unless anyone disagrees.

Sounds good! I agree it's easier to keep the PR open to comment on it.

qinhanmin2014 · 2019-08-04T14:21:20Z

I agree that it's beneficial, but seems that this will need lots of work and I'm wondering whether it's worthwhile (Maybe we have many more important things to do).

qinhanmin2014

I tend to regard this one as an issue for scikit-learn 1.0.

qinhanmin2014 · 2019-08-04T14:23:39Z

+For a smooth transition, we need an easy deprecation path. Similar to the
+decorators developed in ``matplotlib``, a proposed solution is available at
+[#13311](https://github.com/scikit-learn/scikit-learn/pull/13311), which
+deprecates the usage of positional arguments on selected functions and methods.


first introduce it on selected functions and methods, will this make users surprised? I'm not sure.

qinhanmin2014

I see the approval from Joel and Andreas, but I'm still unable to understand the importance of keyword only args. I admit that it is less error-prone, but I tend to believe that such error rarely occurs.

qinhanmin2014 · 2019-08-05T03:45:38Z

+should be the same throughout the library in order to keep a consistent
+interface to the user.
+
+This proposal suggests making only *most commonly* used parameters positional.


Is is always the case? e.g., estimators is important for VotingRegressor, but I think it's acceptable to use VotingRegressor([('XXX', est1), ('XXX', est2)]) (actually this code snippet is extracted from our example).

I don't understand your argument here @qinhanmin2014. estimators is VotingRegressor's most commonly used argument... So it would remain acceptable as a positional argument.

In general I'd say that all arguments that are essentially required will be accepted as positional. So in GridSearchCV for example: estimator and param_grid would be positional, cv maybe, scoring not.

Sorry I misunderstood this part.
I think it's difficult to define most commonly used parameters here. Different people might have different opinions and we might make users surprised.

In general I'd say that all arguments that are essentially required will be accepted as positional. So in GridSearchCV for example: estimator and param_grid would be positional, cv maybe, scoring not.

I think currently most people pass cv as keyword argument because there're so many other arguments before it?

If people pass cv as keyword argument, then they won't have any issue after this change.

And for the other ones, they will see a DeprecationWarning for a while, and they know how to fix it. I don't understand the argument regarding the users being surprised. This change doesn't break any code immediately.

I mean it's difficult to define most commonly used parameters here. Different people might have different opinions and we might make users surprised.
I'm not opposed to it, but this seems time-consuming and I can't see how users will benefit from it (I admit that it is less error-prone, but I tend to believe that such error rarely occurs.)

I see the benefit here to be more about forcing improved readability than reducing the risk or consequence of user error. I fear that lots of uses not inculcated with writing readable code (including former physicists and statisticians, PhD students, etc) are prone to passing things positionally, along with C and Java coders, etc. Python forces other readable conventions on them, like whitespace. Let's improve scikit-learn code readability.

I fear that lots of uses not inculcated with writing readable code (including former physicists and statisticians, PhD students, etc) are prone to passing things positionally, along with C and Java coders, etc.

I'm still a student so you and Andy definitely know more about how others write code :)

Actually we already force readability in many classes/functions (e.g., when an important parameter is not the first parameter.)

As I said above, I'm not opposed to it (though slightly worried about how do define the term "most commonly used parameters"), but I feel that it's time-consuming and trivial, so if someone has time, maybe he/she can focus on other things.

You don't have to spend time on this @qinhanmin2014 if you don't want to. I'm happy to spend time on this as I see it as a very significant improvement on the API, and the solution is also not that time consuming as proposed in the PR.

adrinjalali · 2019-08-06T14:41:43Z

@amueller how do we proceed from here? I think we're making this harder than it has to be.

qinhanmin2014

Hmm, I can't merge PR here? Maybe someone can help me?

adrinjalali · 2019-08-06T14:49:04Z

we were thinking of merging this PR once the conversation is completely over, that's why it hasn't been merged yet. I'll see if I can fix your permissions here (not sure why you don't have them)

jnothman · 2019-08-07T02:09:14Z

I think it's okay to merge with a couple of acceptances and then use another pr to propose changing its status to accepted, and to admit further tweaks.

adrinjalali · 2019-08-07T07:37:46Z

Ok, then if there's not -1 votes, we can merge and announce on the mailing list? I think we only need to vote if there's at least one person objecting the change, otherwise we can go ahead with @thomasjpfan 's plan

amueller · 2019-08-07T15:19:40Z

@adrinjalali I think there should be at least a sentence on how we will decide which parameters will be positional only. Will there be another slep? Will it be ad-hoc decisions (I don't like this). Will it be based on data? Will there be some discussion for a "rule" without writing another SLEP?

I don't think we're making this harder as it has to be. I think a SLEP should provide a clear path forward. If this is accepted, I have no idea what would happen next.

adrinjalali · 2019-08-07T15:30:02Z

+The *most commonly* used parameters are defined per method or function, to be
+defined as either of the following two ways:
+
+- The set defined and agreed upon by the core developers, which should cover


@amueller are these two not concrete enough?

ah ok didn't see that. Yeah that's fine.

amueller · 2019-08-07T15:33:44Z

ok happy to go ahead. Should we merge to facilitate viewing or not merge to facilitate commenting? ;) I think I'm +.5 for merge.

adrinjalali · 2019-08-07T15:35:45Z

Let's do it then :)

adrinjalali added 2 commits July 13, 2019 14:04

slep009: keyword only arguments

3fc9c84

add a note

a1a7238

adrinjalali mentioned this pull request Jul 13, 2019

ENH Add Deprecating Position Arguments Helper scikit-learn/scikit-learn#13311

Merged

adrinjalali added 3 commits July 13, 2019 14:14

minor fix

067c6cd

fixing code blocks

f5dea36

fixing code blocks

1e4bbac

amueller reviewed Jul 13, 2019

View reviewed changes

Comment thread slep009/proposal.rst Outdated

amueller reviewed Jul 13, 2019

View reviewed changes

Comment thread slep009/proposal.rst Outdated

clarifications, matplotlib, scope

13152e8

formatting

664bc34

amueller requested changes Jul 18, 2019

View reviewed changes

adrinjalali added 2 commits July 19, 2019 14:04

restructuring and applying comments

c482320

address more comments

78a70ed

amueller approved these changes Jul 23, 2019

View reviewed changes

amueller reviewed Jul 23, 2019

View reviewed changes

jnothman reviewed Jul 23, 2019

View reviewed changes

resolve most of Joel's comments

1c5efb3

reformulating options

02ea764

jnothman approved these changes Jul 25, 2019

View reviewed changes

qinhanmin2014 reviewed Aug 4, 2019

View reviewed changes

qinhanmin2014 reviewed Aug 5, 2019

View reviewed changes

qinhanmin2014 approved these changes Aug 6, 2019

View reviewed changes

adrinjalali commented Aug 7, 2019

View reviewed changes

jnothman merged commit b5b4a68 into scikit-learn:master Sep 10, 2019

jnothman mentioned this pull request Sep 17, 2019

Implement SLEP009: keyword-only arguments scikit-learn/scikit-learn#15005

Closed

40 tasks

adrinjalali deleted the slep009/kwargs branch December 26, 2019 02:19



		Once the deprecation period is over, we'd remove the decorator and calling
		the function/method with the positional arguments after `*` would fail.

Uh oh!

Conversation

adrinjalali commented Jul 13, 2019

Uh oh!

Uh oh!

amueller commented Jul 13, 2019

Uh oh!

Uh oh!

adrinjalali commented Jul 15, 2019

Uh oh!

adrinjalali commented Jul 18, 2019

Uh oh!

amueller left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

adrinjalali commented Jul 23, 2019

Uh oh!

amueller commented Jul 23, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rth commented Jul 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amueller commented Jul 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jnothman commented Jul 24, 2019 via email

Uh oh!

adrinjalali commented Jul 25, 2019

Uh oh!

jnothman left a comment

Choose a reason for hiding this comment

Uh oh!

adrinjalali commented Jul 25, 2019

Uh oh!

amueller commented Jul 25, 2019

Uh oh!

amueller commented Jul 25, 2019

Uh oh!

adrinjalali commented Jul 25, 2019

Uh oh!

amueller commented Jul 25, 2019

Uh oh!

rth commented Jul 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amueller commented Jul 25, 2019

Uh oh!

rth commented Jul 25, 2019

Uh oh!

qinhanmin2014 commented Aug 4, 2019

Uh oh!

qinhanmin2014 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

rth commented Jul 24, 2019 •

edited

Loading

amueller commented Jul 24, 2019 •

edited

Loading

rth commented Jul 25, 2019 •

edited

Loading