MNT Completes position arg deprecation #17272

Merged
adrinjalali merged 11 commits into scikit-learn:master from thomasjpfan:complete_position_arg_deprecation
May 19, 2020

Conversation

@thomasjpfan (Member) commented May 18, 2020

After this PR here are the public functions that have 2 or more arguments that do not have a * for keyword only arguments. In this case, "public function" means it is listed in classes.rst.

These functions are not updated with this PR:

[<function sklearn._config.set_config(assume_finite=None, working_memory=None, print_changed_only=None, display=None)>,
 <function sklearn.compose._column_transformer.make_column_transformer(*transformers, **kwargs)>,
 <function sklearn.covariance._shrunk_covariance.shrunk_covariance(emp_cov, shrinkage=0.1)>,
 <function sklearn.feature_extraction.image.reconstruct_from_patches_2d(patches, image_size)>,
 <function sklearn.feature_extraction.image.extract_patches(arr, patch_shape=8, extraction_step=1)>,
 <function sklearn.feature_selection._univariate_selection.f_classif(X, y)>,
 <function sklearn.feature_selection._univariate_selection.chi2(X, y)>,
 <function sklearn.isotonic.check_increasing(x, y)>,
 <function sklearn.metrics._ranking.auc(x, y)>,
 <function sklearn.metrics._regression.max_error(y_true, y_pred)>,
 <function sklearn.metrics.cluster._supervised.adjusted_rand_score(labels_true, labels_pred)>,
 <function sklearn.metrics.cluster._supervised.homogeneity_score(labels_true, labels_pred)>,
 <function sklearn.metrics.cluster._supervised.completeness_score(labels_true, labels_pred)>,
 <function sklearn.metrics.cluster._unsupervised.calinski_harabasz_score(X, labels)>,
 <function sklearn.metrics.cluster._unsupervised.davies_bouldin_score(X, labels)>,
 <function sklearn.metrics.pairwise.laplacian_kernel(X, Y=None, gamma=None)>,
 <function sklearn.metrics.pairwise.chi2_kernel(X, Y=None, gamma=1.0)>,
 <function sklearn.metrics.pairwise.linear_kernel(X, Y=None, dense_output=True)>,
 <function sklearn.metrics.pairwise.polynomial_kernel(X, Y=None, degree=3, gamma=None, coef0=1)>,
 <function sklearn.metrics.pairwise.haversine_distances(X, Y=None)>,
 <function sklearn.metrics.pairwise.additive_chi2_kernel(X, Y=None)>,
 <function sklearn.metrics.pairwise.paired_cosine_distances(X, Y)>,
 <function sklearn.metrics.pairwise.cosine_similarity(X, Y=None, dense_output=True)>,
 <function sklearn.metrics.pairwise.rbf_kernel(X, Y=None, gamma=None)>,
 <function sklearn.metrics.pairwise.paired_euclidean_distances(X, Y)>,
 <function sklearn.metrics.pairwise.sigmoid_kernel(X, Y=None, gamma=None, coef0=1)>,
 <function sklearn.metrics.pairwise.paired_manhattan_distances(X, Y)>,
 <function sklearn.metrics.pairwise.cosine_distances(X, Y=None)>,
 <function sklearn.model_selection._search.fit_grid_point(X, y, estimator, parameters, train, test, scorer, verbose, error_score=nan, **fit_params)>,
 <function sklearn.neural_network._base.log_loss(y_true, y_prob)>,
 <function sklearn.pipeline.make_pipeline(*steps, **kwargs)>,
 <function sklearn.pipeline.make_union(*transformers, **kwargs)>,
 <function sklearn.preprocessing._data.add_dummy_feature(X, value=1.0)>,
 <function sklearn.utils.safe_mask(X, mask)>,
 <function sklearn.utils.resample(*arrays, **options)>,
 <function sklearn.utils.shuffle(*arrays, **options)>,
 <function sklearn.utils.safe_indexing(X, indices, axis=0)>,
 <function sklearn.utils.class_weight.compute_class_weight(class_weight, classes, y)>,
 <function sklearn.utils.extmath.safe_sparse_dot(a, b, dense_output=False)>,
 <function sklearn.utils.extmath.density(w, **kwargs)>,
 <function sklearn.utils.extmath.randomized_svd(M, n_components, n_oversamples=10, n_iter='auto', power_iteration_normalizer='auto', transpose='auto', flip_sign=True, random_state=0)>,
 <function sklearn.utils.extmath.randomized_range_finder(A, size, n_iter, power_iteration_normalizer='auto', random_state=None)>,
 <function sklearn.utils.extmath.weighted_mode(a, w, axis=0)>,
 <function sklearn.utils.graph.single_source_shortest_path_length(graph, source, cutoff=None)>,
 <function sklearn.utils.sparsefuncs.inplace_csr_column_scale(X, scale)>,
 <function sklearn.utils.sparsefuncs.inplace_swap_column(X, m, n)>,
 <function sklearn.utils.sparsefuncs.inplace_swap_row(X, m, n)>,
 <function sklearn.utils.sparsefuncs.incr_mean_variance_axis(X, axis, last_mean, last_var, last_n)>,
 <function sklearn.utils.sparsefuncs.mean_variance_axis(X, axis)>,
 <function sklearn.utils.sparsefuncs.inplace_row_scale(X, scale)>,
 <function sklearn.utils.sparsefuncs.inplace_column_scale(X, scale)>,
 <function sklearn.utils.validation.has_fit_parameter(estimator, parameter)>]

The utils and pairwise functions are the largest group that do not have *.

Are there any other functions we want to add * to?
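For reference, an audit like the one above can be reproduced with `inspect` (a minimal sketch using stand-in functions; the real check also filters on the functions listed in classes.rst):

```python
import inspect

def lacks_keyword_only(func):
    """True if func takes 2+ named parameters but marks none as keyword-only."""
    params = list(inspect.signature(func).parameters.values())
    named = [p for p in params
             if p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)]
    has_kwonly = any(p.kind == p.KEYWORD_ONLY for p in params)
    return len(named) >= 2 and not has_kwonly

# Stand-ins for the signatures listed above (not imported from sklearn here).
def shrunk_covariance(emp_cov, shrinkage=0.1):
    pass

def rbf_kernel(X, Y=None, *, gamma=None):  # already has the * marker
    pass

print(lacks_keyword_only(shrunk_covariance))  # True
print(lacks_keyword_only(rbf_kernel))         # False
```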

@thomasjpfan (Member Author)

CC @adrinjalali @NicolasHug

@thomasjpfan thomasjpfan added this to the 0.23.1 milestone May 18, 2020
@NicolasHug (Member) left a comment

LGTM if green

@adrinjalali (Member)

Thanks Thomas. I'd make the arguments keyword-only in these functions as well:

[<function sklearn._config.set_config(*, assume_finite=None, working_memory=None, print_changed_only=None, display=None)>,
 <function sklearn.feature_extraction.image.extract_patches(arr, *, patch_shape=8, extraction_step=1)>,
 <function sklearn.metrics._regression.max_error(*, y_true, y_pred)>,
 <function sklearn.metrics.cluster._supervised.adjusted_rand_score(*, labels_true, labels_pred)>,
 <function sklearn.metrics.cluster._supervised.homogeneity_score(*, labels_true, labels_pred)>,
 <function sklearn.metrics.cluster._supervised.completeness_score(*, labels_true, labels_pred)>,
 <function sklearn.metrics.pairwise.laplacian_kernel(X, Y=None, *, gamma=None)>,
 <function sklearn.metrics.pairwise.chi2_kernel(X, Y=None, *, gamma=1.0)>,
 <function sklearn.metrics.pairwise.linear_kernel(X, Y=None, *, dense_output=True)>,
 <function sklearn.metrics.pairwise.polynomial_kernel(X, Y=None, *, degree=3, gamma=None, coef0=1)>,
 <function sklearn.metrics.pairwise.cosine_similarity(X, Y=None, *, dense_output=True)>,
 <function sklearn.metrics.pairwise.rbf_kernel(X, Y=None, *, gamma=None)>,
 <function sklearn.metrics.pairwise.sigmoid_kernel(X, Y=None, *, gamma=None, coef0=1)>,
 <function sklearn.model_selection._search.fit_grid_point(X, y, estimator, parameters, *, train, test, scorer, verbose, error_score=nan, **fit_params)>,
 <function sklearn.neural_network._base.log_loss(*, y_true, y_prob)>,
 <function sklearn.utils.safe_indexing(X, indices, *, axis=0)>,
 <function sklearn.utils.class_weight.compute_class_weight(*, class_weight, classes, y)>,
 <function sklearn.utils.extmath.safe_sparse_dot(a, b, *, dense_output=False)>,
 <function sklearn.utils.extmath.randomized_svd(M, n_components, *, n_oversamples=10, n_iter='auto', power_iteration_normalizer='auto', transpose='auto', flip_sign=True, random_state=0)>,
 <function sklearn.utils.extmath.randomized_range_finder(A, *, size, n_iter, power_iteration_normalizer='auto', random_state=None)>,
 <function sklearn.utils.extmath.weighted_mode(a, w, *, axis=0)>,
 <function sklearn.utils.graph.single_source_shortest_path_length(graph, source, *, cutoff=None)>,
 <function sklearn.utils.sparsefuncs.incr_mean_variance_axis(X, *, axis, last_mean, last_var, last_n)>,
 <function sklearn.utils.sparsefuncs.mean_variance_axis(X, *, axis)>,
 <function sklearn.utils.sparsefuncs.inplace_row_scale(X, *, scale)>,
 <function sklearn.utils.sparsefuncs.inplace_column_scale(X, *, scale)>]

WDYT? I always find *_true, *_pred very confusing, and never remember which one comes first. So I can imagine people making mistakes there with positional calls.
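For illustration, this is what the keyword-only form buys here: a positional (and possibly swapped) call fails immediately instead of silently computing on swapped inputs (a minimal stand-in, not the real sklearn.metrics implementation):

```python
def max_error(*, y_true, y_pred):
    """Stand-in for the proposed keyword-only signature."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))

print(max_error(y_true=[1, 2, 3], y_pred=[1, 4, 3]))  # 2

try:
    max_error([1, 2, 3], [1, 4, 3])   # positional call is now a TypeError
except TypeError as exc:
    print("rejected:", exc)
```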

@adrinjalali (Member)

Tests are also failing.

@thomasjpfan (Member Author)

WDYT? I always find *_true, *_pred very confusing, and never remember which one comes first. So I can imagine people making mistakes there with positional calls.

I am +0.25 on this, because I think this change would generate too many warnings. We already released version 0.23, in which we left y_true and y_pred positional.

@adrinjalali (Member)

WDYT of adding a "version" parameter to _deprecate_positional_args and deprecating those in 0.24?

@thomasjpfan (Member Author)

WDYT of adding a "version" parameter to _deprecate_positional_args and deprecating those in 0.24?

What do you mean by "those"?

I think the deprecations in this PR should still have a 2 version deprecation cycle.

@adrinjalali (Member)

Sorry, by "those" I meant the *_true, *_pred ones, and I meant adding a parameter to the decorator function so that they're deprecated in 0.24 and fixed in 0.26. Right now the versions are hard-coded in the decorator. And I meant to do that deprecation in a separate PR, which wouldn't be released in 0.23.

@thomasjpfan (Member Author)

Sorry, by "those" I meant *_true, *_pred ones, and I meant to add a parameter to the decorator function so that they're added as deprecated in 0.24, and fixed in 0.26.

Ah, I see. I can add a version parameter to the decorator. We can open an issue for making *_true and *_pred keyword-only and discuss it in our next call.
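Roughly, a version-aware decorator could look like the sketch below (an illustration only, not the actual `_deprecate_positional_args` implementation in scikit-learn; the parameter name `version` and the warning wording are assumptions):

```python
import functools
import inspect
import warnings

def deprecate_positional_args(version="0.25"):
    """Warn when arguments declared after `*` in the decorated function's
    signature are passed positionally, naming the release in which that
    becomes an error."""
    def decorator(f):
        sig = inspect.signature(f)
        kwonly = [name for name, p in sig.parameters.items()
                  if p.kind == p.KEYWORD_ONLY]
        # Parameters that may legitimately be passed positionally.
        n_positional = sum(
            p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
            for p in sig.parameters.values())

        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            n_extra = len(args) - n_positional
            if n_extra > 0:
                names = ", ".join(kwonly[:n_extra])
                warnings.warn(
                    f"Passing {names} as positional argument(s) is deprecated "
                    f"and will raise a TypeError in version {version}.",
                    FutureWarning)
                # Remap the extra positional values onto their keyword names.
                kwargs.update(zip(kwonly, args[n_positional:]))
                args = args[:n_positional]
            return f(*args, **kwargs)
        return wrapper
    return decorator

@deprecate_positional_args(version="0.26")
def rbf_kernel(X, Y=None, *, gamma=None):
    return X, Y, gamma

print(rbf_kernel([1.0], [2.0], gamma=0.5))  # keyword call: no warning
```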

@NicolasHug (Member)

 <function sklearn.metrics.cluster._supervised.adjusted_rand_score(*, labels_true, labels_pred)>,
 <function sklearn.metrics.cluster._supervised.homogeneity_score(*, labels_true, labels_pred)>,
 <function sklearn.metrics.cluster._supervised.completeness_score(*, labels_true, labels_pred)>,

IMO we shouldn't make these kwonly considering we kept y_true and y_pred positional for all the other metrics.

I always find *_true, *_pred very confusing

I sort of agree, but the good news is that we're consistent on this. In the end, it's just a convention users need to get used to, just like they get used to passing X before y.

@adrinjalali (Member)

I thought it'd be an easy sell, but apparently @NicolasHug doesn't think so :P So I'm +1 on the issue and talking about it in the call.

@adrinjalali (Member)

WDYT of the other suggestions I have in #17272 (comment)? (kinda have a feeling we may have discussed some of them in other PRs)

@thomasjpfan (Member Author)

Here are the ones I left out from #17272 (comment)

  • set_config: circular imports
  • extract_patches: will be removed in 0.24
  • fit_grid_point: will be removed in 0.24
  • compute_class_weight: Internally, we use this with no keyword arguments.
  • neural_network._base.log_loss: False positive - not public
  • mean_variance_axis: args implied by the name of the function
  • inplace_row_scale: args implied by the name of the function
  • inplace_column_scale: args implied by the name of the function

@thomasjpfan (Member Author)

I guess compute_class_weight is the one I am unsure about. I would leave it as is, or do compute_class_weight(class_weight, *, classes, y).

@adrinjalali (Member)

I agree with you. But it seems there are too many places internally where we call it positionally :/
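As an illustration of the call-site churn: with the keyword-only signature, each internal positional call has to be rewritten to keyword form. A stand-in sketch (the simplified "balanced" logic here mirrors n_samples / (n_classes * count) and is not the real sklearn.utils.class_weight implementation):

```python
def compute_class_weight(class_weight, *, classes, y):
    """Stand-in with the keyword-only signature adopted in this PR."""
    if class_weight != "balanced":
        raise ValueError("only 'balanced' is supported in this sketch")
    return [len(y) / (len(classes) * y.count(c)) for c in classes]

y = [0, 0, 0, 1]

# Old internal call sites looked like:
#   compute_class_weight("balanced", classes, y)   # now a TypeError
# and have to become:
print(compute_class_weight("balanced", classes=[0, 1], y=y))  # [0.666..., 2.0]
```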

    algorithm='auto', leaf_size=30, p=2, sample_weight=None,
    n_jobs=None):
@_deprecate_positional_args
def dbscan(X, *, eps=0.5, min_samples=5, metric='minkowski',

Member:
DBSCAN accepts positional eps


-def graphical_lasso(emp_cov, alpha, cov_init=None, mode='cd', tol=1e-4,
+@_deprecate_positional_args
+def graphical_lasso(emp_cov, *, alpha, cov_init=None, mode='cd', tol=1e-4,

Member:
GraphicalLasso accepts positional alpha

 # Kernels
-def linear_kernel(X, Y=None, dense_output=True):
+@_deprecate_positional_args
+def linear_kernel(X, Y=None, *, dense_output=True):

Member:
I think in the PR where I touched this file we decided not to touch the kernels, but I'm happy with this.

@thomasjpfan (Member Author), May 18, 2020:

#16982 (comment)

I'll leave it alone.

 "0.22 and will be removed in version 0.24.")
-def safe_indexing(X, indices, axis=0):
+@_deprecate_positional_args
+def safe_indexing(X, indices, *, axis=0):

Member:
we're deprecating this func anyway.

Member:
yeah, I already have a PR that removes it.

@thomasjpfan (Member Author)

I went with compute_class_weight(class_weight, *, classes, y) and made all the needed changes.

@adrinjalali (Member) left a comment

LGTM when green. Are you fine with the changes @NicolasHug ?

 X = sparse.lil_matrix((1000, 1000))
 msg = "A sparse matrix was passed, but dense data is required."
-assert_raise_message(TypeError, msg, estimate_bandwidth, X, 200)
+assert_raise_message(TypeError, msg, estimate_bandwidth, X)

@thomasjpfan (Member Author):
This was setting quantile to 200, which does not make sense for this check.

@adrinjalali adrinjalali mentioned this pull request May 18, 2020
@NicolasHug (Member)

yes

@adrinjalali adrinjalali merged commit e770715 into scikit-learn:master May 19, 2020
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request May 19, 2020
* ENH Adds public functions with position args warning

* BUG Circular import

* STY

* BUG Add keywords

* BUG Fixes warnings

* ENH Adds other public functions

* CLN Suggestion

* CLN Suggestion

* REV Revert pairwise

* ENH Positional args for compute_class_weight
adrinjalali pushed a commit that referenced this pull request May 19, 2020
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020