Skip to content

MNT make files private for cluster module#14948

Merged
NicolasHug merged 20 commits intoscikit-learn:masterfrom
NicolasHug:cluster_dep_underscore
Oct 14, 2019
Merged

MNT make files private for cluster module#14948
NicolasHug merged 20 commits intoscikit-learn:masterfrom
NicolasHug:cluster_dep_underscore

Conversation

@NicolasHug
Copy link
Copy Markdown
Member

@NicolasHug NicolasHug commented Sep 10, 2019

Addresses part of #12927 and part of #9250

This PR renames file.py into _file.py in the sklearn.cluster module.

In particular, please note that:

  • I rename dbscan_.py into _dbscan.py. Same for optics_ and mean_shift_

  • I renamed k_means_.py into _k_means.py. Since _k_means.pyx already existed, I had to rename it too. I chose _k_means_fast.pyx but no strong opinion. Same for hierarchical.

Ping @adrinjalali @thomasjpfan @rth @glemaitre.

Again, lint issue can be ignored and are triggered by new files being created.

@NicolasHug NicolasHug changed the title [NOMRG] underscore files for cluster folder [MRG] underscore files for cluster folder Sep 11, 2019
Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a thorough review yet.

('sklearn.cluster.k_means_', 'KMeans'),
('sklearn.cluster.mean_shift_', 'MeanShift'),
('sklearn.cluster.optics_', 'OPTICS'),
('sklearn.cluster.spectral', 'SpectralClustering'),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't checked if this list is complete.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be leading to some confusion.

Can we simplify this check by only doing a module import:

import {deprecated_path}

So we do not need the importee?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would only check that

import sklearn.neural_network.rbm

raises a warning, but not that we can still import the stuff we used to from rbm. I think it's important to still ensure that

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we need a comment somewhere saying "We only need to import one thing from each file"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@adrinjalali
Copy link
Copy Markdown
Member

tests failing, and I just realized an issue with not raising a deprecation warning in pytest sessions, in contrast to ignoring those specific files: now we can't actually check if our examples and other places in the code are still using the deprecated path, we won't be issuing the warning during a pytest session.

@rth
Copy link
Copy Markdown
Member

rth commented Sep 12, 2019

and I just realized an issue with not raising a deprecation warning in pytest sessions, in contrast to ignoring those specific files: now we can't actually check if our examples and other places in the code are still using the deprecated path, we won't be issuing the warning during a pytest session.

Right but examples don't use pytest so you should see them? For everything else, if it's imported at top level you would have seen it at collection time only once, anyway? Feel free to use the other approach is you think it's really preferable.

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests seem to be certificate issue. LGTM :)

@NicolasHug
Copy link
Copy Markdown
Member Author

(I still need to check the examples as you suggested, I'll do that soon)

@glemaitre
Copy link
Copy Markdown
Member

If you could also check if there is any base classes

@NicolasHug
Copy link
Copy Markdown
Member Author

I've removed __all__ in the privte _file.py now


from .spectral import spectral_clustering, SpectralClustering
from .mean_shift_ import (mean_shift, MeanShift,
from ._spectral import spectral_clustering, SpectralClustering
Copy link
Copy Markdown
Member

@adrinjalali adrinjalali Sep 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the question is, do we want to include BaseSpectral here.

Or generally, any base class. I think we haven't decided on that, have we?

@glemaitre
Copy link
Copy Markdown
Member

glemaitre commented Sep 15, 2019 via email

@NicolasHug
Copy link
Copy Markdown
Member Author

I am strongly against making base classes public. It's pretty much equivalent to making all the helpers public.

They are often very tightly coupled to the child classes, and having them public would make any refactoring or bug fix near impossible.

I'm pretty sure we've been consistently breaking backward compatibility of base classes.


Sort of related: that's another level of API contract, but any method that is not fit, transform, or something explicitly documented should be implicitly private too.

@NicolasHug
Copy link
Copy Markdown
Member Author

They are part of the public API, otherwise, it
should be named _BaseSpectral (as _BaseComposition)

We have tons of things that are public but should have been private in the first place, this is precisely the point of these PRs: clean the public API space.

@glemaitre
Copy link
Copy Markdown
Member

glemaitre commented Sep 16, 2019 via email

@NicolasHug
Copy link
Copy Markdown
Member Author

we should follow the deprecation path as well, is that correct?

We are properly deprecating all the base classes. All the tools that aren't in cluster.__init__ are now private (with deprecation), and base classes aren't in init

@adrinjalali
Copy link
Copy Markdown
Member

merge conflict and tests are failing.

@NicolasHug
Copy link
Copy Markdown
Member Author

NicolasHug commented Sep 18, 2019

I'm leaving as-is for now, I think we need to discuss the conflicts issues #12927 (comment) on Monday during the meeting

@glemaitre glemaitre self-requested a review September 23, 2019 14:24
@adrinjalali
Copy link
Copy Markdown
Member

I kinda didn't understand what the conclusion on this topic was during the meeting.

Do we continue and comment on PRs telling people how to handle merge conflicts?

@NicolasHug
Copy link
Copy Markdown
Member Author

@thomasjpfan is thinking of some solution that could avoid the conflicts... waiting for him to submit the idea

@NicolasHug
Copy link
Copy Markdown
Member Author

Just updated the PR.

@thomasjpfan wanna give it a look?

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Oct 8, 2019

Where do you do the generation here?

@thomasjpfan
Copy link
Copy Markdown
Member

Where do you do the generation here?

@jnothman It is done in sklearn/_build_utils/deprecated_modules.py by adding to the _DEPRECATED_MODULES set.

@NicolasHug
Copy link
Copy Markdown
Member Author

NicolasHug commented Oct 9, 2019

Are you OK with this @thomasjpfan ?

@adrinjalali wanna give it a second round since method has changed?

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise I think this lgtm but I've not been rigorous

@@ -20,10 +20,6 @@
from ..utils.validation import assert_all_finite, check_array


__all__ = ['SpectralCoclustering',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will technically mean that people who previously imported * from this module will now get check_array in their locals... Not sure that matters

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Can't we just keep this? What's the point of removing it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of removing it?

It suggests that users might be allowed to import from this. And there's no reason to protect users that want to import * IMO.

But OK I'll put it back.

@thomasjpfan
Copy link
Copy Markdown
Member

There is a linting error.

@NicolasHug
Copy link
Copy Markdown
Member Author

There is a linting error.

It's going to happen in every deprecation PR. It's because new files are being created. I don't think we should fix them here (increases the possibilities of a painful merge)

…-learn; branch 'master' of github.com:scikit-learn/scikit-learn into cluster_dep_underscore
@NicolasHug
Copy link
Copy Markdown
Member Author

Since @jnothman seems to be +1 as well I'll merge tomorrow unless we have additional comments

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't checked very thoroughly but LGTM, aside from one minor comment

@@ -20,10 +20,6 @@
from ..utils.validation import assert_all_finite, check_array


__all__ = ['SpectralCoclustering',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Can't we just keep this? What's the point of removing it?

from ..utils.validation import FLOAT_DTYPES
from ..exceptions import ConvergenceWarning
from . import _k_means
from . import _k_means_fast as _k_means
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find the naming *_fast.pyx files more meaningful than with underscores as it was done before.

@NicolasHug
Copy link
Copy Markdown
Member Author

Thanks for the review @rth!

Comments are addressed so I'll merge

@NicolasHug NicolasHug changed the title [MRG] underscore files for cluster folder MNT make files private for cluster module Oct 14, 2019
@NicolasHug NicolasHug merged commit 14295f9 into scikit-learn:master Oct 14, 2019
@NicolasHug NicolasHug deleted the cluster_dep_underscore branch October 14, 2019 16:29
@jnothman
Copy link
Copy Markdown
Member

My colleague, @namoshizun, has had trouble upgrading his nightly build installation due to this PR.

import sklearn.cluster when when trying to perform from ._k_means import k_means.

He had an installation of the scikit-learn nightly build prior to this PR, then upgraded that environment with pip install -U --pre -f https://sklearn-nightly.scdn8.secure.raxcdn.com scikit-learn. For unknown reasons, the existing _k_means module (_k_means.cpython-36m-darwin.so) was not removed when the prev version uninstalled. The import of _k_means then referenced this .so rather than the new _k_means.py.

I'm not sure why the old .so was not removed by pip, and I have not been able to replicate this. But I thought others should be aware.

@thomasjpfan
Copy link
Copy Markdown
Member

I tried install the stable release and then pip install -U ... to install the dev build and was not able to produce this either.

Does uninstalling and then reinstall the dev version resolve @namoshizun's issue?

@adrinjalali
Copy link
Copy Markdown
Member

I don't know what pip install -U exactly does, but I have had the exact same issue with make in, since those files are not removed. But it seems very odd that pip wouldn't remove the old ones.

@stsievert stsievert mentioned this pull request Mar 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants