[MRG+1] Improves class design for AgglomerativeClustering and FeatureAgglomeration#9875
Conversation
|
Need help in what needs to be done to pass the tests in |
|
"Improves the class design" is what sense? what is the design issue you identified? |
|
I am not sure if I am on right track but I tried using a _BaseClass, from which the two classes Agglomerative Clustering and Feature Extraction are derived and having pooling_func only in FeatureAgglomeration. I am not sure. |
|
@lesteve I guess this is what you were trying to convey. I have made a base class which is used by both AgglomerativeClustering and FeatureAgglomeration. The later one uses pooling_func, not the former. The above patch now raises an error: |
|
I think this is not quite right. You don't need a new base class. We want to remove I see the current PR's implementation, however, as more wrong than right. |
|
So, what I can understand from your concern is that, at present, I only need to add a deprecation warning which will raise a warning if a user tries to explicitly call |
|
Except that there cannot be a warning in FeatureAgglomeration.__init__, so
no, you also need to implement FeatureAgglomeration.__init__ as if
pooling_func in AgglomerativeClustering did not exist.
|
jnothman
left a comment
There was a problem hiding this comment.
Please mark the parameter as deprecated in the docstring.
sklearn/cluster/hierarchical.py
Outdated
| self.linkage = linkage | ||
| self.affinity = affinity | ||
| self.pooling_func = pooling_func | ||
| if self.pooling_func != np.mean: |
There was a problem hiding this comment.
This should really be in fit, not in __init__ according to our conventions. And ideally we'd change the default to something like 'deprecated' so that we can warn when the user explicitly passed pooling_func=np.mean
There was a problem hiding this comment.
Added the recommended changes, check once
sklearn/cluster/hierarchical.py
Outdated
| all observations of the two sets. | ||
|
|
||
| pooling_func : callable, default=np.mean | ||
| deprecated, since AgglomerativeClustering do not use pooling_func |
There was a problem hiding this comment.
I'd just say "ignored. Deprecate to be removed in ..." And remove the rest as irrelevant
|
@jnothman Review please. |
sklearn/cluster/hierarchical.py
Outdated
| This combines the values of agglomerated features into a single | ||
| value, and should accept an array of shape [M, N] and the keyword | ||
| argument ``axis=1``, and reduce it to an array of size [M]. | ||
| ignored, Deprecate to be removed in 0.22. |
There was a problem hiding this comment.
Look at other places we use the "deprecated" sphinx directive. git grep '\.\. deprecated' will show you plenty of examples.
sklearn/cluster/hierarchical.py
Outdated
| self | ||
| """ | ||
| if self.pooling_func != 'deprecated': | ||
| warnings.warn('"pooling_func" as a parameter argument is ' |
There was a problem hiding this comment.
Actually, it surprises me that this has test coverage. Please find out if there is a use of AgglomerativeClustering(pooling_func=...) in the codebase and remove it.
There was a problem hiding this comment.
I don't think there's a use of pooling_func. there's no test covering pooling_func.
sklearn/cluster/hierarchical.py
Outdated
| memory=None, | ||
| connectivity=None, compute_full_tree='auto', | ||
| linkage='ward', pooling_func=np.mean): | ||
| self.n_clusters = n_clusters |
There was a problem hiding this comment.
And could you please use super(FeatureAgglomeration, self).__init__(n_clusters=..., ...) instead of setting things individually here?
There was a problem hiding this comment.
Resolved.
…into aggcluster
|
somewhere in tests this warning is raised, says codecov
…On 19 Oct 2017 7:14 pm, "Kumar Ashutosh" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In sklearn/cluster/hierarchical.py
<#9875 (comment)>
:
> @@ -694,6 +692,12 @@ def fit(self, X, y=None):
-------
self
"""
+ if self.pooling_func != 'deprecated':
+ warnings.warn('"pooling_func" as a parameter argument is '
I don't think there's a use of pooling_func. there's no test covering
pooling_func.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9875 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wYKs6l95l4sbAzbKMzV09mx31AAks5stwTsgaJpZM4PvpxO>
.
|
lesteve
left a comment
There was a problem hiding this comment.
The main problem is that with your changes as it is FeatureAgglomeration with default parameter has a warning:
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
from sklearn.cluster import AgglomerativeClustering, FeatureAgglomeration
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=300, centers=centers, cluster_std=0.5,
random_state=0)
model = FeatureAgglomeration()
model.fit(X)Some other comments too.
sklearn/cluster/hierarchical.py
Outdated
| This combines the values of agglomerated features into a single | ||
| value, and should accept an array of shape [M, N] and the keyword | ||
| argument ``axis=1``, and reduce it to an array of size [M]. | ||
| ignored, Deprecate to be removed in 0.22. |
There was a problem hiding this comment.
Look at other places we use the "deprecated" sphinx directive. git grep '\.\. deprecated' will show you plenty of examples.
sklearn/cluster/hierarchical.py
Outdated
| connectivity=connectivity, | ||
| compute_full_tree=compute_full_tree, | ||
| linkage=linkage, affinity=affinity, | ||
| pooling_func=pooling_func) |
There was a problem hiding this comment.
Do not pass pooling_function to the base class otherwise you'll get a warning in AgglomerativeClustering.__init__. This should be fine:
super(FeatureAgglomeration, self).__init__(n_clusters=n_clusters,
memory=memory,
connectivity=connectivity,
compute_full_tree=compute_full_tree,
linkage=linkage, affinity=affinity)
self.pooling_func=pooling_func
sklearn/cluster/hierarchical.py
Outdated
| warnings.warn('"pooling_func" as a parameter argument is ' | ||
| 'deprecated in version 0.20 and will be removed ' | ||
| 'in version 0.22. It is being removed since ' | ||
| 'AgglomerativeClustering do not directly use ' |
There was a problem hiding this comment.
rewording:
Agglomerative 'pooling_func' parameter is not used. It has been deprecated in version 0.20
and will be removed in 0.22.
|
@lesteve Added the changes. I am still not able to figure out the warnings raised by codecov as pointed out by @jnothman . Also, there's a PEP8 failure "line too long". Here: Can I modify it in any way? |
sklearn/cluster/hierarchical.py
Outdated
| memory=None, | ||
| connectivity=None, compute_full_tree='auto', | ||
| linkage='ward', pooling_func=np.mean): | ||
| super(FeatureAgglomeration, self).__init__(n_clusters=n_clusters, |
There was a problem hiding this comment.
You can just put a newline after ( and indent 4 spaces
sklearn/cluster/hierarchical.py
Outdated
| @@ -644,7 +644,11 @@ class AgglomerativeClustering(BaseEstimator, ClusterMixin): | |||
| pooling_func : callable, default=np.mean | |||
| This combines the values of agglomerated features into a single | |||
There was a problem hiding this comment.
No, it does not. This should just say ignored. When @lesteve pointed to other uses of .. deprecated they may be cases where the param is deprecated, but still functional. This one is not.
sklearn/cluster/hierarchical.py
Outdated
| This combines the values of agglomerated features into a single | ||
| value, and should accept an array of shape [M, N] and the keyword | ||
| argument ``axis=1``, and reduce it to an array of size [M]. | ||
| ignored, Deprecated to be removed in 0.22. |
There was a problem hiding this comment.
I was not saying that you could or should not use .. deprecated but that leaving in an untrue description of behaviour helped no one.
|
Is this what you were suggesting? @jnothman |
|
I added a whats_new entry. I'll look at the generated doc and then merge if it looks fine. |
|
Merging, thanks a lot! |
…cs/donigian-update-contribution-guidelines * 'master' of github.com:scikit-learn/scikit-learn: (23 commits) fixes scikit-learn#10031: fix attribute name and shape in documentation (scikit-learn#10033) [MRG+1] add changelog entry for fixed and merged PR scikit-learn#10005 issue scikit-learn#9633 (scikit-learn#10025) [MRG] Fix LogisticRegression see also should include LogisticRegressionCV(scikit-learn#9995) (scikit-learn#10022) [MRG + 1] Labels of clustering should start at 0 or -1 if noise (scikit-learn#10015) MAINT Remove redundancy in scikit-learn#9552 (scikit-learn#9573) [MRG+1] correct comparison in GaussianNB for 'priors' (scikit-learn#10005) [MRG + 1] ENH add check_inverse in FunctionTransformer (scikit-learn#9399) [MRG] FIX bug in nested set_params usage (scikit-learn#9999) [MRG+1] Fix LOF and Isolation benchmarks (scikit-learn#9798) [MRG + 1] Fix negative inputs checking in mean_squared_log_error (scikit-learn#9968) DOC Fix typo (scikit-learn#9996) DOC Fix typo: x axis -> y axis (scikit-learn#9985) improve example plot_forest_iris.py (scikit-learn#9989) [MRG+1] Deprecate pooling_func unused parameter in AgglomerativeClustering (scikit-learn#9875) DOC update news DOC Fix three typos in manifold documentation (scikit-learn#9990) DOC add missing dot in docstring DOC Add what's new for 0.19.1 (scikit-learn#9983) Improve readability of outlier detection example. (scikit-learn#9973) DOC: Fixed typo (scikit-learn#9977) ...
Reference Issue
Fixes #9846
What does this implement/fix? Explain your changes.
Improves the class design of AgglomerativeClustering and FeatureAgglomeration
Any other comments?
Changes will be made if not green.