DOC Add links to KMeans examples in docstrings and the user guide #27799
ArturoAmorQ merged 14 commits into scikit-learn:main from
ArturoAmorQ
left a comment
Thanks for the PR @marenwestermann! Here is a batch of comments :)
doc/modules/clustering.rst
Outdated
| (generally) distant from each other, leading to probably better results than | ||
| random initialization, as shown in the reference. | ||
| random initialization, as shown in the reference. For a detailed example of | ||
| comaparing different initialization schemes refer to |
| comaparing different initialization schemes refer to | |
| comparing different initialization schemes, refer to |
doc/modules/clustering.rst
Outdated
|
|
||
| K-means can be used for vector quantization. This is achieved using the | ||
| transform method of a trained model of :class:`KMeans`. | ||
| transform method of a trained model of :class:`KMeans`. For an example of |
| transform method of a trained model of :class:`KMeans`. For an example of | |
| `transform` method of a trained model of :class:`KMeans`. For an example of |
doc/modules/clustering.rst
Outdated
| using the iris dataset | ||
|
|
||
| * :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering | ||
| using KMeans and MiniBatchKMeans based on sparse data |
| using KMeans and MiniBatchKMeans based on sparse data | |
| using :class:`KMeans` and :class:`MiniBatchKMeans` based on sparse data |
doc/modules/clustering.rst
Outdated
|
|
||
| .. topic:: Examples: | ||
|
|
||
| * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of K-Means |
| * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of K-Means | |
| * :ref:`sphx_glr_auto_examples_cluster_plot_cluster_iris.py`: Example usage of :class:`KMeans` |
doc/modules/clustering.rst
Outdated
| * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of KMeans and | ||
| MiniBatchKMeans |
| * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of KMeans and | |
| MiniBatchKMeans | |
| * :ref:`sphx_glr_auto_examples_cluster_plot_mini_batch_kmeans.py`: Comparison of | |
| :class:`KMeans` and :class:`MiniBatchKMeans` |
doc/modules/clustering.rst
Outdated
| * :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering using sparse | ||
| MiniBatchKMeans | ||
| * :ref:`sphx_glr_auto_examples_text_plot_document_clustering.py`: Document clustering | ||
| using KMeans and MiniBatchKMeans based on sparse data |
| using KMeans and MiniBatchKMeans based on sparse data | |
| using :class:`KMeans` and :class:`MiniBatchKMeans` based on sparse data |
| - top right: What the effect of a bad initialization is | ||
| - top right: What using three clusters would deliver. | ||
|
|
||
| - bottom left: What the effect of a bad initialization is |
Maybe this can be done in another PR, but currently it seems that the initialization is good. I would rather pass a fixed `random_state` to `KMeans` instead of setting a global `np.random.seed`.
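A minimal sketch of that idea, not taken from the example script under review: the seed is passed to the estimator through `random_state`, so NumPy's global random state is left alone. The dataset, the seed values, and the helper name `fit_labels` are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; sample count, number of centers and seed are illustrative.
X, _ = make_blobs(n_samples=300, centers=4, random_state=170)


def fit_labels(seed):
    """Fit KMeans with a single random initialization controlled by `seed`."""
    # The seed is scoped to this estimator; np.random.seed is never called.
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed)
    return km.fit_predict(X)


# Two fits with the same seed start from the same initial centers.
labels_a = fit_labels(0)
labels_b = fit_labels(0)
same = bool(np.array_equal(labels_a, labels_b))
```

With `init="random"` and `n_init=1`, the result depends on that single initialization, so trying a few different `random_state` values is one way to look for a seed that actually produces the bad initialization the subplot is meant to show.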
| # using the model results itself. In that case, the :ref:`Silhouette Coefficient | ||
| # <sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py>` comes in handy. |
I would rather say something similar to
"In that case the Silhouette analysis comes in handy. See sphx_glr_auto_examples_cluster_plot_kmeans_silhouette_analysis.py for an example on how to do it."
|
|
||
| # Convert to floats instead of the default 8 bits integer coding. Dividing by | ||
| # 255 is important so that plt.imshow behaves works well on float data (need to | ||
| # 255 is important so that plt.imshow works well on float data (need to |
ArturoAmorQ
left a comment
Now it does LGTM, thanks @marenwestermann and sorry for taking so long to answer! (I was/still am off on holidays)
Reference Issues/PRs
towards #26927
What does this implement/fix? Explain your changes.
Adds links to examples in the docstrings and the user guide which demonstrate how to use K-Means.
Any other comments?
I started with the example plot_cluster_iris.py and then realised that it probably makes sense to group all the links related to K-Means examples in one PR. So I will keep working on adding links to examples which show how to use K-Means.
Edit: the examples are
Note: there can be more than one PR per example script because they might be referenced in different locations. For example, there is an existing open PR for plot_document_clustering.py which links this example in the docs of another estimator.