DOC Update plot_mahalanobis_distances to notebook style by lucyleeow · Pull Request #17089 · scikit-learn/scikit-learn

lucyleeow · 2020-04-30T09:48:52Z

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

- Update plot_mahalanobis_distances.py to notebook style with alternating code and text
- Set random seed for consistent results
- Expand explanations

Any other comments?

thomasjpfan

Since we broke up the example image into two images, there are two places in the user guide that will now show a different image: doc/modules/outlier_detection.rst (Fitting an elliptic envelope) and doc/modules/covariance.rst (Minimum Covariance Determinant). The contour plot image still seems to work in the context of the user guide. Do you think so as well?

thomasjpfan · 2020-05-08T20:53:42Z

examples/covariance/plot_mahalanobis_distances.py

 plt.show()
+
+# %%
+# [1] P. J. Rousseeuw. Least median of squares regression. J. Am


Could we make these references link correctly? I feel they would be better placed at the top of the page, under the initial description. Specifically under the paragraph:

Associated applications include outlier detection, observation ranking and clustering.

thomasjpfan · 2020-05-08T20:57:23Z

examples/covariance/plot_mahalanobis_distances.py

+
+.. math::
+
+    d_{(\mu,\Sigma)}(x_i)^2 = (x_i - \mu)'\Sigma^{-1}(x_i - \mu)


Is using T for transpose clearer for you?

Suggested change

d_{(\mu,\Sigma)}(x_i)^2 = (x_i - \mu)'\Sigma^{-1}(x_i - \mu)

d_{(\mu,\Sigma)}(x_i)^2 = (x_i - \mu)^T\Sigma^{-1}(x_i - \mu)
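Both notations describe the same quadratic form. As a minimal NumPy sketch of that formula, with made-up values for `mu`, `sigma` and `X` (none of them come from the example under review):

```python
import numpy as np

# Hypothetical location and covariance, chosen only for illustration.
mu = np.array([0.0, 0.0])
sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

# Three illustrative observations, one per row.
X = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])

diff = X - mu
# Solve sigma @ z = diff.T rather than forming the inverse explicitly.
z = np.linalg.solve(sigma, diff.T)
# Squared distance per row: (x_i - mu)^T sigma^{-1} (x_i - mu).
d2 = np.sum(diff * z.T, axis=1)
print(d2)
```

Using `np.linalg.solve` avoids computing the inverse of `sigma` directly, which is usually the numerically safer choice.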

thomasjpfan · 2020-05-08T21:06:15Z

examples/covariance/plot_mahalanobis_distances.py

+# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
+# are replaced with Gaussian outlier samples where feature 1 has standard
+# devation = 1 and feature 2 has standard deviation = 7.
+


Move the `import numpy as np` here?

thomasjpfan · 2020-05-08T21:06:24Z

examples/covariance/plot_mahalanobis_distances.py

+# that of the MCD robust estimator (1.2). This shows that the MCD based
+# robust estimator is much more resistant to the outlier samples, which were
+# designed to have a much larger variance in feature 2.
+


Move the `from sklearn.covariance import EmpiricalCovariance, MinCovDet` and matplotlib imports here?

lucyleeow · 2020-05-09T14:34:47Z

Thanks for the review @thomasjpfan

glemaitre

A couple of nitpicks.

glemaitre · 2020-05-18T16:10:17Z

examples/covariance/plot_mahalanobis_distances.py

+# Generate data
+# --------------
+#
+# First we generate a dataset of 125 samples and 2 features. Both features


Suggested change

# First we generate a dataset of 125 samples and 2 features. Both features

# First, we generate a dataset of 125 samples and 2 features. Both features

glemaitre · 2020-05-18T16:10:36Z

examples/covariance/plot_mahalanobis_distances.py

+#
+# First we generate a dataset of 125 samples and 2 features. Both features
+# are Gaussian distributed with mean of 0 but feature 1 has a standard
+# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples


Suggested change

# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples

# deviation equal to 2 and feature 2 has a standard deviation equal to 1. Next, 25 samples

glemaitre · 2020-05-18T16:11:02Z

examples/covariance/plot_mahalanobis_distances.py

+# are Gaussian distributed with mean of 0 but feature 1 has a standard
+# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
+# are replaced with Gaussian outlier samples where feature 1 has standard
+# devation = 1 and feature 2 has standard deviation = 7.


Suggested change

# devation = 1 and feature 2 has standard deviation = 7.

# deviation equal to 1 and feature 2 has a standard deviation equal to 7.

glemaitre · 2020-05-18T16:11:18Z

examples/covariance/plot_mahalanobis_distances.py

+# First we generate a dataset of 125 samples and 2 features. Both features
+# are Gaussian distributed with mean of 0 but feature 1 has a standard
+# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
+# are replaced with Gaussian outlier samples where feature 1 has standard


Suggested change

# are replaced with Gaussian outlier samples where feature 1 has standard

# are replaced with Gaussian outlier samples where feature 1 has a standard
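For readers following the wording changes, the dataset this comment block describes could be generated roughly as follows. This is only a sketch: the use of `np.random.default_rng`, the seed value and the variable names are assumptions, not the code in the example file.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for this sketch

n_samples, n_outliers, n_features = 125, 25, 2

# Inliers: zero mean, standard deviation 2 for feature 1 and 1 for feature 2.
X = rng.normal(loc=0.0, scale=[2.0, 1.0], size=(n_samples, n_features))

# Replace the last 25 rows with outliers:
# standard deviation 1 for feature 1 and 7 for feature 2.
X[-n_outliers:] = rng.normal(
    loc=0.0, scale=[1.0, 7.0], size=(n_outliers, n_features)
)

print(X.shape)
```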

glemaitre · 2020-05-18T16:11:48Z

examples/covariance/plot_mahalanobis_distances.py

+# Comparison of results
+# ---------------------
+#
+# Below we fit MCD and MLE based covariance estimators to our data and print


Suggested change

# Below we fit MCD and MLE based covariance estimators to our data and print

# Below, we fit MCD and MLE based covariance estimators to our data and print
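As a rough illustration of the step this sentence introduces, the two estimators can be fitted and compared as below. The data generation, seed and variable names are assumptions made for this sketch; only `EmpiricalCovariance`, `MinCovDet`, `covariance_` and `mahalanobis` are taken from scikit-learn.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

# Illustrative data: 100 inliers plus 25 outliers with large variance
# in feature 2 (seed and generator are assumptions for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(scale=[2.0, 1.0], size=(125, 2))
X[-25:] = rng.normal(scale=[1.0, 7.0], size=(25, 2))

# MLE (empirical) estimator and robust MCD estimator.
emp_cov = EmpiricalCovariance().fit(X)
robust_cov = MinCovDet(random_state=0).fit(X)

print("MLE covariance:\n", emp_cov.covariance_)
print("MCD covariance:\n", robust_cov.covariance_)

# Squared Mahalanobis distances of every sample under each estimate.
d2_mle = emp_cov.mahalanobis(X)
d2_mcd = robust_cov.mahalanobis(X)
```

With data like this, the MCD estimate of the feature 2 variance should stay much closer to the inlier variance than the MLE estimate does.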

lucyleeow · 2020-05-19T13:29:31Z

Thanks @glemaitre, suggestions added.

glemaitre · 2020-05-20T12:49:13Z

Thanks @lucyleeow


lucyleeow added 6 commits April 29, 2020 21:25

use nb, clarify

4acd266

explain boxplots

2d66238

formatting, add comments

643f9bf

lint

0d80251

lint

49c5695

wording

ca01ace

lucyleeow changed the title ~~WIP DOC Update plot_mahalanobis_distances to notebook style~~ DOC Update plot_mahalanobis_distances to notebook style Apr 30, 2020

lucyleeow added 2 commits May 5, 2020 17:25

add see also

874ec5b

qa comment

1ea30bf

thomasjpfan reviewed May 8, 2020


lucyleeow added 2 commits May 9, 2020 15:14

add link, suggestions

fec1e00

formatting

a1deb9b

thomasjpfan approved these changes May 9, 2020


glemaitre reviewed May 18, 2020


suggestions

df2bfea

glemaitre merged commit b4db36d into scikit-learn:master May 20, 2020

lucyleeow deleted the plot_mahal_dist branch May 20, 2020 13:33

viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020

DOC Update plot_mahalanobis_distances to notebook style (scikit-learn…

bc02974

…#17089)

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

DOC Update plot_mahalanobis_distances to notebook style (scikit-learn…

72b00f3

…#17089)




3 participants