Skip to content

DOC Update plot_mahalanobis_distances to notebook style#17089

Merged
glemaitre merged 11 commits intoscikit-learn:masterfrom
lucyleeow:plot_mahal_dist
May 20, 2020
Merged

DOC Update plot_mahalanobis_distances to notebook style#17089
glemaitre merged 11 commits intoscikit-learn:masterfrom
lucyleeow:plot_mahal_dist

Conversation

@lucyleeow
Copy link
Copy Markdown
Member

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

  • Update plot_mahalanobis_distances.py to notebook style with alternating code and text
  • set random seed for consistent results
  • expand explainations

Any other comments?

@lucyleeow lucyleeow changed the title WIP DOC Update plot_mahalanobis_distances to notebook style DOC Update plot_mahalanobis_distances to notebook style Apr 30, 2020
Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @lucyleeow

Since we broke up the example images into two images, there are two places in the user guide that will now have a different image: doc/modules/outlier_detection.rst (Fitting an elliptic envelope) and doc/modules/covariance.rst (Minimum Covariance Determinant). The images of the contour plot still seems to work in the context of the user guide. Do you think so as well?

plt.show()

# %%
# [1] P. J. Rousseeuw. Least median of squares regression. J. Am
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make these references link correctly? I feed they are better at the top of the page under the initial description. Specifically under the paragraph:

Associated applications include outlier detection, observation ranking and clustering.


.. math::

d_{(\mu,\Sigma)}(x_i)^2 = (x_i - \mu)'\Sigma^{-1}(x_i - \mu)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using T for transpose clearer for you?

Suggested change
d_{(\mu,\Sigma)}(x_i)^2 = (x_i - \mu)'\Sigma^{-1}(x_i - \mu)
d_{(\mu,\Sigma)}(x_i)^2 = (x_i - \mu)^T\Sigma^{-1}(x_i - \mu)

# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
# are replaced with Gaussian outlier samples where feature 1 has standard
# devation = 1 and feature 2 has standard deviation = 7.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the import numpy as np here?

# that of the MCD robust estimator (1.2). This shows that the MCD based
# robust estimator is much more resistant to the outlier samples, which were
# designed to have a much larger variance in feature 2.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the from sklearn.covariance import EmpiricalCovariance, MinCovDet and matplotlib import here?

@lucyleeow
Copy link
Copy Markdown
Member Author

Thanks for the review @thomasjpfan

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of nitpicks

# Generate data
# --------------
#
# First we generate a dataset of 125 samples and 2 features. Both features
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# First we generate a dataset of 125 samples and 2 features. Both features
# First, we generate a dataset of 125 samples and 2 features. Both features

#
# First we generate a dataset of 125 samples and 2 features. Both features
# are Gaussian distributed with mean of 0 but feature 1 has a standard
# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
# deviation equal to 2 and feature 2 has a standard deviation equal to 1. Next, 25 samples

# are Gaussian distributed with mean of 0 but feature 1 has a standard
# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
# are replaced with Gaussian outlier samples where feature 1 has standard
# devation = 1 and feature 2 has standard deviation = 7.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# devation = 1 and feature 2 has standard deviation = 7.
# deviation equal to 1 and feature 2 has a standard deviation equal_to 7.

# First we generate a dataset of 125 samples and 2 features. Both features
# are Gaussian distributed with mean of 0 but feature 1 has a standard
# deviation = 2 and feature 2 has a standard deviation = 1. Next, 25 samples
# are replaced with Gaussian outlier samples where feature 1 has standard
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# are replaced with Gaussian outlier samples where feature 1 has standard
# are replaced with Gaussian outlier samples where feature 1 has a standard

# Comparison of results
# ---------------------
#
# Below we fit MCD and MLE based covariance estimators to our data and print
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Below we fit MCD and MLE based covariance estimators to our data and print
# Below, we fit MCD and MLE based covariance estimators to our data and print

@lucyleeow
Copy link
Copy Markdown
Member Author

Thanks @glemaitre, suggestions added.

@glemaitre glemaitre merged commit b4db36d into scikit-learn:master May 20, 2020
@glemaitre
Copy link
Copy Markdown
Member

Thanks @lucyleeow

@lucyleeow lucyleeow deleted the plot_mahal_dist branch May 20, 2020 13:33
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants