[MRG] DOC Mention StandardScaler ddof #12950
Merged
qinhanmin2014 merged 7 commits into scikit-learn:master on Jan 14, 2019
added 3 commits
January 10, 2019 16:14
…he estimator of the standard deviation is the biased one
jnothman (Member) reviewed on Jan 11, 2019 and left a comment:
Put it in a Notes section and explain that the choice of ddof is unlikely to affect ML performance
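One way to see why the choice of `ddof` is unlikely to matter: for a feature with n samples, scaling with `ddof=0` versus `ddof=1` changes every standardized value by the same constant factor, sqrt((n - 1) / n). The sketch below uses only the Python standard library, where `statistics.pstdev` corresponds to `ddof=0` and `statistics.stdev` to `ddof=1`. The helper `standardize` and the sample data are hypothetical illustrations, not scikit-learn's implementation.

```python
import math
import statistics


def standardize(values, ddof=0):
    """Hypothetical helper: z = (x - mean) / s for ddof in {0, 1}."""
    mean = statistics.mean(values)
    # pstdev divides by n (ddof=0); stdev divides by n - 1 (ddof=1).
    s = statistics.pstdev(values) if ddof == 0 else statistics.stdev(values)
    return [(x - mean) / s for x in values]


# Made-up feature values; none equals the mean, so no ratio is 0 / 0.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
n = len(x)

z_biased = standardize(x, ddof=0)
z_unbiased = standardize(x, ddof=1)

# Every standardized value differs by the same constant factor.
ratios = [zu / zb for zu, zb in zip(z_unbiased, z_biased)]
expected_ratio = math.sqrt((n - 1) / n)
```

Because the factor depends only on n, it approaches 1 as the number of samples grows and amounts to a uniform rescaling of the feature.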
Contributor, Author
What do you mean by "Put it in a Notes section"? I searched for "notes" in the contributing guidelines, but all I could find was a section about "working notes".
…d affect model performance.
qinhanmin2014 (Member) left a comment:
See https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.StratifiedKFold.html for an example of a Notes section.
I agree that we should emphasize the difference in the Notes section, since we're not going to modify our implementation.
sklearn/preprocessing/data.py
Outdated
  and `s` is the standard deviation of the training samples or one if
- `with_std=False`.
+ `with_std=False`. Note that `s` is a biased estimator of the standard
+ deviation, equivalent to numpy.sqrt(numpy.var(x, ddof=0)), and that it is
….std instead of np.var
qinhanmin2014 approved these changes on Jan 11, 2019
sklearn/preprocessing/data.py
Outdated
  transform.

+ We use a biased estimator for the standard deviation, equivalent to
+ `numpy.std(x, ddof=0)`. Note, however, that the choice of `ddof` is
qinhanmin2014 approved these changes on Jan 11, 2019
jnothman approved these changes on Jan 13, 2019
jnothman (Member) reviewed on Jan 13, 2019 and left a comment:
Actually, before merge, can we get this note replicated in the scale function?
qinhanmin2014 approved these changes on Jan 14, 2019
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request on Feb 19, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request on Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request on Apr 28, 2019
This reverts commit 6a85a17.
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request on Apr 28, 2019
This reverts commit 6a85a17.
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request on Jul 12, 2019
Reference Issues/PRs
Fixes #7757
What does this implement/fix? Explain your changes.
Expands the documentation so it's clear that the estimate of the standard deviation in StandardScaler is the biased one (equivalent to numpy.sqrt(numpy.var(x, ddof=0))).
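As a rough illustration of what "biased" means here, the sketch below computes the standard deviation with a divisor of N - ddof, which is the convention `numpy.var` and `numpy.std` use for their `ddof` argument. It relies only on the Python standard library so it runs without NumPy. The helper `std_with_ddof` and the sample values are hypothetical, chosen for illustration only.

```python
import math
import statistics


def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with an N - ddof divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))


# Made-up sample values.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# ddof=0 divides by N: the biased (population) estimator described above.
biased = std_with_ddof(x, ddof=0)

# ddof=1 divides by N - 1: the usual sample estimator.
unbiased = std_with_ddof(x, ddof=1)
```

With `ddof=0` the result matches `statistics.pstdev(x)`, and with `ddof=1` it matches `statistics.stdev(x)`; the biased estimate is always the smaller of the two.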
Any other comments?