
ENH Add option to scale output to unit var in RobustScaler#17193

Merged
glemaitre merged 26 commits into scikit-learn:master from lucyleeow:IS/10139
Jun 2, 2020

Conversation

@lucyleeow
Member

Reference Issues/PRs

Follows from PR #10140
closes #10139

What does this implement/fix? Explain your changes.

Adds a gauss_adjust parameter to RobustScaler so that the transformed data is standard Gaussian when the input data is Gaussian.

Adds a test to check that this option makes StandardScaler and RobustScaler equivalent on normal data, as suggested by @amueller. Note these are only equivalent for a large dataset. Not sure if the parameters I used are optimal; open to suggestions.

Any other comments?

@jnothman
Member

In QuantileTransformer this is available as:

output_distribution : str, optional (default='uniform')
    Marginal distribution for the transformed data.
    The choices are 'uniform' (default) or 'normal'.
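For reference, a minimal sketch of the option being cited, assuming scikit-learn is available and using a skewed sample so the effect is visible (the sample and its size are illustrative choices, not from the thread):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.RandomState(0)
X = rng.exponential(size=(10_000, 1))  # skewed, clearly non-normal input

# Map the marginal distribution of each feature to a standard normal.
qt = QuantileTransformer(output_distribution="normal", random_state=0)
X_out = qt.fit_transform(X)

print(X_out.mean(), X_out.std())  # roughly 0 and 1
```

Unlike the change proposed here, this is a full (non-linear) quantile mapping of the distribution, not just a rescaling.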

@jnothman
Member

I suppose it's not the same. Here it's just a change in scale? Still, it would be nice if the options looked more similar to each other??

@lucyleeow
Member Author

lucyleeow commented May 13, 2020

I see your point, but what would you suggest? This seems a bit clunky:

output_distribution : str, optional (default='non-standard normal')
    Scale distribution for the transformed data to standard normal or keep as non-standard normal.
    Options: 'non-standard normal' or 'standard normal'.

@lucyleeow
Member Author

lucyleeow commented May 13, 2020

Here it's just a change in scale?

Yes, it is just a change in scale. For normal input data, the standard deviation of the output distribution with the (0.25, 0.75) quantiles is:
std ~= 1 / (norm.ppf(.75) - norm.ppf(.25)) ~= 0.741

The adjustment brings the output standard deviation to ~1 by rescaling:

self.scale_  = (q[1] - q[0]) / self.adjust

where self.adjust is:

self.adjust = norm.ppf(q[1]  / 100.0) - norm.ppf(q[0] / 100.0) 

@lucyleeow
Member Author

Maybe:

standard-normal :  boolean, False by default
    If True, scale data to a standard normal distribution.

which is vaguely more similar to QuantileTransformer.

@jnothman
Member

Don't know what the right name is... scale_to_unit_variance???

@amueller
Member

Or just unit_variance? Maybe scale_to_unit_variance is more explicit?
Happy with either name.

Semi-related: should that become the default, i.e. should we deprecate it being False?

@lucyleeow
Member Author

lucyleeow commented May 13, 2020

Thanks for the reviews. I've used unit_variance, but happy to change to scale_to_unit_variance.
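With the name settled on here, the equivalence test described earlier can be sketched roughly as follows (assumes a scikit-learn release that includes this PR's unit_variance parameter; the sample size and distribution parameters are illustrative, not the ones used in the actual test):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.RandomState(42)
X = rng.normal(loc=5.0, scale=3.0, size=(50_000, 1))  # large Gaussian sample

# With unit_variance=True, the IQR-based scale is divided by
# norm.ppf(.75) - norm.ppf(.25), so the output has ~unit variance.
X_robust = RobustScaler(unit_variance=True).fit_transform(X)
X_standard = StandardScaler().fit_transform(X)

# On large Gaussian data the two scalings nearly coincide.
print(np.abs(X_robust - X_standard).max())
```

On small samples the two remain different, since median/IQR and mean/std estimates converge at different rates.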

@lucyleeow changed the title from "ENH Gaussian adjust option in RobustScaler" to "ENH Add option to scale output to unit var in RobustScaler" May 13, 2020
@jnothman left a comment
Member

otherwise LGTM

Please add an entry to the change log at doc/whats_new/v0.24.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors, if applicable) with :user:.

@adrinjalali left a comment
Member

Thanks @lucyleeow.

@glemaitre left a comment
Member

Some additional nitpicks

@lucyleeow
Member Author

ping @glemaitre. I also added more explanation of what the parameter does generally as suggested by @adrinjalali, but I'm not sure about the wording...

Could we also expand on this a bit and explain what happens if the feature is not normally distributed? Since this is the only place we're having the explanation of this parameter (no user guide), it'd be nice to elaborate a bit.

@glemaitre glemaitre self-assigned this May 27, 2020
@glemaitre glemaitre removed their assignment May 27, 2020
@lucyleeow
Member Author

Amended!

@glemaitre glemaitre merged commit 863e58f into scikit-learn:master Jun 2, 2020
@glemaitre
Member

Thanks @lucyleeow, merging.

@lucyleeow lucyleeow deleted the IS/10139 branch June 2, 2020 12:52
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
…arn#17193)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…arn#17193)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Successfully merging this pull request may close these issues.

Feature Request: Gaussian adjust option in RobustScaler

6 participants