[MRG] ENH Add example comparing the distributions of all scaling preprocessors (#2)
Conversation
|
@raghavrv can you plot the robust scaler? |
|
That looks pretty cool! Indeed it seems to support the point made by @ogrisel about the potentially decorrelating nature of this non-linear transform. |
|
I don't understand why the normalized data looks discretized on the y axis. Aren't we linearly interpolating points between quantiles? EDIT by @raghavrv (This was for an old deleted plot) |
|
@ogrisel Sorry I plotted the wrong features... |
|
Note that I didn't binarize it this time and instead used matplotlib's colormap... |
|
The plots look great! I find it weird that the robust scaler yields such a flat profile. How many "outliers" are there on the y axis? I would have assumed that IQR scaling should have performed better. |
|
Please add x and y labels on the first 2 plots (data without transform and after zoom on non-outliers). |
It scales the data by the IQR, so the outliers will still be outliers in the scaled data, hence the flat profile... We could zoom in like it is done here, but I think it would be unfair to the other scalers... |
|
Maybe best to just show 0-1 range after scaling.
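To see why IQR-based scaling leaves such a flat profile, here is a minimal sketch (my own toy data, not the example's code) contrasting RobustScaler and MinMaxScaler on a feature with a few large outliers:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, MinMaxScaler

rng = np.random.RandomState(0)
# 100 inliers around 0 plus three very large outliers
X = np.concatenate([rng.normal(0, 1, 100), [50.0, 80.0, 120.0]]).reshape(-1, 1)

# RobustScaler divides by the IQR of the bulk, so the outliers
# remain far from the inliers after scaling
X_robust = RobustScaler().fit_transform(X)

# MinMaxScaler squeezes everything into [0, 1], so the inliers
# are compressed into a narrow band near 0 (the "flat profile")
X_minmax = MinMaxScaler().fit_transform(X)

print(X_robust.max())  # still much larger than the scaled inliers
print(X_minmax.max())  # 1.0
```

This is why zooming the axes (rather than changing the scaler) is needed to see the inlier structure after robust scaling.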
|
|
or show 0-1 and mark outliers along edges with x's
|
I am not sure how showing the outliers would work, but I like the idea of zooming on the [0.01-0.99] quantiles for all the transforms (in addition to the original, unzoomed plot for each transform). |
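One way to implement that zoom is to clip the axis limits to the data's percentiles. `zoom_limits` below is a hypothetical helper of my own, not part of the PR:

```python
import numpy as np

def zoom_limits(x, lower=1, upper=99):
    """Axis limits that clip the view to the given percentile range."""
    return np.percentile(x, lower), np.percentile(x, upper)

# e.g. ax.set_xlim(*zoom_limits(X[:, 0])) on each zoomed panel,
# applied uniformly so no scaler is treated unfairly
```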
|
Very nice! |
|
Maybe you could remove the color bar from the first plots and only display it for the last plot as they all use the same color scale. |
|
Great work! It looks very helpful and clear. |
|
Nice work @raghavrv ! |
axes = subplots[offset + 2: offset + 4] + subplots[offset + 6: offset + 8]
plot_distribution(axes, X[non_outliers], y[non_outliers], hist_nbins=50,
                  plot_title=(title +
                              "\n(Zoomed-in at quantile range [0, 99))"),
quantile -> percentile.
Also, I think parentheses enclosing these subtitles can be removed.
|
Great stuff! |
@@ -0,0 +1,133 @@
#!/usr/bin/python
`#!/usr/bin/env python` would be better
outliers that can make visualization of the data difficult.

Also linear models like :class:`sklearn.linear_model.SVM` require data which is
normalized to the range [-1, 1].
approximately normalized to the [-1, 1] or [0, 1] range, or at the very least have all the features on the same scale.
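As a rough illustration of those target ranges (a sketch with my own toy data, using scikit-learn's standard scalers):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler

X = np.array([[-10.0, 2.0],
              [5.0, 4.0],
              [0.0, 6.0]])

# MaxAbsScaler divides each feature by its max absolute value -> [-1, 1]
X_maxabs = MaxAbsScaler().fit_transform(X)

# MinMaxScaler maps each feature's observed range onto [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)
```

Either way, all features end up on the same scale, which is the weaker condition the review comment suggests.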
X_full, y_full = dataset.data, dataset.target

# Take only 2 features to make visualization easier
# Feature 0 has a tapering distribution of outliers
Feature 0 has a long-tail distribution. Feature 5 has a few but very large outliers.
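The feature selection under discussion is plain column indexing; below is a sketch with a synthetic stand-in for the dataset (the array, its shape, and the lognormal distribution are my assumptions, not the PR's code):

```python
import numpy as np

# Stand-in for dataset.data (n_samples, n_features); in the example this
# would be the housing data with long-tailed / outlier-heavy features
X_full = np.random.RandomState(0).lognormal(size=(1000, 8))

# Keep only two features with contrasting behaviour:
# feature 0 (long-tailed) and feature 5 (a few very large outliers)
X = X_full[:, [0, 5]]
```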
# Blank space to avoid overlapping of plots;
# plt.tight_layout does not work with gridspec, height of this row is
# adjusted at `height_ratios` param given to `GridSpec`
plt.subplot(gs[offset+12: offset+16]).axis('off')
cosmetics: gs[offset + 12:offset + 16] or gs[offset+12:offset+16] but no spacing around the slicing operator : and especially not asymmetric, otherwise it looks like the literal notation of a dict.
|
Thanks all for the comments :) Any idea on how the doc can be improved? Should it have a note for each scaler/normalizer stating its use case? (I have updated the above comment with the new plot) |
Yes I think that would be useful to give a high level analysis of the behavior on each w.r.t. this specific 2D dataset. |
|
I'm now wondering if there's any value in showing the transformations on held-out data |
They should look very similar to the data on the training set, right? Adding plots for held out data would render this example quite complex. I am not sure it's worth the additional cognitive load. |
|
Happy to leave it. Yes it should look similar except for minmax.
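The min-max caveat can be seen directly: a MinMaxScaler fitted on training data maps held-out values outside the training range to values outside [0, 1]. A minimal sketch with my own toy data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[-2.0], [12.0]])  # outside the training range

scaler = MinMaxScaler().fit(X_train)
# Scaling parameters come from the training range [0, 10],
# so held-out extremes land outside [0, 1]
print(scaler.transform(X_test).ravel())  # [-0.2  1.2]
```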
|
|
Merged!!! |
|
Thanks for merging! 🎉 I'll edit the docs in the main branch... |
…ocessor (#2)
* ENH Add example comparing the distribution of all scaling preprocessor
* Remove Jupyter notebook convert
* FIX/ENH Select feat before not after; Plot interquantile data range for all
* Add heatmap legend
* Remove comment maybe?
* Move doc from robust_scaling to plot_all_scaling; Need to update doc
* Update the doc
* Better aesthetics; Better spacing and plot colormap only at end
* Shameless author re-ordering ;P
* Use env python for she-bang
Strategy to forward prefix parameter when creating blocks

Main PR: scikit-learn#8363
I have some initial plots comparing different scalers on the California housing dataset...
I binarized the targets to help visualize them better...
For the quantile normalizer:
@glemaitre @tguillemot @ogrisel @dengemann @jnothman