[MRG+1] _preprocess_data consistent with fused types #9093
MechCoder merged 10 commits into scikit-learn:master
Conversation
LGTM. +1 for merge
sklearn/linear_model/base.py
Outdated
if X.dtype == np.float32:
    y_offset = np.float32(0)
else:
    y_offset = np.float64(0)
What about replacing this block with just
y_offset = X.dtype.type(0) ?
Tested the dtype.type method with numpy 1.8.2 and 1.12.1
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.type.html
It's exactly the function I was looking for, thank you!
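The suggested idiom can be sketched like this (a minimal standalone example, not the actual base.py code): `dtype.type` is the scalar type associated with an array's dtype, so a single expression replaces the float32/float64 branch.

```python
import numpy as np

X32 = np.zeros(3, dtype=np.float32)
X64 = np.zeros(3, dtype=np.float64)

# dtype.type is the scalar type of the array's dtype, so this produces a
# zero with matching precision without any if/else branching:
y_offset_32 = X32.dtype.type(0)
y_offset_64 = X64.dtype.type(0)
```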
@GaelVaroquaux I changed some lines of code.
jnothman
left a comment
I haven't checked whether some linear models (e.g. SGD) skip this preprocessing, which would explain their absence from the changes.
Apart from that and the wording, this LGTM
y : ndarray, shape (n_samples,) or (n_samples, n_targets)
    Target
    Target. If it's not the case, y is cast in X.dtype further
I'd rather this were phrased as "Will be cast to X's dtype."
for normalize in [True, False]:
Xt_32, yt_32, X_mean_32, y_mean_32, X_norm_32 = \
    _preprocess_data(X_32, y_32, fit_intercept=fit_intercept,
Could you avoid using the backslash?
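The backslash continuation can be avoided with implicit line continuation inside parentheses. A sketch with a hypothetical stand-in for the helper (the real `_preprocess_data` lives in sklearn.linear_model.base):

```python
def _preprocess_data(X, y, fit_intercept=True):
    # Hypothetical stand-in returning five values, like sklearn's helper
    return X, y, 0.0, 0.0, 1.0

# Wrapping the unpacking targets in parentheses gives implicit line
# continuation, so no backslash is needed:
(Xt_32, yt_32, X_mean_32,
 y_mean_32, X_norm_32) = _preprocess_data([1.0], [2.0], fit_intercept=True)
```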
MechCoder
left a comment
Looks fine, just some minor comments.
sklearn/linear_model/base.py
Outdated
y : numpy array of shape [n_samples, n_targets]
    Target values
    Target values. If it's not the case, y is cast in X.dtype further
Umm sorry, what does "it's" in "if it's not the case" refer to?
# Check that no error is raised if data is provided in the right format
clf.fit(X, y, check_input=False)
X = check_array(X, order='F', dtype='float32')
clf.fit(X, y, check_input=True)
Why did you remove these two lines?
Because they were used for the test below (assert_raises(ValueError, clf.fit, X, y, check_input=False)), casting X to 32 bits. But now _preprocess_data prevents fit from raising a ValueError, even if check_input=False. Since you suggested a smoke test, I can put them back.
clf.fit(X, y, check_input=True)
# Check that an error is raised if data is provided in the wrong dtype,
# because of check bypassing
assert_raises(ValueError, clf.fit, X, y, check_input=False)
I would suggest changing this to a smoke test:
clf.fit(X, y, check_input=False)
and adding a comment saying that because check_input=False, no exhaustive check is made on y; only the dtype of y is cast to the dtype of X in _preprocess_data, so this passes. (We will definitely forget in the future.)
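The behaviour the suggested comment should document can be sketched like this (a simplified stand-in, not sklearn's actual helper):

```python
import numpy as np

def _cast_target(X, y):
    # Simplified sketch: with check_input=False no exhaustive validation is
    # run on y; it is only cast to X's dtype, mirroring what
    # _preprocess_data does after this PR, so fit no longer raises.
    return np.asarray(y, dtype=X.dtype)

X = np.ones((4, 2), dtype=np.float32)
y = np.arange(4, dtype=np.float64)
y_cast = _cast_target(X, y)  # cast to float32, no ValueError raised
```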
assert_equal(y_mean_6432.dtype, np.float64)
assert_equal(X_norm_6432.dtype, np.float64)
assert_array_almost_equal(Xt_32, Xt_64)
copy is True by default, so can you also check that the dtype of the initial array does not change?
I just did, a few lines below!
But with assert_array_equal(X_32, X_32_initial), I'm not sure the dtype is properly tested...
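The concern is justified: numpy's assert_array_equal compares values only, not dtypes, so an explicit dtype assertion is needed to catch an unwanted upcast. A minimal illustration:

```python
import numpy as np
from numpy.testing import assert_array_equal

X_32_initial = np.array([1.0, 2.0], dtype=np.float32)
X_32 = X_32_initial.astype(np.float64)  # simulate an unwanted upcast

# This passes even though the dtypes differ: only values are compared.
assert_array_equal(X_32, X_32_initial)

# The upcast is only caught by an explicit dtype check:
dtypes_match = X_32.dtype == X_32_initial.dtype
```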
has conflicts
@MechCoder I mistook your avatar for a fidget spinner and now I can't unsee it.
@Henley13: can you resolve the merge conflicts, please?
@amueller I googled what a fidget spinner is and now I have to change my avatar :-| |
Can you just change the "If it's not the case" everywhere and I'll be happy to merge.
@MechCoder Sorry, I thought I did it. Should be ok now.
thanks @Henley13
* add test for _preprocess_data and make it consistent
* fix pep8
* add doc, cast systematically y in X.dtype and update test_coordinate_descent.py
* test if input values don't change with copy=True
* test if input values don't change with copy=True scikit-learn#2
* fix doc
* fix doc scikit-learn#2
* fix doc scikit-learn#3
Reference Issue
Works on #8769
What does this implement/fix? Explain your changes.
Prevent _preprocess_data from casting float32 data into float64.
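A minimal illustration of the fix's intent (not the actual _preprocess_data code): statistics computed on float32 data should stay float32, so the data is never upcast to float64 along the way.

```python
import numpy as np

X_32 = np.random.RandomState(0).rand(10, 3).astype(np.float32)

# The centering offset is computed in X's own dtype, so neither the
# offset nor the centered data is upcast to float64:
X_offset = np.average(X_32, axis=0)
X_centered = X_32 - X_offset
```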
Any other comments?
Intermediate step for PR #9087