[MRG+1] _preprocess_data consistent with fused types #9093
MechCoder merged 10 commits into scikit-learn:master
Conversation
LGTM. +1 for merge
sklearn/linear_model/base.py
Outdated
if X.dtype == np.float32:
    y_offset = np.float32(0)
else:
    y_offset = np.float64(0)
What about replacing this block with just
y_offset = X.dtype.type(0) ?
Tested the dtype.type method with numpy 1.8.2 and 1.12.1
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.type.html
It's exactly the function I was looking for, thank you!
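The suggested idiom can be sketched like this (a minimal standalone example, not the actual base.py code): `dtype.type` is the scalar type associated with an array's dtype, so a single expression replaces the float32/float64 branch.

```python
import numpy as np

X32 = np.zeros(3, dtype=np.float32)
X64 = np.zeros(3, dtype=np.float64)

# dtype.type is the scalar type of the array's dtype, so this produces a
# zero with matching precision without any if/else branching:
y_offset_32 = X32.dtype.type(0)
y_offset_64 = X64.dtype.type(0)
```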
@GaelVaroquaux I changed some lines of code.
jnothman
left a comment
I haven't checked whether some linear models (e.g. SGD) skip this preprocessing, which would explain their absence from the changes.
Apart from that and the wording, this LGTM
y : ndarray, shape (n_samples,) or (n_samples, n_targets)
    Target
    Target. If it's not the case, y is cast in X.dtype further
I'd rather this were phrased as "Will be cast to X's dtype."
for normalize in [True, False]:
Xt_32, yt_32, X_mean_32, y_mean_32, X_norm_32 = \
    _preprocess_data(X_32, y_32, fit_intercept=fit_intercept,
Could you avoid using the backslash?
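The backslash continuation can be avoided with implicit line continuation inside parentheses. A sketch with a hypothetical stand-in for the helper (the real `_preprocess_data` lives in sklearn.linear_model.base):

```python
def _preprocess_data(X, y, fit_intercept=True):
    # Hypothetical stand-in returning five values, like sklearn's helper
    return X, y, 0.0, 0.0, 1.0

# Wrapping the unpacking targets in parentheses gives implicit line
# continuation, so no backslash is needed:
(Xt_32, yt_32, X_mean_32,
 y_mean_32, X_norm_32) = _preprocess_data([1.0], [2.0], fit_intercept=True)
```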
MechCoder
left a comment
Looks fine, just some minor comments.
sklearn/linear_model/base.py
Outdated
y : numpy array of shape [n_samples, n_targets]
    Target values
    Target values. If it's not the case, y is cast in X.dtype further
Umm sorry, what does "it's" in "if it's not the case" refer to?
# Check that no error is raised if data is provided in the right format
clf.fit(X, y, check_input=False)
X = check_array(X, order='F', dtype='float32')
clf.fit(X, y, check_input=True)
Why did you remove these two lines?
Because they were used for the test below (assert_raises(ValueError, clf.fit, X, y, check_input=False)), casting X to 32 bits. But now _preprocess_data prevents fit from raising a ValueError, even if check_input=False. Since you suggested a smoke test, I can put them back.
clf.fit(X, y, check_input=True)
# Check that an error is raised if data is provided in the wrong dtype,
# because of check bypassing
assert_raises(ValueError, clf.fit, X, y, check_input=False)
I would suggest changing this to a smoke test:
clf.fit(X, y, check_input=False)
and adding a comment saying that because check_input=False, no exhaustive check is made on y; only the dtype of y is cast to the dtype of X in _preprocess_data, so this passes. (We will definitely forget in the future.)
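The behaviour the suggested comment should document can be sketched like this (a simplified stand-in, not sklearn's actual helper):

```python
import numpy as np

def _cast_target(X, y):
    # Simplified sketch: with check_input=False no exhaustive validation is
    # run on y; it is only cast to X's dtype, mirroring what
    # _preprocess_data does after this PR, so fit no longer raises.
    return np.asarray(y, dtype=X.dtype)

X = np.ones((4, 2), dtype=np.float32)
y = np.arange(4, dtype=np.float64)
y_cast = _cast_target(X, y)  # cast to float32, no ValueError raised
```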
assert_equal(y_mean_6432.dtype, np.float64)
assert_equal(X_norm_6432.dtype, np.float64)
assert_array_almost_equal(Xt_32, Xt_64)
copy is True by default, so can you also check that the dtype of the initial array does not change?
I just did, a few lines below!
But with assert_array_equal(X_32, X_32_initial), I'm not sure the dtype is properly tested...
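The concern is justified: numpy's assert_array_equal compares values only, not dtypes, so an explicit dtype assertion is needed to catch an unwanted upcast. A minimal illustration:

```python
import numpy as np
from numpy.testing import assert_array_equal

X_32_initial = np.array([1.0, 2.0], dtype=np.float32)
X_32 = X_32_initial.astype(np.float64)  # simulate an unwanted upcast

# This passes even though the dtypes differ: only values are compared.
assert_array_equal(X_32, X_32_initial)

# The upcast is only caught by an explicit dtype check:
dtypes_match = X_32.dtype == X_32_initial.dtype
```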
has conflicts
@MechCoder I mistook your avatar for a fidget spinner and now I can't unsee it.
@Henley13: can you resolve the merge conflicts, please?
@amueller I googled what a fidget spinner is and now I have to change my avatar :-| |
Can you just change the "If it's not the case" everywhere and I'll be happy to merge.
@MechCoder Sorry, I thought I did it. Should be ok now.
thanks @Henley13
* add test for _preprocess_data and make it consistent
* fix pep8
* add doc, cast systematically y in X.dtype and update test_coordinate_descent.py
* test if input values don't change with copy=True
* test if input values don't change with copy=True scikit-learn#2
* fix doc
* fix doc scikit-learn#2
* fix doc scikit-learn#3
Reference Issue
Works on #8769
What does this implement/fix? Explain your changes.
Prevent _preprocess_data from casting float32 data into float64.
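A minimal illustration of the fix's intent (not the actual _preprocess_data code): statistics computed on float32 data should stay float32, so the data is never upcast to float64 along the way.

```python
import numpy as np

X_32 = np.random.RandomState(0).rand(10, 3).astype(np.float32)

# The centering offset is computed in X's own dtype, so neither the
# offset nor the centered data is upcast to float64:
X_offset = np.average(X_32, axis=0)
X_centered = X_32 - X_offset
```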
Any other comments?
Intermediate step for PR #9087