[MRG+1] fused types in isotonic_regression #9106
agramfort merged 11 commits into scikit-learn:master
Conversation
I implemented and tested dtype preservation for IsotonicRegression as well, following the convention that X should drive the dtype. Ping @Henley13 for comments :)
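For context, a minimal sketch of what "X drives the dtype" means in practice (hedged: the PR's actual tests may be structured differently):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

X = np.arange(10, dtype=np.float32)
y = np.arange(10, dtype=np.float32)

ir = IsotonicRegression().fit(X, y)
# With this change, float32 inputs should come back as float32
# instead of being silently upcast to float64.
assert ir.predict(X).dtype == np.float32
```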
further ping @lesteve, @glemaitre, @raghavrv, @TomDLT just in case someone bites :P I am pretty sure I forgot things here, because I failed to participate in all of the discussions on 32-bit support over the week.
```python
copy.copy(ir)


def test_isotonic_dtype():
```
should this kind of test be here or in common tests / estimator tags?
Probably there, but that's not possible without estimator tags, right? Something like preserves_precision? Have you run it on all_estimators to see the status? We could post an issue with a checklist.
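A rough sketch of that kind of survey (hedged: the import location of all_estimators and the set of estimators that accept this toy data vary across versions):

```python
import numpy as np
from sklearn.utils import all_estimators  # lived in sklearn.utils.testing in older versions

rng = np.random.RandomState(0)
X = rng.rand(20, 3).astype(np.float32)
y = rng.rand(20).astype(np.float32)

for name, Est in all_estimators(type_filter='regressor'):
    try:
        out = Est().fit(X, y).predict(X)
    except Exception:
        continue  # many estimators need extra parameters or different data
    # anything printing float64 here does not preserve 32-bit precision
    print(name, out.dtype)
```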
Test failures were genuine: one of the tests calls predict without calling fit. Instead of changing the test, I made the model support this behavior (so it works the same way as on master).
sklearn/isotonic.py (outdated):

```python
        The transformed data
        """
        T = as_float_array(T)
        if hasattr(self, '_dtype'):
```
Is this required? Is this faster?
Basically it's a way to check whether there is a saved X. I could replace it with:

```python
if hasattr(self, '_necessary_X_'):
    T = T.astype(self._necessary_X_.dtype, copy=False)
```
I could check if it makes a difference in terms of speed but it's probably minimal (just attribute accesses, right?)
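As a small aside on the copy question (a standalone NumPy example, not the PR's code): astype(..., copy=False) is a no-op when the dtype already matches, so the overhead really is just the attribute access and the dispatch:

```python
import numpy as np

T = np.ones(3, dtype=np.float32)
# No conversion needed: astype with copy=False returns the same object.
assert T.astype(np.float32, copy=False) is T
# Conversion needed: a new float64 array is allocated.
assert T.astype(np.float64, copy=False) is not T
```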
sklearn/isotonic.py (outdated):

```diff
         sample_weight = np.ones(len(y), dtype=y.dtype)
     else:
-        sample_weight = np.array(sample_weight[order], dtype=np.float64)
+        sample_weight = np.array(sample_weight[order])
```
should we make sure that sample_weight has the same dtype as y?
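A minimal sketch of that suggestion (the helper name is hypothetical, not the PR's final code):

```python
import numpy as np

def _weights_like(sample_weight, y):
    # Hypothetical helper: default to ones of y's dtype, otherwise
    # cast the user-provided weights so they match y.
    if sample_weight is None:
        return np.ones(len(y), dtype=y.dtype)
    return np.asarray(sample_weight, dtype=y.dtype)
```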
```python
check_consistent_length(X, y, sample_weight)
X, y = [check_array(x, ensure_2d=False) for x in [X, y]]

X = as_float_array(X)
```
This will trigger a copy, and is rather redundant with check_array. I would prefer:

```python
X = check_array(X, dtype=[np.float64, np.float32])
y = check_array(y, dtype=X.dtype, ensure_2d=False)
```

@TomDLT this has different behavior than as_float_array if the input is int32 (and maybe other dtypes too?):
```python
import numpy as np
from sklearn.utils import as_float_array, check_array

for dtype in (np.int32, np.int64, np.uint32, np.uint64, np.float32, np.float64):
    print('input dtype\t', dtype.__name__)
    x = np.arange(5).astype(dtype)
    x = as_float_array(x)
    print('as_float_array\t', x.dtype)
    x = np.arange(5).astype(dtype)
    x = check_array(x, dtype=[np.float64, np.float32], ensure_2d=False)
    print('check_array\t', x.dtype)
    print()
```

outputs:
```
input dtype      int32
as_float_array   float32
check_array      float64

input dtype      int64
as_float_array   float64
check_array      float64

input dtype      uint32
as_float_array   float64
check_array      float64

input dtype      uint64
as_float_array   float64
check_array      float64

input dtype      float32
as_float_array   float32
check_array      float32

input dtype      float64
as_float_array   float64
check_array      float64
```
Has there been a decision in the other 32-bit PRs about what to do in this situation?
in logistic regression it seems like the check_array way is used.
> this has different behavior than as_float_array if the input is int32

Fair enough, just make sure that it does not copy more than necessary.
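On the copy question, a standalone check (assuming NumPy >= 1.11 for np.shares_memory): with its default copy=False, check_array should hand back the same buffer whenever the input dtype is already in the accepted list:

```python
import numpy as np
from sklearn.utils import check_array

x = np.arange(5, dtype=np.float32)
out = check_array(x, dtype=[np.float64, np.float32], ensure_2d=False)
# float32 is in the accepted list, so no conversion and no copy occurs.
print(np.shares_memory(out, x))  # expected: True
```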
I removed the redundant call to check_array on X and am now just calling it on y. Still, I think the behavior on int inputs should be made consistent across the codebase where possible.
@Henley13 have you already discussed this?
Failing tests on travis seem to suggest that my assumptions here fail on some configs; in particular this:

```
input dtype      int32
as_float_array   float32
check_array      float64
```
thanks @agramfort! my memory might fail, but let me know if I can help
@vene can you replicate the travis failure? It seems I can't.
I was able to reproduce in a clean python 3.5.6 installed with pyenv. The problem is caused by the call to scipy.interpolate.interp1d: on old scipy versions it upcasts float32 inputs to float64, so the interpolated output comes back as float64 even though X, y and T are all float32. A minimal reproduction:
```python
# file check.py
import numpy as np
import scipy
from scipy import interpolate

print(np.__version__)
print(scipy.__version__)

X = np.random.randn(5).astype(np.float32)
y = np.random.randn(5).astype(np.float32)
T = np.random.randn(5).astype(np.float32)

f_ = scipy.interpolate.interp1d(X, y, kind='linear', bounds_error=False)
out = f_(T)
print(out.dtype)
```

On the failing configuration the last line of the output is float64.
I think there are two solutions.

EDIT: pushed a quick fix for solution (1).
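A sketch of the kind of workaround this implies (hedged: the actual fix, per the commit "fix for interp1d upcast on old scipy", may be structured differently):

```python
import numpy as np
from scipy import interpolate

def _interp1d_preserving_dtype(X, y, T):
    # Hypothetical helper: old scipy upcasts float32 to float64 inside
    # interp1d, so cast the result back to T's dtype (a no-op otherwise).
    f_ = interpolate.interp1d(X, y, kind='linear', bounds_error=False)
    return f_(T).astype(T.dtype, copy=False)
```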
sorry, i'm rusty
This is good to go from my end. Should I be worried that azure and appveyor failed? It seems unrelated...
The failure happens when calling _make_unique:

```cython
def _make_unique(np.ndarray[dtype=floating] X,
                 np.ndarray[dtype=floating] y,
                 np.ndarray[dtype=floating] sample_weights):
```

Could this be because the fused type specializations are tied, i.e., either all three arrays are float32 or all three are float64?
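For illustration, a standalone Cython sketch (not the PR's code) of how tied specializations behave: a single fused type is resolved once for the whole signature, so mixed dtypes fail at dispatch:

```cython
# demo.pyx
cimport numpy as np
from cython cimport floating

def tied(np.ndarray[floating] a, np.ndarray[floating] b):
    # 'floating' is specialized to float32 or float64 once per call;
    # tied(float32_array, float64_array) fails with "No matching signature found".
    return a[0] + b[0]
```

So the caller has to make sure X, y and sample_weights all arrive with the same floating dtype.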
yes, it's very likely. Can you fix?

Hopefully! I'm trying to reproduce on Windows, but it might take a bit longer for technical reasons.

I was able to reproduce. Will find a fix tomorrow.

FWIW I can also reproduce on my laptop.

Thanks @albertcthomas. It was a legitimate bug, as shown by the new failing test added in commit 6f2b6d1. Should be fixed now; let's see what CI says.

thanks @vene

thanks @agramfort !!
* ENH fused types in isotonic_regression
* make X drive computation dtype in IsotonicRegression
* preserve current behaviour if transform w/o fit
* thoroughly check sample weights; avoid storing dtype explicitly
* consistent testing and behavior
* misc
* update what's new
* fix for interp1d upcast on old scipy
* flake8 remove blank line (sorry, i'm rusty)
* add failing test
* FIX dtype bug in _make_unique
This reverts commit a29db54.
Reference Issue
#8769 -ish?

What does this implement/fix? Explain your changes.
Isotonic regression should preserve 32-bit dtypes when possible.

Any other comments?