[MRG + 1] FIX use high precision cumsum and check it is stable enough by jnothman · Pull Request #7331 · scikit-learn/scikit-learn

jnothman · 2016-09-02T05:50:42Z

Fixes #6842. I don't know a test that will run quickly enough.

TomDLT · 2016-09-02T08:36:34Z

~~Wow even with float64 there are many errors ...~~
Small mistake in the check.

TomDLT · 2016-09-02T08:41:43Z

sklearn/utils/extmath.py

+    """
+    out = np.cumsum(arr, dtype=np.float64)
+    expected = np.sum(arr, dtype=np.float64)
+    if not np.allclose(out, expected):


if not np.allclose(out[-1], expected):

I thought I committed that change :p

jnothman · 2016-09-03T11:57:49Z

Sorry for opening a PR and running away :)

yenchenlin · 2016-09-03T13:10:17Z

LGTM

tguillemot · 2016-09-05T08:22:48Z

Indeed that could be problematic. LGTM

TomDLT · 2016-09-05T08:46:58Z

LGTM

jnothman · 2016-09-05T12:12:19Z

Any ideas of pathological cases to test with quickly?

TomDLT · 2016-09-06T10:05:19Z

No idea, except comparing roc_auc_score(y_true, y_pred, sample_weight=sample_weight_32) and roc_auc_score(y_true, y_pred, sample_weight=sample_weight_64). But that tests the correction we propose rather than the bug we want to fix, so I am not fond of this solution.

jnothman · 2016-09-06T11:46:48Z

I think it's only an effective test for very large vectors, hence impractical.

amueller · 2016-09-06T16:22:09Z

how long does a test with a large-enough vector run?

amueller · 2016-09-06T16:22:30Z

lgtm

jnothman · 2016-09-07T03:14:51Z

import time
import numpy as np
import pandas as pd

n_trials = 50
all_results = []
for dtype in [np.float32, np.float64]:
    for i in range(3, 8):
        n = 10 ** i
        absdiff, reldiff = [], []
        s = time.time()
        for j in range(n_trials):
            x = np.random.rand(n).astype(dtype)
            a = np.cumsum(x)[-1]  # last element of the running sum
            b = np.sum(x)  # reference total
            absdiff.append(np.abs(a - b))
            reldiff.append(absdiff[-1] / b)
        # mean time per trial, then log10 of the mean absolute and relative diffs
        all_results.append((dtype.__name__, n, (time.time() - s) / n_trials,
                            np.log10(np.mean(absdiff)), np.log10(np.mean(reldiff))))

pd.DataFrame(all_results, columns=['dtype', 'n', 'time', 'log abs diff', 'log rel diff'])

dtype	n	time	log abs diff	log rel diff
float32	1000	0.00	-3.84	-6.54
float32	10000	0.00	-2.23	-5.93
float32	100000	0.00	-0.64	-5.34
float32	1000000	0.02	0.71	-4.99
float32	10000000	0.22	2.28	-4.41
float64	1000	0.00	-12.59	-15.28
float64	10000	0.00	-10.96	-14.66
float64	100000	0.00	-9.47	-14.16
float64	1000000	0.03	-7.96	-13.66
float64	10000000	0.23	-6.40	-13.09

np.allclose has default rtol=1e-5, atol=1e-8

Two questions:

- Can we build a non-regression case where the old code would have been broken? Possibly: all the absolute diffs above for float32 exceed the atol; for 1e6 samples we get a relative diff just greater than allclose's rtol in 0.02s, but this would be slower embedded in the rest of roc_auc_score.
- Can we get test coverage for the error message here? Probably not, without having rtol as a parameter to stable_cumsum. This might be worth doing.

So I'll try to build a non-regression test with 1e6 samples.
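
To make the tolerance argument concrete, here is a rough sketch of the np.allclose criterion, |a - b| <= atol + rtol * |b|, applied to the 1e6-sample rows of the table. The total of 5e5 is an assumption (the mean of np.random.rand is 0.5, times 1e6 samples), not a measured value, and the diffs are read off the table's log10 columns.

```python
import numpy as np

rtol, atol = 1e-5, 1e-8            # np.allclose defaults
expected = 0.5 * 1e6               # assumed total for 1e6 uniform samples
abs_diff_f32 = 10 ** 0.71          # float32, n = 1e6 row ("log abs diff")
abs_diff_f64 = 10 ** -7.96         # float64, n = 1e6 row ("log abs diff")

# allclose accepts a pair when |a - b| <= atol + rtol * |b|
tolerance = atol + rtol * abs(expected)

f32_passes = np.allclose(expected + abs_diff_f32, expected)
f64_passes = np.allclose(expected + abs_diff_f64, expected)
print(tolerance, abs_diff_f32, f32_passes, f64_passes)
```

Under that assumption the float32 difference lands just outside the tolerance while the float64 difference is far inside it, which is why 1e6 samples is about the smallest size that can separate the two.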

jnothman · 2016-09-08T04:03:15Z

A regression test for roc_auc_score turns out to be too slow.

jnothman · 2016-09-08T04:03:47Z

I've added a direct test of stable_cumsum using configurable atol, rtol
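
For readers following along, a minimal sketch of what such a function could look like, pieced together from the diff lines and the corrected check quoted earlier in this thread. The name stable_cumsum_sketch, the default tolerances and the error text are illustrative assumptions, not the merged code.

```python
import numpy as np


def stable_cumsum_sketch(arr, rtol=1e-05, atol=1e-08):
    """Cumulative sum accumulated in float64, checked against np.sum.

    Hypothetical helper for illustration; the tolerances default to the
    np.allclose defaults and the error message is made up.
    """
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    # Only the final running total is comparable to the overall sum.
    if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
        raise RuntimeError("cumsum and sum disagree beyond the given tolerances")
    return out


# float32 input is accumulated in float64, so the output dtype is float64.
result = stable_cumsum_sketch(np.array([1, 2, 3], dtype=np.float32))
print(result, result.dtype)
```

Exposing rtol and atol is what lets a test tighten the check until it fires, instead of needing an input large enough to break the defaults.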

jnothman · 2016-09-08T04:14:33Z

I suppose the remaining question is: are there other places in the codebase where cumsum is happening and might be over float32 data or very very large arrays?

lesteve · 2016-09-08T13:19:05Z

One of the Travis errors will go away if you rebase on master, but there is one on Python 2.7 that seems genuine:

======================================================================
FAIL: sklearn.utils.tests.test_extmath.test_stable_cumsum
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/sklearn_build_oldest/scikit-learn/sklearn/utils/tests/test_extmath.py", line 656, in test_stable_cumsum
    stable_cumsum, r, rtol=0, atol=0)
  File "/home/travis/sklearn_build_oldest/scikit-learn/sklearn/utils/testing.py", line 438, in assert_raise_message
    (names, function.__name__))
AssertionError: RuntimeError not raised by stable_cumsum

jnothman · 2016-09-08T13:33:54Z

> One of the Travis errors will go away if you rebase on master, but there is one on Python 2.7 that seems genuine:

Yeah, I was wondering if there'd be some platform that didn't fail my test....

jnothman · 2016-09-08T13:35:05Z

I assume that means numpy used to have an unstable implementation of sum, rather than it used to have a stable implementation of cumsum

jnothman · 2016-09-08T13:35:54Z

I wonder: Should I only run the test on recent numpy, or just remove it?

amueller · 2016-09-08T14:16:58Z

Hm... testing the error message seems slightly odd since it relies on numpy being broken. I guess having a test with a known correct result that wasn't correct before might be nicer, but not offer the error coverage?

Like

x = np.empty(int(1e7))  # the shape must be an integer
x.fill(1e-7)
np.isclose(stable_cumsum(x)[-1], 1, rtol=0, atol=1e-12)

jnothman · 2016-09-08T14:22:32Z

It might just be my bedtime, but I'm not sure I get what test you're suggesting.

amueller · 2016-09-08T14:23:33Z

np.isclose(cumsum(x)[-1], 1, rtol=0, atol=1e-12)
fails for np.cumsum currently, but should work for stable_cumsum

jnothman · 2016-09-08T14:25:56Z

No, I don't think it will work for stable_cumsum in old numpy. I might be wrong though.

amueller · 2016-09-08T14:26:20Z

ah... huh...

jnothman · 2016-09-08T14:30:52Z

So just drop the test, I suppose.

jnothman · 2016-09-08T14:31:02Z

And merge the PR?

amueller · 2016-09-08T14:50:16Z

fine with me

lesteve · 2016-09-08T14:59:26Z

I believe the improved stability of np.sum was introduced in numpy 1.9: http://docs.scipy.org/doc/numpy/release.html#better-numerical-stability-for-sum-in-some-cases. I also quickly checked that in 1.8 cumsum and sum were giving the same wrong result in one of your snippets, but 1.9 was fine. Maybe skip the test for numpy < 1.9?

lesteve · 2016-09-08T15:21:28Z

sklearn/utils/tests/test_extmath.py



 def test_stable_cumsum():
+    if np_version < (1, 19):


19 -> 9 otherwise we may wait a while until this test gets run ;-).

Argh. Making so many errors. Should be in bed. Should take a break from this stuff, too!
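
A small sketch of why the typo matters, assuming versions are compared as (major, minor) tuples. The parse_version helper below is hypothetical and is not the np_version object used in the test.

```python
def parse_version(version_string):
    """Turn a string such as "1.9.2" into a (major, minor) tuple of ints."""
    major, minor = version_string.split(".")[:2]
    return (int(major), int(minor))


# Tuples compare element by element, so (1, 9) < (1, 11) < (1, 19).
skipped_with_typo = parse_version("1.11.1") < (1, 19)   # test would be skipped
skipped_with_fix = parse_version("1.11.1") < (1, 9)     # test would run
print(skipped_with_typo, skipped_with_fix)
```

With the (1, 19) bound the skip condition holds for every release before 1.19, so the test would not run on any of the numpy versions discussed here.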

jnothman · 2016-09-08T15:23:50Z

Could you please run a check for numpy 1.9, @lesteve?

lesteve · 2016-09-08T15:45:07Z

> Could you please run a check for numpy 1.9, @lesteve?

Not sure what you exactly mean, but here is what I tried:

import numpy as np

np.random.seed(42)
n_samples = 4 * 10 ** 7
y = np.random.randint(2, size=n_samples)
prediction = np.random.normal(size=n_samples) + y * 0.01
trivial_weight = np.ones(n_samples)

print(np.cumsum(trivial_weight.astype('float32'))[-1])
print(np.sum(trivial_weight.astype('float32')))

Output for numpy 1.8:

1.67772e+07
1.67772e+07

Output for numpy 1.9:

1.67772e+07
4e+07
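
The value 1.67772e+07 is 2**24 = 16777216, the point past which float32 (24-bit significand) can no longer represent every integer, so adding 1.0 to the running total stops changing it. A minimal sketch of the same effect, starting the array at 2**24 to avoid allocating 4e7 elements:

```python
import numpy as np

# Start at 2**24 so the saturation shows up with only four elements.
weights = np.array([2 ** 24, 1, 1, 1], dtype=np.float32)

# float32 accumulation: 16777216 + 1 is not representable and rounds back down.
saturated = np.cumsum(weights)

# float64 accumulation over the same float32 data keeps every increment.
exact = np.cumsum(weights, dtype=np.float64)

print(saturated[-1], exact[-1])
```

Passing dtype=np.float64 to np.cumsum is the "high precision cumsum" in this PR's title; the input array itself can stay float32.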

lesteve · 2016-09-08T15:58:22Z

I also made sure that the test was run for numpy 1.9. LGTM, will wait for AppVeyor and then merge.

lesteve · 2016-09-09T04:16:52Z

Merging, thanks!

FIX use high precision cumsum and check it is stable enough

6ce5f18

TomDLT added this to the 0.18 milestone Sep 2, 2016

TomDLT reviewed Sep 2, 2016

jnothman added 2 commits September 3, 2016 21:58

FIX corrected code

bd19e30

Improved wording of error message

f58fdf7

jnothman added the Bug label Sep 3, 2016

amueller changed the title ~~[MRG] FIX use high precision cumsum and check it is stable enough~~ [MRG + 1] FIX use high precision cumsum and check it is stable enough Sep 6, 2016

jnothman mentioned this pull request Sep 8, 2016

Find and fix any potentially unstable cumsums #7359

Closed

TST Add test with reduced rtol, atol

9ced9da

jnothman force-pushed the cumsum branch from b93067f to 9ced9da Compare September 8, 2016 13:34

Limit test to numpy >=1.9

0cf09bc

lesteve reviewed Sep 8, 2016

TST correct numpy version check

eb3500a

lesteve merged commit 49d126f into scikit-learn:master Sep 9, 2016

yangarbiter mentioned this pull request Sep 9, 2016

[MRG+1] FIX unstable cumsum #7376

Merged

rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request Sep 14, 2016

[MRG + 1] FIX use high precision cumsum and check it is stable enough (…

a09e0ce

…scikit-learn#7331) * FIX use high precision cumsum and check it is stable enough

TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

[MRG + 1] FIX use high precision cumsum and check it is stable enough (…

3dfeb2e

…scikit-learn#7331) * FIX use high precision cumsum and check it is stable enough

amueller mentioned this pull request Oct 21, 2016

kaggle AUC != sklearn AUC #6711

Closed

Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG + 1] FIX use high precision cumsum and check it is stable enough (…

464197d

…scikit-learn#7331) * FIX use high precision cumsum and check it is stable enough

paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 1] FIX use high precision cumsum and check it is stable enough (…

44f3d9d

…scikit-learn#7331) * FIX use high precision cumsum and check it is stable enough
