[MRG] Fix missing 'const' in a few memoryview declaration in trees. by jeremiedbb · Pull Request #13626 · scikit-learn/scikit-learn

jeremiedbb · 2019-04-12T11:25:03Z

Memoryviews were introduced in the tree code in #12886, but a few declarations are missing the const keyword.

A typical use case is doing cross validation on a RandomForest:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.random_sample((10000, 1000))
y = np.random.randint(2, size=10000)
rf = RandomForestClassifier(n_jobs=-1)

cross_val_score(rf, X, y)

Here X is larger than 1 MB, so joblib memory-maps it in cross_val_score. This code breaks on master.

What's happening is that for cross_val_score the joblib backend is the sequential backend (as expected), but for the random forest it's the loky backend, ignoring prefer='threads'. So even though this PR fixes the bug in sklearn, there also seems to be a bug in joblib. @ogrisel
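For context, here is a minimal sketch (not taken from the PR) of what a read-only memmap looks like from Python. The file name and array shape are arbitrary choices for illustration. A Cython memoryview declared without const requests a writable buffer, which an array like this cannot provide, whereas a const memoryview only needs read access.

```python
import os
import tempfile

import numpy as np

# Write a small float32 array to disk, then reopen it read-only.
# This only imitates what joblib does for large inputs; joblib's own
# dumping mechanism is different. The temp dir is left for the OS to clean up.
path = os.path.join(tempfile.mkdtemp(), "X.dat")  # hypothetical file name
X_orig = np.random.RandomState(0).random_sample((10, 2)).astype(np.float32)
X_orig.tofile(path)

X_mmap = np.memmap(path, dtype="float32", mode="r", shape=(10, 2))

# The array and the buffer it exports are both flagged read-only.
is_writeable = X_mmap.flags.writeable
view = memoryview(X_mmap)
buffer_is_readonly = view.readonly

# Any attempt to write through the array is rejected.
try:
    X_mmap[0, 0] = 1.0
    write_failed = False
except ValueError:
    write_failed = True

print(is_writeable, buffer_is_readonly, write_failed)
```

Reading from X_mmap works as usual; only consumers that insist on a writable buffer are affected.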

adrinjalali · 2019-04-12T12:59:27Z

LGTM, except I guess adding your example as a test wouldn't hurt.

jeremiedbb · 2019-04-12T14:31:57Z

I added a test. It does not involve cross_val_score. Only a mem-mapped X.

adrinjalali · 2019-04-12T15:23:52Z

Fails on windows, interesting!

thomasjpfan · 2019-04-12T15:32:50Z

sklearn/ensemble/tests/test_forest.py

+    # check that random forest supports read-only buffer (#13626)
+    X_orig = np.random.RandomState(0).random_sample((10, 2)).astype(np.float32)
+
+    with NamedTemporaryFile() as tmp:


Hmm

scikit-learn/sklearn/datasets/tests/test_svmlight_format.py

Lines 118 to 122 in e405505

with NamedTemporaryFile(prefix="sklearn-test", suffix=".gz") as tmp:
    tmp.close()  # necessary under windows
    with open(datafile, "rb") as f:
        with gzip.open(tmp.name, "wb") as fh_out:
            shutil.copyfileobj(f, fh_out)
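The pattern in the snippet above (release the handle before reopening the file by name) can be sketched for the memmap case roughly as follows. This is an illustration under assumptions, not the code that ended up in the PR: delete=False, the ".dat" suffix and the manual cleanup are choices made here so that the file can be reopened by name on Windows.

```python
import os
from tempfile import NamedTemporaryFile

import numpy as np

X_orig = np.random.RandomState(0).random_sample((10, 2)).astype(np.float32)

# delete=False keeps the file on disk after close(), so it can be
# reopened by name; we remove it ourselves at the end.
tmp = NamedTemporaryFile(prefix="sklearn-test", suffix=".dat", delete=False)
try:
    tmp.write(X_orig.tobytes())
    tmp.close()  # release the handle before reopening by name

    X_mmap = np.memmap(tmp.name, dtype="float32", mode="r", shape=(10, 2))
    roundtrip_ok = bool(np.array_equal(X_mmap, X_orig))

    del X_mmap  # drop the mapping so the file can be removed
finally:
    if os.path.exists(tmp.name):
        os.remove(tmp.name)
```

Dropping the last reference to the memmap before removing the file matters on Windows, where a file that is still mapped cannot be deleted.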

adrinjalali · 2019-04-13T13:20:34Z

sklearn/ensemble/tests/test_forest.py

+        X_mmap = np.memmap(tmp.name, dtype='float32', mode='r', shape=(10, 2))
+        y = np.zeros(10)
+
+        RandomForestClassifier(n_estimators=2).fit(X_mmap, y)


I think the test could be under sklearn/tree/tests/test_tree.py, and test a DecisionTreeRegressor instead. It kinda feels like that's a more natural place for the test since it's actually testing the splitter.

jeremiedbb · 2019-04-15

I moved the test and cleaned it up, since we already have a helper for testing on memmap arrays.

adrinjalali

LGTM

jeremiedbb · 2019-04-15T12:56:34Z

I wondered why the common test check_classifiers_train(readonly_memmap=True) passes on master. It tests on float64 data, but the tree requires float32 so it makes a copy and it's no longer a memmap...
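A small NumPy-only sketch of the effect described above. np.asarray stands in here for the input validation step (an assumption made for illustration; the actual conversion in scikit-learn goes through its own validation code). Casting a read-only float64 memmap to float32 allocates a fresh, writable array, so the read-only buffer never reaches the Cython code.

```python
import os
import tempfile

import numpy as np

# float64 data on disk, reopened as a read-only memmap.
path = os.path.join(tempfile.mkdtemp(), "X64.dat")  # hypothetical file name
np.random.RandomState(0).random_sample((10, 2)).tofile(path)
X64 = np.memmap(path, dtype="float64", mode="r", shape=(10, 2))

# A dtype change cannot be done in place, so a new array is allocated.
X32 = np.asarray(X64, dtype=np.float32)
copy_is_writeable = X32.flags.writeable
copy_shares_memory = np.shares_memory(X64, X32)

# Keeping the dtype needs no copy: the result still points at the
# read-only mapping.
X64_same = np.asarray(X64, dtype=np.float64)
same_is_writeable = X64_same.flags.writeable
same_shares_memory = np.shares_memory(X64, X64_same)

print(copy_is_writeable, copy_shares_memory)
print(same_is_writeable, same_shares_memory)
```

This is why a test that wants to exercise the read-only path has to provide data already in the dtype the estimator uses internally (float32 for the trees).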

jnothman · 2019-04-15T13:23:05Z

Should we run the common test with both dtypes?

jnothman · 2019-04-15T13:23:55Z

Or perhaps there should be an estimator tag specifying what format (dtype, order) is non-copying for some estimator

jeremiedbb · 2019-04-16T08:44:25Z

> Should we run the common test with both dtypes?
> Or perhaps there should be an estimator tag specifying what format (dtype, order) is non-copying for some estimator

I'd prefer the second option, since the common tests are already quite long. But I think that's out of scope for this PR.

thomasjpfan · 2019-04-16T11:08:20Z

Thank you! @jeremiedbb

add missing const keyword for memoryviews

8080086

jeremiedbb force-pushed the fix-trees-memview branch from 7c8adbf to 8080086 Compare April 12, 2019 11:28

add test

4f20a54

thomasjpfan reviewed Apr 12, 2019

jeremiedbb added 2 commits April 12, 2019 17:56

fix test for windows

239b13a

delete file access

1aa6912

adrinjalali reviewed Apr 13, 2019

jnothman added this to the 0.21 milestone Apr 15, 2019

jeremiedbb added 2 commits April 15, 2019 13:29

move & clean test

07fb042

y read only

df3d93c

adrinjalali approved these changes Apr 15, 2019

thomasjpfan approved these changes Apr 16, 2019

thomasjpfan merged commit 5bc3edc into scikit-learn:master Apr 16, 2019

jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019

BUG Fix missing 'const' in a few memoryview declaration in trees. (scikit-learn#13626)

b928396

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

BUG Fix missing 'const' in a few memoryview declaration in trees. (scikit-learn#13626)

6889d2b

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "BUG Fix missing 'const' in a few memoryview declaration in trees. (scikit-learn#13626)"

This reverts commit 6889d2b.

fe91d24

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "BUG Fix missing 'const' in a few memoryview declaration in trees. (scikit-learn#13626)"

This reverts commit 6889d2b.

4977151

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

BUG Fix missing 'const' in a few memoryview declaration in trees. (scikit-learn#13626)

f43be06

thomasjpfan merged 6 commits into scikit-learn:master from jeremiedbb:fix-trees-memview
