[MRG] Fix missing 'const' in a few memoryview declaration in trees.#13626
[MRG] Fix missing 'const' in a few memoryview declaration in trees.#13626thomasjpfan merged 6 commits intoscikit-learn:masterfrom
Conversation
7c8adbf to
8080086
Compare
|
LGTM, except I guess adding your example as a test wouldn't hurt. |
|
I added a test. It does not involve |
|
Fails on windows, interesting! |
| # check that random forest supports read-only buffer (#13626) | ||
| X_orig = np.random.RandomState(0).random_sample((10, 2)).astype(np.float32) | ||
|
|
||
| with NamedTemporaryFile() as tmp: |
There was a problem hiding this comment.
Hmm
scikit-learn/sklearn/datasets/tests/test_svmlight_format.py
Lines 118 to 122 in e405505
| X_mmap = np.memmap(tmp.name, dtype='float32', mode='r', shape=(10, 2)) | ||
| y = np.zeros(10) | ||
|
|
||
| RandomForestClassifier(n_estimators=2).fit(X_mmap, y) |
There was a problem hiding this comment.
I think the test could be under sklearn/tree/tests/test_tree.py, and test a DecisionTreeRegressor instead. It kinda feels like that's a more natural place for the test since it's actually testing the splitter.
There was a problem hiding this comment.
I moved the test, and cleaned it since we actually have a helper to test on memmap arrays.
|
I wondered why the common test |
|
Should we run the common test with both dtypes?
… |
|
Or perhaps there should be an estimator tag specifying what format (dtype,
order) is non-copying for some estimator
|
I'd prefer the second option since the common tests are already quite long. But it's out of scope of this PR I think. |
|
Thank you! @jeremiedbb |
…ees. (scikit-learn#13626)" This reverts commit 6889d2b.
…ees. (scikit-learn#13626)" This reverts commit 6889d2b.
Memory views were introduced in trees in #12886.
It misses the
constkeyword in a few declarations.A typical use case is doing cross validation on a
RandomForest:Here X is more than 1Mb, which means it's mem-mapped by joblib in
cross_val_score. This code breaks on master.What's happening is that for
cross_val_score, the joblib backend is the sequential backend (as expected) but for the random forest it's loky backend, ignoringprefer='threads'. So it seems that even if this PR fixes the bug in sklearn, there's also a bug in joblib. @ogrisel