Skip to content

FIX support read only view for tree input#16331

Merged
rth merged 6 commits intoscikit-learn:masterfrom
Batalex:fix/tree-input-read-only
Feb 1, 2020
Merged

FIX support read only view for tree input#16331
rth merged 6 commits intoscikit-learn:masterfrom
Batalex:fix/tree-input-read-only

Conversation

@Batalex
Copy link
Copy Markdown
Contributor

@Batalex Batalex commented Jan 31, 2020

Reference Issues/PRs

Fixes #15851.

What does this implement/fix? Explain your changes.

According to https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#read-only-views

Since Cython 0.28, the memoryview item type can be declared as const to support read-only buffers as input

This should be fine with scikit-learn since the minimal Cython version supported is 28.5. We could also check the input array for the writeable flag and make a copy instead of a view if needed but I figured this solution would be better.

Any other comments?

The benchmarks now work as expected.

python benchmarks\bench_mnist.py --classifiers RandomForest ExtraTrees CART
[...]
Classification performance:
===========================
Classifier               train-time   test-time   error-rate
------------------------------------------------------------
ExtraTrees                   45.16s       0.45s       0.0294
RandomForest                 40.51s       0.39s       0.0295
CART                         21.61s       0.06s       0.1221

2020 Paris Sprint

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do the same in _decision_path_dense otherwise this sounds good to me.

Please add a unit tests that check that trees work with read-only array as input (you can use create_memmap_backed_data to make one).

Also please add a what's new entry to doc/whats_new/v0.23.rst indicating that the corresponding tree methods can now be used with joblib based multiprocessing (or @jeremiedbb might suggest a better formulation).

@rth rth changed the title ENH use read only view for tree input FIX support read only view for tree input Jan 31, 2020
@Batalex
Copy link
Copy Markdown
Contributor Author

Batalex commented Jan 31, 2020

I believe we can remove the test test_decision_tree_memmap https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tests/test_tree.py#L1826. It should be covered by the new ones and it will remove an import.

Copy link
Copy Markdown
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Batalex, LGTM.

I pushed a small fix to improve the what's new entry given new information from #15851 (comment) and also removed test_decision_tree_memmap as you suggested since it is indeed redundant.

Will merge on green CI.

:class:`ensemble.GradientBoostingClassifier` as well as ``predict`` method of
:class:`tree.DecisionTreeRegressor`, :class:`tree.ExtraTreeRegressor`, and
:class:`ensemble.GradientBoostingRegressor`.
:pr:`16331` by :user:`Alexandre Batisse <batalex>`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of affected estimators is obtained in #16359

@rth rth merged commit 0db80b9 into scikit-learn:master Feb 1, 2020
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ValueError: buffer source array is read-only when running bench_mnist.py

4 participants