Skip to content

Conversation

@jjerphan
Copy link
Member

@jjerphan jjerphan commented Jan 9, 2023

Reference Issues/PRs

Fixes #25333.
Fixes #25247.

What does this implement/fix? Explain your changes.

In some workflows using DecisionTree* (this include RandomForest* and ExtraTrees*):

  • users might provide NumPy arrays with read-only buffers
  • joblib might memmap arrays making their buffer read-only.

Those cases are correctly supported when using dense datasets thanks to const-qualification of memoryview in Cython. Yet those cases aren't currently supported when using sparse datasets (for instance when X is provided as a CSC or as a CSR matrix).

const-qualifying CSC or CSR buffers (i.e. data, indices and indptr) in BaseSparseSplitter allows supporting those cases when using sparse datasets.

@jjerphan jjerphan changed the title FIX const-qualify BaseSparseSplitter attributes of CSR data FIX const-qualify BaseSparseSplitter attributes of CSR or CSC data Jan 9, 2023
@jjerphan jjerphan added the Quick Review For PRs that are quick to review label Jan 10, 2023
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jjerphan jjerphan changed the title FIX const-qualify BaseSparseSplitter attributes of CSR or CSC data FIX Support read-only sparse datasets for Tree-based estimators Jan 11, 2023
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! There is a merge conflict with main to be resolved.

@thomasjpfan thomasjpfan added this to the 1.2.1 milestone Jan 11, 2023
jjerphan and others added 3 commits January 12, 2023 09:17
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@jjerphan jjerphan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Jan 12, 2023
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit c3fca81 into scikit-learn:main Jan 12, 2023
@jjerphan jjerphan deleted the fix/const-qualify-X-csr-in-BaseSparseSplitter branch January 12, 2023 17:57
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…ikit-learn#25341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…ikit-learn#25341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
…ikit-learn#25341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023
…5341)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug cython module:ensemble module:tree Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Read only buffer in cross_val_score with sparse matrix. ValueError: buffer source array is read-only

4 participants