[MRG+1] Add check for sparse prediction in cross_val_predict (fixes #5132)#5161
Conversation
|
please add a test |
sklearn/cross_validation.py
Outdated
There was a problem hiding this comment.
It should be possible to do with only one vstack call.
There was a problem hiding this comment.
Thanks for your suggestion. I have made the appropriate changes.
sklearn/cross_validation.py
Outdated
There was a problem hiding this comment.
Don't you still need this in the sparse case. Please, write a test so we actually have some idea if this is operating correctly!
There was a problem hiding this comment.
Sorry for my naivety, I now realize that it would be needed in the sparse case as well.
Although I encounter an efficiency issue while doing the same in the sparse case.
On running the following code snippet I get the following warning.
a = sp.csr_matrix([[0,0],[0,1],[1,1]])
id = numpy.array([2,0,1])
b = a.copy()
b[id] = a/Library/Python/2.7/site-packages/scipy/sparse/compressed.py:739: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient. SparseEfficiencyWarning)
Also on printing b I get the following
print b(0, 0) 0
(0, 1) 1
(1, 0) 1
(1, 1) 1
(2, 0) 0
(2, 1) 0
which is obviously inefficient.
@jnothman
Can you please suggest any other efficient way of achieving the same or point out if I am going wrong somewhere.
There was a problem hiding this comment.
I think for both the numpy array and sparse matrix cases, we're better off inverting locs so that we can use return preds[inv_locs]. You can build inv_locs = np.empty(len(locs), dtype=int); inv_locs[locs] = np.arange(len(locs))
There was a problem hiding this comment.
@jnothman
Thanks a lot for suggestion. I use inverted locs to reorder the predictions array.
sklearn/cross_validation.py
Outdated
|
I have added a test for comparing the results of dense vs sparse cross_val_predict. Currently I use make_multilabel_classification from sklearn.datasets for the input data What n_sample should we have for the test data? |
|
Apart from minor cleanups, this looks good. Thanks! |
|
@jnothman Fixed the indentation issue raised by you. Thanks a lot for all your help. :) |
|
LGTM |
|
Can't seem to understand why the AppVeyor fails on the latest commit. |
|
The appveyor failure is unrelated to this PR: it looks like a segfault in sparse PCA... |
|
LGTM as well. Thanks for the fix. |
[MRG+1] Add check for sparse prediction in cross_val_predict (fixes #5132)
|
I will add a new whats_new.rst entry + fix a pep8 issue in master directly. |
|
Thanks @ogrisel for the fixes and the merge. :) |
|
@ogrisel did you see the segfault https://ci.appveyor.com/project/sklearn-ci/scikit-learn/build/1.0.1763/job/jmthejc7o35mtb6g before? Looks very suspicious. |
- MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict
- MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict
- MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict
From scikit-learn#5161 - MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict From scikit-learn#5201 - DOC improve random_state param doc From scikit-learn#5190 - LabelKFold and test From scikit-learn#4583 - LabelShuffleSplit and tests
From scikit-learn#5161 - MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict From scikit-learn#5201 - DOC improve random_state param doc From scikit-learn#5190 - LabelKFold and test From scikit-learn#4583 - LabelShuffleSplit and tests
From scikit-learn#5161 - MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict From scikit-learn#5201 - DOC improve random_state param doc From scikit-learn#5190 - LabelKFold and test From scikit-learn#4583 - LabelShuffleSplit and tests
From scikit-learn#5161 - MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict From scikit-learn#5201 - DOC improve random_state param doc From scikit-learn#5190 - LabelKFold and test From scikit-learn#4583 - LabelShuffleSplit and tests From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
From scikit-learn#5161 - MAINT remove redundant p variable - Add check for sparse prediction in cross_val_predict From scikit-learn#5201 - DOC improve random_state param doc From scikit-learn#5190 - LabelKFold and test From scikit-learn#4583 - LabelShuffleSplit and tests From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests Other minor changes ------------------- Fix cross_validation reference Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
- From scikit-learn#4826 - Move custom error / warnings into sklearn.exception
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
- From scikit-learn#4826 - Move custom error / warnings into sklearn.exception
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
- From scikit-learn#4826 - Move custom error / warnings into sklearn.exception
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
- From scikit-learn#4826 - Move custom error / warnings into sklearn.exception
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From scikit-learn#5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From scikit-learn#5201 - DOC improve random_state param doc
- From scikit-learn#5190 - LabelKFold and test
- From scikit-learn#4583 - LabelShuffleSplit and tests
- From scikit-learn#5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From scikit-learn#5378 - Make the GridSearchCV docs more accurate.
- From scikit-learn#5458 - Remove shuffle from LabelKFold
- From scikit-learn#5466(scikit-learn#4270) - Gaussian Process by Jan Metzen
- From scikit-learn#4826 - Move custom error / warnings into sklearn.exception
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
--------------------
* ENH Reogranize classes/fn from grid_search into search.py
* ENH Reogranize classes/fn from cross_validation into split.py
* ENH Reogranize cls/fn from cross_validation/learning_curve into validate.py
* MAINT Merge _check_cv into check_cv inside the model_selection module
* MAINT Update all the imports to point to the model_selection module
* FIX use iter_cv to iterate throught the new style/old style cv objs
* TST Add tests for the new model_selection members
* ENH Wrap the old-style cv obj/iterables instead of using iter_cv
* ENH Use scipy's binomial coefficient function comb for calucation of nCk
* ENH Few enhancements to the split module
* ENH Improve check_cv input validation and docstring
* MAINT _get_test_folds(X, y, labels) --> _get_test_folds(labels)
* TST if 1d arrays for X introduce any errors
* ENH use 1d X arrays for all tests;
* ENH X_10 --> X (global var)
Minor
-----
* ENH _PartitionIterator --> _BaseCrossValidator;
* ENH CVIterator --> CVIterableWrapper
* TST Import the old SKF locally
* FIX/TST Clean up the split module's tests.
* DOC Improve documentation of the cv parameter
* COSMIT consistently hyphenate cross-validation/cross-validator
* TST Calculate n_samples from X
* COSMIT Use separate lines for each import.
* COSMIT cross_validation_generator --> cross_validator
Commits merged manually
-----------------------
* FIX Document the random_state attribute in RandomSearchCV
* MAINT Use check_cv instead of _check_cv
* ENH refactor OVO decision function, use it in SVC for sklearn-like
decision_function shape
* FIX avoid memory cost when sampling from large parameter grids
ENH Major to Minor incremental enhancements to the model_selection
Squashed commit messages - (For reference)
Major
-----
* ENH p --> n_labels
* FIX *ShuffleSplit: all float/invalid type errors at init and int error at split
* FIX make PredefinedSplit accept test_folds in constructor; Cleanup docstrings
* ENH+TST KFold: make rng to be generated at every split call for reproducibility
* FIX/MAINT KFold: make shuffle a public attr
* FIX Make CVIterableWrapper private.
* FIX reuse len_cv instead of recalculating it
* FIX Prevent adding *SearchCV estimators from the old grid_search module
* re-FIX In all_estimators: the sorting to use only the 1st item (name)
To avoid collision between the old and the new GridSearch classes.
* FIX test_validate.py: Use 2D X (1D X is being detected as a single sample)
* MAINT validate.py --> validation.py
* MAINT make the submodules private
* MAINT Support old cv/gs/lc until 0.19
* FIX/MAINT n_splits --> get_n_splits
* FIX/TST test_logistic.py/test_ovr_multinomial_iris:
pass predefined folds as an iterable
* MAINT expose BaseCrossValidator
* Update the model_selection module with changes from master
- From #5161
- - MAINT remove redundant p variable
- - Add check for sparse prediction in cross_val_predict
- From #5201 - DOC improve random_state param doc
- From #5190 - LabelKFold and test
- From #4583 - LabelShuffleSplit and tests
- From #5300 - shuffle the `labels` not the `indxs` in LabelKFold + tests
- From #5378 - Make the GridSearchCV docs more accurate.
- From #5458 - Remove shuffle from LabelKFold
- From #5466(#4270) - Gaussian Process by Jan Metzen
- From #4826 - Move custom error / warnings into sklearn.exception
Minor
-----
* ENH Make the KFold shuffling test stronger
* FIX/DOC Use the higher level model_selection module as ref
* DOC in check_cv "y : array-like, optional"
* DOC a supervised learning problem --> supervised learning problems
* DOC cross-validators --> cross-validation strategies
* DOC Correct Olivier Grisel's name ;)
* MINOR/FIX cv_indices --> kfold
* FIX/DOC Align the 'See also' section of the new KFold, LeaveOneOut
* TST/FIX imports on separate lines
* FIX use __class__ instead of classmethod
* TST/FIX import directly from model_selection
* COSMIT Relocate the random_state documentation
* COSMIT remove pass
* MAINT Remove deprecation warnings from old tests
* FIX correct import at test_split
* FIX/MAINT Move P_sparse, X, y defns to top; rm unused W_sparse, X_sparse
* FIX random state to avoid doctest failure
* TST n_splits and split wrapping of _CVIterableWrapper
* FIX/MAINT Use multilabel indicator matrix directly
* TST/DOC clarify why we conflate classes 0 and 1
* DOC add comment that this was taken from BaseEstimator
* FIX use of labels is not needed in stratified k fold
* Fix cross_validation reference
* Fix the labels param doc
FIX/DOC/MAINT Addressing the review comments by Arnaud and Andy
COSMIT Sort the members alphabetically
COSMIT len_cv --> n_splits
COSMIT Merge 2 if; FIX Use kwargs
DOC Add my name to the authors :D
DOC make labels parameter consistent
FIX Remove hack for boolean indices; + COSMIT idx --> indices; DOC Add Returns
COSMIT preds --> predictions
DOC Add Returns and neatly arrange X, y, labels
FIX idx(s)/ind(s)--> indice(s)
COSMIT Merge if and else to elif
COSMIT n --> n_samples
COSMIT Use bincount only once
COSMIT cls --> class_i / class_i (ith class indices) -->
perm_indices_class_i
FIX/ENH/TST Addressing the final reviews
COSMIT c --> count
FIX/TST make check_cv raise ValueError for string cv value
TST nested cv (gs inside cross_val_score) works for diff cvs
FIX/ENH Raise ValueError when labels is None for label based cvs;
TST if labels is being passed correctly to the cv and that the
ValueError is being propagated to the cross_val_score/predict and grid
search
FIX pass labels to cross_val_score
FIX use make_classification
DOC Add Returns; COSMIT Remove scaffolding
TST add a test to check the _build_repr helper
REVERT the old GS/RS should also be tested by the common tests.
ENH Add a tuple of all/label based CVS
FIX raise VE even at get_n_splits if labels is None
FIX Fabian's comments
PEP8
This pull request is for #5132.
cross_val_predict should work for sparse 'y'