[MRG] CalibratedClassifierCV passes on fit_params by BenjaminBossan · Pull Request #15218 · scikit-learn/scikit-learn

BenjaminBossan · 2019-10-12T14:30:56Z

Reference Issues/PRs

Partly addresses #12384

What does this implement/fix? Explain your changes.

This PR makes it possible to pass fit_params to the fit method of the
CalibratedClassifierCV, which are then routed to the underlying base
estimator.

Note: I implemented predict_proba on CheckingClassifier for the
new unit test to work.
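For readers following along, here is a minimal sketch of the routing idea, not the code in this PR. `RecordingClassifier` and `ForwardingWrapper` are hypothetical names made up for illustration; the sketch only assumes that the wrapper's `fit` accepts `**fit_params` and hands them to the wrapped estimator unchanged.

```python
class RecordingClassifier:
    """Toy base estimator that records the keyword arguments passed to fit."""

    def fit(self, X, y, **fit_params):
        self.seen_fit_params_ = dict(fit_params)
        return self


class ForwardingWrapper:
    """Hypothetical meta-estimator; not the CalibratedClassifierCV code."""

    def __init__(self, base_estimator):
        self.base_estimator = base_estimator

    def fit(self, X, y, **fit_params):
        # Route every extra keyword argument to the wrapped estimator unchanged.
        self.base_estimator.fit(X, y, **fit_params)
        return self


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 0, 1]
wrapper = ForwardingWrapper(RecordingClassifier()).fit(X, y, a=y, b=y)
forwarded = wrapper.base_estimator.seen_fit_params_
```

In the real estimator the parameters also have to be sliced per cross-validation fold, which this sketch leaves out.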

Any other comments?

While working on this issue, I discovered these lines, which may be problematic:

scikit-learn/sklearn/calibration.py

Lines 168 to 176 in 86aea99

    
    fit_parameters = signature(base_estimator.fit).parameters
    estimator_name = type(base_estimator).__name__
    if (sample_weight is not None
            and "sample_weight" not in fit_parameters):
        warnings.warn("%s does not support sample_weight. Samples"
                      " weights are only used for the calibration"
                      " itself." % estimator_name)
        sample_weight = check_array(sample_weight, ensure_2d=False)
        base_estimator_sample_weight = None

In fact, if the base estimator doesn't have sample_weight in
its signature, the sample_weight argument is not routed to
it. However, one could argue that it should still be routed to it
if fit_params are part of the signature.

This can be relevant, for instance, when the base estimator is
just a meta estimator that delegates all fit_params to the true
estimator. With the current implementation, sample_weight would
not be passed on.
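To illustrate the concern, here is a small sketch using only the standard library. `PlainEstimator`, `DelegatingEstimator`, and both helper functions are hypothetical; the looser check is one possible alternative, not something this PR implements.

```python
from inspect import Parameter, signature


class PlainEstimator:
    def fit(self, X, y):
        return self


class DelegatingEstimator:
    """Hypothetical meta-estimator that forwards **fit_params elsewhere."""

    def fit(self, X, y, **fit_params):
        return self


def names_sample_weight(estimator):
    # Same kind of check as in the quoted lines: look for the literal name only.
    return "sample_weight" in signature(estimator.fit).parameters


def could_accept_sample_weight(estimator):
    # Looser check: also accept a catch-all **kwargs parameter.
    params = signature(estimator.fit).parameters
    has_var_keyword = any(
        p.kind is Parameter.VAR_KEYWORD for p in params.values()
    )
    return "sample_weight" in params or has_var_keyword


estimators = (PlainEstimator(), DelegatingEstimator())
strict = [names_sample_weight(e) for e in estimators]
loose = [could_accept_sample_weight(e) for e in estimators]
```

With the name-only check, the delegating estimator is treated like one that cannot take `sample_weight` at all, which is the situation described above.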

BenjaminBossan · 2019-10-13T09:13:42Z

The test coverage decreased after adding predict_proba in CheckingClassifier, which is why I added a test for it. I will add more tests for CheckingClassifier in a separate PR.

ogrisel

LGTM, just a quick nitpick:

ogrisel · 2019-10-13T16:43:47Z

sklearn/utils/_mocking.py

            assert self.check_X(T)
        return self.classes_[np.zeros(_num_samples(T), dtype=np.int)]

+    def predict_proba(self, T):


why T? The scikit-learn convention is X for the input data of a statistical estimator.

I tried to stick close to the predict method, which also uses T. But I can change it (as well as predict).

please do change it.

rth · 2019-10-14T07:59:40Z

sklearn/calibration.py

                this_estimator = clone(base_estimator)
+
+                fit_params_train = {}
+                if fit_params:


This is probably not necessary: fit_params.items() is empty when there are no params, so the for loop would simply run zero iterations.
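A quick sketch of that point, assuming fit_params is always a dict (as it is when collected via `**fit_params`). `split_fit_params` is a hypothetical helper, and the list comprehension stands in for whatever indexing the real code uses.

```python
def split_fit_params(fit_params, train):
    """Index every fit parameter by the training indices, with no guard."""
    fit_params_train = {}
    for key, value in fit_params.items():
        # With an empty dict this loop body never runs.
        fit_params_train[key] = [value[i] for i in train]
    return fit_params_train


empty = split_fit_params({}, train=[0, 2])
non_empty = split_fit_params({"a": [10, 20, 30]}, train=[0, 2])
```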

rth · 2019-10-14T08:00:32Z

sklearn/utils/_mocking.py

            assert self.check_X(T)
        return self.classes_[np.zeros(_num_samples(T), dtype=np.int)]

+    def predict_proba(self, T):


please do change it.

jnothman · 2019-10-15T20:48:38Z

sklearn/tests/test_calibration.py

+    X, y = make_classification(n_samples=2 * n_samples, n_features=6,
+                               random_state=42)
+
+    fit_params = {'a': y, 'b': y}


Please also check with non-arrays, such as lists

What would be the expectation here? That lists work as long as they have the correct length? Is there something else that should work?

Yes, lists are the main thing to check, under the assumption that if they are handled, so are other sequences

With lists, the line fit_params_train[k] = v[train] fails, as expected. I thus wanted to use indexable to make sure that lists would pass but actually, indexable doesn't touch lists because of this line:

scikit-learn/sklearn/utils/validation.py

Lines 224 to 225 in 1495f69

    elif hasattr(X, "__getitem__") or hasattr(X, "iloc"):
        result.append(X)

What is the right way to deal with lists in this case?

So I think you need to use safe_indexing rather than v[train]

Thx, I forgot about that function. I used _safe_indexing instead of safe_indexing because the latter is deprecated.
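For context, a rough sketch of why plain lists need a helper here. `take_rows` is a hypothetical stand-in written for illustration; it is not scikit-learn's `_safe_indexing`, and it only covers the few container types named in the comments.

```python
def take_rows(values, indices):
    """Hypothetical stand-in for an indexing helper; not scikit-learn's code."""
    if hasattr(values, "iloc"):
        # pandas-like containers support positional selection via .iloc
        return values.iloc[indices]
    if isinstance(values, (list, tuple)):
        # Plain sequences do not support fancy indexing, so pick item by item.
        return [values[i] for i in indices]
    # Assume array-like objects (e.g. NumPy arrays) accept a list of indices.
    return values[indices]


train = [0, 2, 3]
weights = [0.1, 0.2, 0.3, 0.4]

try:
    weights[train]  # what v[train] does when v is a plain list
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

fit_params_train = {k: take_rows(v, train) for k, v in {"a": weights}.items()}
```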


cmarmo · 2020-08-15T20:42:59Z

Hi @BenjaminBossan , are you still interested in working on that? If yes, do you mind fixing conflicts? This will hopefully bring back some attention from reviewers. Thanks!


BenjaminBossan mentioned this pull request Oct 13, 2019

[MRG] ENH allow to specify which methods should run a check in CheckingClassifier #15230

Merged

ogrisel approved these changes Oct 13, 2019


rth reviewed Oct 14, 2019


jnothman reviewed Oct 15, 2019


github-actions bot added the module:utils label Mar 2, 2020

hs-nazuna and others added 24 commits June 25, 2020 14:25

DOC add explanation of n_jobs in sklearn/model_selection/_validation.…

c95ed75

…py (scikit-learn#17717)

DOC Removes FunctionTransformer example (scikit-learn#17318)

6195a16

* DOC Removes FunctionTransformer example * DOC Updates description

DOC Add explanation of n_jobs in gaussian_process/_gpc.py (scikit-lea…

99baa50

…rn#17716)

DOC Adds note on dataframe in contributing guide (scikit-learn#17359)

9cf2bac

* DOC Adds note on dataframe * DOC Address comments

ENH Patches sphinx.ext.autosummary for case insensitive file systems (s…

8a695d7

…cikit-learn#13022) * ENH: Patches autosummary for case insensitive file systems * DOC: More details * REV

Revert "ENH Uses binned values from training to find missing values (s…

c9ca1d7

…cikit-learn#16883)" This reverts commit e5cc2b0.

ENH Add inverse_transform feature to SimpleImputer (scikit-learn#17612)

0a3ab41

DOC add explanation of n_jobs in permutation_importance (scikit-learn…

67e24a4

…#17723)

MNT Fix extension for sphinx 3.0 (scikit-learn#17724)

dad615a

DOC improve attributes and parameters descriptions in MultiLabelBinar…

ec1070a

…izer (scikit-learn#17718) Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

MNT Add type annotations for OpenML fetcher (scikit-learn#17053)

361c052

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

DOC improve description of OneClassSVM when used for outlier detection (

2e47204

scikit-learn#17726)

DOC coef_ and intercept_ documentation for ComplementNB (scikit-learn…

db2d905

…#17722)

[MRG] Expose interpolation thresholds as public fitted attribute of I…

acb8ac5

…sotonicRegression (scikit-learn#16289) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

ENH Verify md5-checksums received from openml arff file metadata(scik…

1e08459

…it-learn#14800) Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

DOC add missing attributes to LocalOutlierFactor (scikit-learn#17696)

3491e07

Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

DOC Add example in docstring calibration_curve (scikit-learn#17731)

ed5a4d9

DOC add whats new entry following scikit-learn#17679 (scikit-learn#17738

27cfe14

) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ENH _fit_and_score now returns a dictionary (scikit-learn#17332)

67bf1e7

* ENH _fit_and_score now returns a dictionary * MRG * REV * LOL * Removing things in gitignore is fun

TST check equivalence normalize/StandardScaler and dense/sparse in li…

1c62652

…near models (scikit-learn#17665) Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

NicolasHug and others added 19 commits August 7, 2020 10:01

DOC minor typo fix (scikit-learn#18119)

c3effe4

DOC Fixes hard coded title in gmm selection example plot (scikit-lear…

61ce18f

…n#17418) Co-authored-by: Thomas S. Benjamin <tbenjamin@vencorelabs.com> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

DOC Add note on the max number of extmath.cartesian arguments (scikit…

eff1bdf

…-learn#16406) Co-authored-by: Henry <hlc5v@virginia.edu> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

ENH Add verbose option to SpectralClustering (scikit-learn#18052)

603d05b

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

DOC Mark add_toctree_function safe for parallel read and write (sciki…

173cf53

…t-learn#18123)

DOC Update TSNE docstring (scikit-learn#18120)

1ac9d68

DOC on issue triaging process (scikit-learn#17907)

618e625

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

DOC Add example comparing PCR and PLS (scikit-learn#17151)

25487f2

FIX use specific threshold to discard eigenvalues with 32 bits fp(sci…

acf195c

…kit-learn#18149) Co-authored-by: Sylvain MARIE <sylvain.marie@se.com>

DOC Clarify triaging role. Remove usage question template. (scikit-le…

8c74d98

…arn#18142) Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

DEP Deprecate n_classes_ in GradientBoostingRegressor (scikit-learn#1…

989613e

…7702) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

DOC Add user guide for permutation_test_score (scikit-learn#18055)

018de22

* wip * add user guide * fix link to ex * suggestion * suggestions

MNT Support for case insensitive filesystems (scikit-learn#18151)

e89157e

ENH Permutation importance support sample weight (scikit-learn#16906)

2c0bd62

CI Disable 32bit pytest xdist (scikit-learn#18161)

ad6a38f

API Change default value of as_frame in fetch_openml to 'auto' (sciki…

bdf2ff5

…t-learn#17610)

rauwuckl and others added 8 commits August 16, 2020 09:54

DOC clarify the kernel gradient for GaussianProcesses (scikit-learn#…

eb7b158

…18115)

CalibratedClassifierCV passes on fit_params

08ce808

Partly addresses to scikit-learn#12325 This PR makes it possible to pass fit_params to the fit method of the CalibratedClassifierCV, which are then routed to the underlying base estimator.

Add test for CheckingClassifier

2d7c381

After adding the predict_proba method on CheckingClassifier, the test coverage decreased. Therefore, a test for predict_proba was added. More tests for CheckingClassifier will be added in a separate PR.

Remove unnecessary test code

7684740

Rename function used in test

fdd1af2

Merge branch 'calibratedclassifiercv-passes-on-fit-params' of https:/…

9e68abf

…/github.com/BenjaminBossan/scikit-learn into calibratedclassifiercv-passes-on-fit-params

Fix linting error, line too long

a12f31c

BenjaminBossan closed this Aug 16, 2020

BenjaminBossan mentioned this pull request Aug 16, 2020

ENH make CalibratedClassifierCV accept on fit_params #18170

Merged

