Skip to content

[MRG] CalibratedClassifierCV passes on fit_params#15218

Closed
BenjaminBossan wants to merge 1399 commits intoscikit-learn:masterfrom
BenjaminBossan:calibratedclassifiercv-passes-on-fit-params
Closed

[MRG] CalibratedClassifierCV passes on fit_params#15218
BenjaminBossan wants to merge 1399 commits intoscikit-learn:masterfrom
BenjaminBossan:calibratedclassifiercv-passes-on-fit-params

Conversation

@BenjaminBossan
Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Partly addresses #12384

What does this implement/fix? Explain your changes.

This PR makes it possible to pass fit_params to the fit method of the
CalibratedClassifierCV, which are then routed to the underlying base
estimator.

Note: I implemented predict_proba on CheckingClassifier for the
new unit test to work.

Any other comments?

While working on this issue, I discovered these lines, which may be problematic:

fit_parameters = signature(base_estimator.fit).parameters
estimator_name = type(base_estimator).__name__
if (sample_weight is not None
and "sample_weight" not in fit_parameters):
warnings.warn("%s does not support sample_weight. Samples"
" weights are only used for the calibration"
" itself." % estimator_name)
sample_weight = check_array(sample_weight, ensure_2d=False)
base_estimator_sample_weight = None

In fact, if the base estimator doesn't have sample_weight in
its signature, the sample_weight argument is not routed to
it. However, one could argue that it should still be routed to it
if fit_params are part of the signature.

This can be relevant, for instance, when the base estimator is
just a meta estimator that delegates all fit_params to the true
estimator. With the current implementation, sample_weight would
not be passed on.

@BenjaminBossan
Copy link
Copy Markdown
Contributor Author

The test coverage decreased after adding predict_proba in CheckingClassifier, which is why I added a test for it. I will add more tests for CheckingClassifier in a separate PR.

Copy link
Copy Markdown
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a quick nitpick:

assert self.check_X(T)
return self.classes_[np.zeros(_num_samples(T), dtype=np.int)]

def predict_proba(self, T):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why T? The scikit-learn convention is X for the input data of a statistical estimator.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to stick close to the predict method, which also uses T. But I can change it (as well as predict).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do change it.

this_estimator = clone(base_estimator)

fit_params_train = {}
if fit_params:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably not necessary, fit_params.items() would return an empty dict when there are no params and the for loops would get 0 iterations..

assert self.check_X(T)
return self.classes_[np.zeros(_num_samples(T), dtype=np.int)]

def predict_proba(self, T):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do change it.

X, y = make_classification(n_samples=2 * n_samples, n_features=6,
random_state=42)

fit_params = {'a': y, 'b': y}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check with non-arrays, such as lists

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the expectation here? That lists work as long as they have the correct length? Is there something else that should work?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, lists are the main thing to check, under the assumption that if they are handled, so are other sequences

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With lists, the line fit_params_train[k] = v[train] fails, as expected. I thus wanted to use indexable to make sure that lists would pass but actually, indexable doesn't touch lists because of this line:

elif hasattr(X, "__getitem__") or hasattr(X, "iloc"):
result.append(X)

What is the right way to deal with lists in this case?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think you need to use safe_indexing rather than v[train]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, I forgot about that function. I used _safe_indexing instead of safe_indexing because the latter is deprecated.

hs-nazuna and others added 24 commits June 25, 2020 14:25
* DOC Removes FunctionTransformer example

* DOC Updates description
* DOC Adds note on dataframe

* DOC Address comments
…cikit-learn#13022)

* ENH: Patches autosummary for case insensitive file systems

* DOC: More details

* REV
…izer (scikit-learn#17718)

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
…cikit-learn#17721)

Co-authored-by: Beatriz San Miguel <beatriz.sanmiguelgonzalez@uk.fujitsu.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…sotonicRegression (scikit-learn#16289)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
…kit-learn#17679)

* BUG fix SparseCoder to follow scikit-learn API

* TST check that get_params and set_params work as expected

* address comments

* PEP8

* iter

* Fixes scikit-learn#8675, fix cloning for SparseCoder

* remove spaces

* Update _dict_learning.py

* fix confusing arguments

* remove unnecessary code

* Update test_common.py

* removes spaces

* PEP8

* iter

* fix merge

* ignore a mypy warning

* type: ignore

* remove one deprecated verification

* Update sklearn/decomposition/_dict_learning.py

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

* Update sklearn/decomposition/_dict_learning.py

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

* Update sklearn/decomposition/tests/test_dict_learning.py

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

* test clone produces different id

* review

* add one more test

* lint

* Update sklearn/decomposition/tests/test_dict_learning.py

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…it-learn#14800)

Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* ENH _fit_and_score now returns a dictionary

* MRG

* REV

* LOL

* Removing things in gitignore is fun
…arn#17707)

* MNT More replacements of numpy aliases with built-in types [scipy-dev]

* MNT More (manual) replacements of numpy aliases

* MNT More (manual) replacements of numpy aliases

* FIX Minor change

* MNT Trigger build [scipy-dev]

* FIX Minor change

* MNT Trigger build [scipy-dev]

* FIX More deprecations of numpy aliases [scipy-dev]

* MNT Silent numpy aliases warnings [scipy-dev]

* FIX Fix silent deprecations

* Trigger build [scipy-dev]

* FIX Add scape characters [scipy-dev]

* FIX Remove package specification from silent

* Trigger build [scipy-dev]

* Revert
…near models (scikit-learn#17665)

Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
…it-learn#17747)

Co-authored-by: Beatriz San Miguel <beatriz.sanmiguelgonzalez@uk.fujitsu.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
NicolasHug and others added 19 commits August 7, 2020 10:01
…n#17418)

Co-authored-by: Thomas S. Benjamin <tbenjamin@vencorelabs.com>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
…-learn#16406)

Co-authored-by: Henry <hlc5v@virginia.edu>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
…erceptron.py (scikit-learn#18133)

* DOC Fix doc of default values in sklearn.neural_network._multilayer_perceptron.py

* Update sklearn/neural_network/_multilayer_perceptron.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
* fixed a cross-platform endian issue

* removed a duplicated dtype checking

* Trigger [arm64] CI

* added an entry to doc/whats_new/v0.24.rst

* moved my fix entry to sklearn.tree section

* Update sklearn/tree/_tree.pyx

Added comments

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

* code simplified

* fixed a typo in sklearn/tree/_tree.pyx

* Update doc/whats_new/v0.24.rst

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

* updated doc/whats_new/v0.24.rst

* Update sklearn/tree/_tree.pyx

Added a space after "if", to be PEP8 compliant.

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

* Update sklearn/tree/_tree.pyx

Co-authored-by: Qi Zhang <q.zhang@ibm.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
…7702)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
* fix brier

* fix brier

* add ref

* add ref for lint

* Add link to PDF for calibration book chapter.

* Fix broken repeated ref by using qualified identifier

* Fix indentation

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* wip

* add user guide

* fix link to ex

* suggestion

* suggestions
@cmarmo
Copy link
Copy Markdown
Contributor

cmarmo commented Aug 15, 2020

Hi @BenjaminBossan , are you still interested in working on that? If yes, do you mind fixing conflicts? This will hopefully bring back some attention from reviewers. Thanks!

rauwuckl and others added 8 commits August 16, 2020 09:54
…scikit-learn#10591)

* Initial add DET curve to classification metrics

* Add DET to exports

* Fix DET-curve doctest errors

- Sample snippet in  model_evaluation documentation was outdated.

* Clarify wording in DET-curve computation

- Align to the wording of ranking module to make it consistent.
- Add correct describtion of input and outputs.
- Update and fix non-existent links

* Beautify DET curve documentation source

- Limit line length to 80 characters.

* Expand DET curve documentation

- Add an example plot to show difference between ROC and DET curves.
- Expand Usage Note section with background information and properties
of DET curves.

* Update DET-curve documentation

- Fix typos and some grammar improvements.
- Use named references to avoid potential conflicts with other sections.
- Remove unneeded references and improved existing ones by using e.g.
using versioned links.

* Select relevant DET points using slice object

* Remove some dubiety from DET curve doc-string

* Add DET curve contributors

* Add tests for DET curves

* Streamline DET test by using parametrization

* Increase verbosity of DET curve error handling

- Explicitly sanity check input before computing a DET curve.
- Add test for perfect scores.
- Adapt indentation style to match the test module.

* Add reference for DET curves in invariance test

* Add automated invariance checks for DET curves

* Resolve merge artifacts

* Make doctest happy

* Fix whitespaces for doctest

* Revert unintended whitespace changes

* Revert unintended white space changes scikit-learn#2

* Fix typos and grammar

* Fix white space in doc

* Streamline test code

* Remove rebase artifacts

* Fix PR link in doc

* Fix test_ranking

* Fix rebase errors

* Fix import

* Bring back newlines

- Swallowed by copy/paste

* Remove uncited ref link

* Remove matplotlib deprecation warning

* Bring back hidden reference

* Add motivation to DET example

* Fix lint

* Add citation

* Use modern matplotlib API

Co-authored-by: Jeremy Karnowski <jeremy.karnowski@gmail.com>
Co-authored-by: Julien Cornebise <julien@cornebise.com>
Co-authored-by: Daniel Mohns <daniel.mohns@zenguard.org>
Partly addresses to scikit-learn#12325

This PR makes it possible to pass fit_params to the fit method of the
CalibratedClassifierCV, which are then routed to the underlying base
estimator.
After adding the predict_proba method on CheckingClassifier, the test
coverage decreased. Therefore, a test for predict_proba was
added. More tests for CheckingClassifier will be added in a separate
PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.