[MRG] CalibratedClassifierCV passes on fit_params#15218
[MRG] CalibratedClassifierCV passes on fit_params#15218BenjaminBossan wants to merge 1399 commits intoscikit-learn:masterfrom
Conversation
|
The test coverage decreased after adding |
sklearn/utils/_mocking.py
Outdated
| assert self.check_X(T) | ||
| return self.classes_[np.zeros(_num_samples(T), dtype=np.int)] | ||
|
|
||
| def predict_proba(self, T): |
There was a problem hiding this comment.
why T? The scikit-learn convention is X for the input data of a statistical estimator.
There was a problem hiding this comment.
I tried to stick close to the predict method, which also uses T. But I can change it (as well as predict).
sklearn/calibration.py
Outdated
| this_estimator = clone(base_estimator) | ||
|
|
||
| fit_params_train = {} | ||
| if fit_params: |
There was a problem hiding this comment.
This is probably not necessary, fit_params.items() would return an empty dict when there are no params and the for loops would get 0 iterations..
sklearn/utils/_mocking.py
Outdated
| assert self.check_X(T) | ||
| return self.classes_[np.zeros(_num_samples(T), dtype=np.int)] | ||
|
|
||
| def predict_proba(self, T): |
sklearn/tests/test_calibration.py
Outdated
| X, y = make_classification(n_samples=2 * n_samples, n_features=6, | ||
| random_state=42) | ||
|
|
||
| fit_params = {'a': y, 'b': y} |
There was a problem hiding this comment.
Please also check with non-arrays, such as lists
There was a problem hiding this comment.
What would be the expectation here? That lists work as long as they have the correct length? Is there something else that should work?
There was a problem hiding this comment.
Yes, lists are the main thing to check, under the assumption that if they are handled, so are other sequences
There was a problem hiding this comment.
With lists, the line fit_params_train[k] = v[train] fails, as expected. I thus wanted to use indexable to make sure that lists would pass but actually, indexable doesn't touch lists because of this line:
scikit-learn/sklearn/utils/validation.py
Lines 224 to 225 in 1495f69
What is the right way to deal with lists in this case?
There was a problem hiding this comment.
So I think you need to use safe_indexing rather than v[train]
There was a problem hiding this comment.
Thx, I forgot about that function. I used _safe_indexing instead of safe_indexing because the latter is deprecated.
* DOC Removes FunctionTransformer example * DOC Updates description
* DOC Adds note on dataframe * DOC Address comments
…cikit-learn#13022) * ENH: Patches autosummary for case insensitive file systems * DOC: More details * REV
…cikit-learn#16883)" This reverts commit e5cc2b0.
…izer (scikit-learn#17718) Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
…cikit-learn#17721) Co-authored-by: Beatriz San Miguel <beatriz.sanmiguelgonzalez@uk.fujitsu.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…sotonicRegression (scikit-learn#16289) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
…kit-learn#17679) * BUG fix SparseCoder to follow scikit-learn API * TST check that get_params and set_params work as expected * address comments * PEP8 * iter * Fixes scikit-learn#8675, fix cloning for SparseCoder * remove spaces * Update _dict_learning.py * fix confusing arguments * remove unnecessary code * Update test_common.py * removes spaces * PEP8 * iter * fix merge * ignore a mypy warning * type: ignore * remove one deprecated verification * Update sklearn/decomposition/_dict_learning.py Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * Update sklearn/decomposition/_dict_learning.py Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * Update sklearn/decomposition/tests/test_dict_learning.py Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * test clone produces different id * review * add one more test * lint * Update sklearn/decomposition/tests/test_dict_learning.py Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…it-learn#14800) Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
* ENH _fit_and_score now returns a dictionary * MRG * REV * LOL * Removing things in gitignore is fun
…arn#17707) * MNT More replacements of numpy aliases with built-in types [scipy-dev] * MNT More (manual) replacements of numpy aliases * MNT More (manual) replacements of numpy aliases * FIX Minor change * MNT Trigger build [scipy-dev] * FIX Minor change * MNT Trigger build [scipy-dev] * FIX More deprecations of numpy aliases [scipy-dev] * MNT Silent numpy aliases warnings [scipy-dev] * FIX Fix silent deprecations * Trigger build [scipy-dev] * FIX Add scape characters [scipy-dev] * FIX Remove package specification from silent * Trigger build [scipy-dev] * Revert
…near models (scikit-learn#17665) Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
…it-learn#17747) Co-authored-by: Beatriz San Miguel <beatriz.sanmiguelgonzalez@uk.fujitsu.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
…n#17418) Co-authored-by: Thomas S. Benjamin <tbenjamin@vencorelabs.com> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
…-learn#16406) Co-authored-by: Henry <hlc5v@virginia.edu> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
…erceptron.py (scikit-learn#18133) * DOC Fix doc of default values in sklearn.neural_network._multilayer_perceptron.py * Update sklearn/neural_network/_multilayer_perceptron.py Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
* fixed a cross-platform endian issue * removed a duplicated dtype checking * Trigger [arm64] CI * added an entry to doc/whats_new/v0.24.rst * moved my fix entry to sklearn.tree section * Update sklearn/tree/_tree.pyx Added comments Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com> * code simplified * fixed a typo in sklearn/tree/_tree.pyx * Update doc/whats_new/v0.24.rst Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com> * updated doc/whats_new/v0.24.rst * Update sklearn/tree/_tree.pyx Added a space after "if", to be PEP8 compliant. Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com> * Update sklearn/tree/_tree.pyx Co-authored-by: Qi Zhang <q.zhang@ibm.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
…kit-learn#18149) Co-authored-by: Sylvain MARIE <sylvain.marie@se.com>
…arn#18142) Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
…7702) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
* fix brier * fix brier * add ref * add ref for lint * Add link to PDF for calibration book chapter. * Fix broken repeated ref by using qualified identifier * Fix indentation Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* wip * add user guide * fix link to ex * suggestion * suggestions
|
Hi @BenjaminBossan , are you still interested in working on that? If yes, do you mind fixing conflicts? This will hopefully bring back some attention from reviewers. Thanks! |
…scikit-learn#10591) * Initial add DET curve to classification metrics * Add DET to exports * Fix DET-curve doctest errors - Sample snippet in model_evaluation documentation was outdated. * Clarify wording in DET-curve computation - Align to the wording of ranking module to make it consistent. - Add correct describtion of input and outputs. - Update and fix non-existent links * Beautify DET curve documentation source - Limit line length to 80 characters. * Expand DET curve documentation - Add an example plot to show difference between ROC and DET curves. - Expand Usage Note section with background information and properties of DET curves. * Update DET-curve documentation - Fix typos and some grammar improvements. - Use named references to avoid potential conflicts with other sections. - Remove unneeded references and improved existing ones by using e.g. using versioned links. * Select relevant DET points using slice object * Remove some dubiety from DET curve doc-string * Add DET curve contributors * Add tests for DET curves * Streamline DET test by using parametrization * Increase verbosity of DET curve error handling - Explicitly sanity check input before computing a DET curve. - Add test for perfect scores. - Adapt indentation style to match the test module. * Add reference for DET curves in invariance test * Add automated invariance checks for DET curves * Resolve merge artifacts * Make doctest happy * Fix whitespaces for doctest * Revert unintended whitespace changes * Revert unintended white space changes scikit-learn#2 * Fix typos and grammar * Fix white space in doc * Streamline test code * Remove rebase artifacts * Fix PR link in doc * Fix test_ranking * Fix rebase errors * Fix import * Bring back newlines - Swallowed by copy/paste * Remove uncited ref link * Remove matplotlib deprecation warning * Bring back hidden reference * Add motivation to DET example * Fix lint * Add citation * Use modern matplotlib API Co-authored-by: Jeremy Karnowski <jeremy.karnowski@gmail.com> Co-authored-by: Julien Cornebise <julien@cornebise.com> Co-authored-by: Daniel Mohns <daniel.mohns@zenguard.org>
Partly addresses to scikit-learn#12325 This PR makes it possible to pass fit_params to the fit method of the CalibratedClassifierCV, which are then routed to the underlying base estimator.
After adding the predict_proba method on CheckingClassifier, the test coverage decreased. Therefore, a test for predict_proba was added. More tests for CheckingClassifier will be added in a separate PR.
…/github.com/BenjaminBossan/scikit-learn into calibratedclassifiercv-passes-on-fit-params
Reference Issues/PRs
Partly addresses #12384
What does this implement/fix? Explain your changes.
This PR makes it possible to pass
fit_paramsto the fit method of theCalibratedClassifierCV, which are then routed to the underlying baseestimator.
Note: I implemented
predict_probaonCheckingClassifierfor thenew unit test to work.
Any other comments?
While working on this issue, I discovered these lines, which may be problematic:
scikit-learn/sklearn/calibration.py
Lines 168 to 176 in 86aea99
In fact, if the base estimator doesn't have
sample_weightinits signature, the
sample_weightargument is not routed toit. However, one could argue that it should still be routed to it
if
fit_paramsare part of the signature.This can be relevant, for instance, when the base estimator is
just a meta estimator that delegates all
fit_paramsto the trueestimator. With the current implementation,
sample_weightwouldnot be passed on.