[MRG] Take sample weights into account in partial dependence computation#13193
Conversation
|
Can a test be added? |
|
I just added a few ! Should add some more ? |
|
Ping @NicolasHug as discussed during the sprint ! |
NicolasHug
left a comment
There was a problem hiding this comment.
The fix looks correct to me.
Please add a whatsnew entry as bugfix.
Also, it'd be nice to have some kind of functional test. Something along the lines of your example in the original issue would be good? Fitting a linear regression on the PDPs should give an r-squared close to 1.
This is fine to merge before or after #12599 BTW, one of us will have to update its PR regarding the tests ;)
| raise ValueError("left_sample_frac:%f, " | ||
| "n_samples current: %d, " | ||
| "n_samples left: %d" | ||
| raise ValueError("left_sample_frac:%d, " |
There was a problem hiding this comment.
IMO this whole if should be entirely removed. It's not tested anywhere as far as I can see, and this is not PDP-related (this is just tree related).
…into add-sample-weights-gbt-partial-dependency
|
tests still failing. Please add an entry to doc/whats_new/v0.21.rst |
|
Thanks for the review @NicolasHug -- I added a functional test based on the example built for the original issue. |
jnothman
left a comment
There was a problem hiding this comment.
Yes, much cleaner. LGTM if tests pass
|
Anyone for a quick review? @thomasjpfan maybe? |
|
@samronsin Can you please merge master (or trigger the CI with an empty commit) so that checks go green so we can merge :)? |
…into add-sample-weights-gbt-partial-dependency
|
Merging since the failed test is completely unrelated ( Thanks @samronsin ! |
|
Thanks @NicolasHug -- and also @jnothman and @thomasjpfan -- for your help on this PR ! |
…n for gradient boosting (scikit-learn#13193) * Replace n_node_samples by weighted_n_node_samples in partial dependence computation * Add tests for both no-op and real sample weights * Improve naming and remove useless comment * Fix small test issues * Fix test for binary classification * Add test for regressions based on example from initial issue * Edit whats_new * 79 * Simplify test code for regression partial dependence * PEP8 * Facepalm * Refer to the public function in whats_new * Make the sample weight test standalone for further reuse * Fix PR number * Testing with L1 relative distance computed as averages * Testing element-wise * Fix and simplify unit test for binary classification * Clarify functional test
…n for gradient boosting (scikit-learn#13193) * Replace n_node_samples by weighted_n_node_samples in partial dependence computation * Add tests for both no-op and real sample weights * Improve naming and remove useless comment * Fix small test issues * Fix test for binary classification * Add test for regressions based on example from initial issue * Edit whats_new * 79 * Simplify test code for regression partial dependence * PEP8 * Facepalm * Refer to the public function in whats_new * Make the sample weight test standalone for further reuse * Fix PR number * Testing with L1 relative distance computed as averages * Testing element-wise * Fix and simplify unit test for binary classification * Clarify functional test
…mputation for gradient boosting (scikit-learn#13193)" This reverts commit 4350b4e.
…mputation for gradient boosting (scikit-learn#13193)" This reverts commit 4350b4e.
…n for gradient boosting (scikit-learn#13193) * Replace n_node_samples by weighted_n_node_samples in partial dependence computation * Add tests for both no-op and real sample weights * Improve naming and remove useless comment * Fix small test issues * Fix test for binary classification * Add test for regressions based on example from initial issue * Edit whats_new * 79 * Simplify test code for regression partial dependence * PEP8 * Facepalm * Refer to the public function in whats_new * Make the sample weight test standalone for further reuse * Fix PR number * Testing with L1 relative distance computed as averages * Testing element-wise * Fix and simplify unit test for binary classification * Clarify functional test
Reference Issues/PRs
Fixes #13192.
What does this implement/fix? Explain your changes.
This PR makes
partial_dependencetake sample weights into account by replacingn_node_samplesbyweighted_n_node_samplesin partial dependence computation.