ENH Adding estimators_samples_ attribute to forest models by adam2392 · Pull Request #26736 · scikit-learn/scikit-learn

adam2392 · 2023-06-30T06:23:41Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Adds the estimators_samples_ property to the BaseForest class, which allows any forest model to recompute the indices of the training samples used by each tree in the forest.

Any other comments?

Note that this is very similar to Bagging*, except that we only need the sample indices rather than both the sample and feature indices.

This is useful, for example, as a step towards enabling i) honest trees and ii) analysis of the training samples used to fit each tree.

Signed-off-by: Adam Li <adam2392@gmail.com>

github-actions · 2023-06-30T06:25:14Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

(Generated for commit: ab3be64. Link to the linter CI: here)

adam2392 · 2023-06-30T19:23:13Z

Okay, this is ready for review!

OmarManzoor

Thanks for the PR @adam2392!

doc/whats_new/v1.4.rst

sklearn/ensemble/_forest.py

sklearn/ensemble/tests/test_forest.py

OmarManzoor

Thanks for the updates @adam2392. A few additional comments.

min_dependency_substitutions.rst

sklearn/ensemble/_forest.py

ogrisel · 2023-10-20T12:21:23Z

This attribute is already public on Bagging* models. I don't really see the point in implementing something private that we don't use internally in scikit-learn.

The goal of this attribute is to make it easier to inspect the result of the learning process via public API.

+1 for exposing it as a public attribute.

ogrisel · 2023-10-20T12:22:02Z

@adam2392 maybe you can clarify what kind of use you make of this attribute?

adam2392 · 2023-10-23T15:59:31Z

@adam2392 maybe you can clarify what kind of use you make of this attribute?

@ogrisel I plan on using this attribute to inspect the training/testing samples used in fitting each individual tree. Right now, this is really hard to do because the process of fitting the forest is entirely internal.

More broadly, this would make it easy to implement "honest trees" (ref: #19710):

1. Fit the forest.
2. For each tree, derive its out-of-bag samples from the estimators_samples_ property (the samples that are not among its in-bag indices) and use them to re-fit the leaves.
3. Each tree is now "honest".

adam2392 · 2023-10-24T13:19:38Z

Hi, I resolved the docstring comments, but left #26736 (comment) and #26736 (comment) open, as those seem like they would require changing the behavior in Bagging* too.

Happy to do so though if you think it's warranted.

adam2392 · 2023-10-30T17:55:11Z

Hi @ogrisel, just wanted to gently follow up here so this doesn't get lost.

Is there anything related to the current feature that you think should still be changed?

Also, as a follow-up: do you think deprecating and renaming estimators_samples_ to training_samples_ for both Bagging* and Forest* warrants a new GH issue?

sklearn/ensemble/_forest.py

sklearn/ensemble/tests/test_forest.py

glemaitre

Otherwise LGTM on my side.

adam2392 · 2023-11-03T16:46:15Z

sklearn/ensemble/tests/test_forest.py

+
+    assert isinstance(estimators_samples, list)
+    assert len(estimators_samples) == len(estimators)
+    assert estimators_samples[0].dtype == np.int32


Good catch, @glemaitre. One branch of the if statement produced np.int64 indices instead of np.int32.

adam2392 · 2023-11-03T16:47:19Z

Otherwise LGTM on my side.

Thanks for the review! I've addressed your comments. Lmk if there's anything I missed.

glemaitre

LGTM. I will merge once the doc build passes.

glemaitre · 2023-11-03T20:58:27Z

The failures were transient and caused by GitHub.

…rn#26736) Signed-off-by: Adam Li <adam2392@gmail.com> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Adding estimators_samples_

7b7a523

Signed-off-by: Adam Li <adam2392@gmail.com>

github-actions bot added the module:ensemble label Jun 30, 2023

adam2392 added 3 commits June 30, 2023 12:14

Add unit test

8714faa

Signed-off-by: Adam Li <adam2392@gmail.com>

Merge branch 'main' into est_samples

b7019e2

Get ready for merge

4f049f3

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 marked this pull request as ready for review June 30, 2023 19:21

adam2392 added 7 commits June 30, 2023 12:24

Fix lint

fb35bc6

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix ensemble

496c479

Signed-off-by: Adam Li <adam2392@gmail.com>

Merge branch 'main' into est_samples

59fab4f

Merge branch 'main' into est_samples

e60ab1d

Merge branch 'main' into est_samples

341a6bf

Merge branch 'main' into est_samples

f669fdd

Merge branch 'main' into est_samples

ea78263

OmarManzoor reviewed Aug 11, 2023

doc/whats_new/v1.4.rst (resolved)

doc/whats_new/v1.4.rst (outdated, resolved)

sklearn/ensemble/_forest.py (outdated, resolved)

sklearn/ensemble/_forest.py (resolved)

adam2392 and others added 4 commits August 11, 2023 10:13

merge

fa008c6

Merge branch 'est_samples' of https://github.com/neurodata/scikit-learn…

77be379

… into est_samples

Apply suggestions from code review

fc8b499

Co-authored-by: Omar Salman <omar.salman@arbisoft.com>

Merge branch 'main' into est_samples

da9eb1c

adam2392 requested a review from OmarManzoor August 11, 2023 14:15

OmarManzoor reviewed Aug 11, 2023

sklearn/ensemble/tests/test_forest.py (outdated, resolved)

adam2392 added 2 commits August 11, 2023 11:24

Fix

3e38001

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix unit test

c3d1fce

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 requested a review from OmarManzoor August 11, 2023 20:36

Merge branch 'main' into est_samples

6d82f72

OmarManzoor reviewed Aug 15, 2023

min_dependency_substitutions.rst (outdated, resolved)

sklearn/ensemble/_forest.py (outdated, resolved)

adam2392 and others added 3 commits August 15, 2023 10:45

Update sklearn/ensemble/_forest.py

19590e6

Co-authored-by: Omar Salman <omar.salman@arbisoft.com>

Simplify

0dd11df

Signed-off-by: Adam Li <adam2392@gmail.com>

Merge branch 'main' into est_samples

0de4488

Merge branch 'main' into est_samples

f9d5e56

adam2392 and others added 2 commits October 24, 2023 09:16

Apply suggestions from code review

b32c9cc

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Merge branch 'main' into est_samples

72a06e1

adam2392 requested a review from ogrisel October 24, 2023 13:19

adam2392 added 3 commits October 25, 2023 08:49

Merge branch 'main' into est_samples

425f2e7

Merge branch 'main' into est_samples

7d373a3

Merge branch 'main' into est_samples

b776d18

glemaitre self-requested a review November 3, 2023 13:44

glemaitre changed the title from "[ENH] Adding estimators_samples_ for forest models" to "ENH Adding estimators_samples_ attribute to forest models" Nov 3, 2023

glemaitre approved these changes Nov 3, 2023

adam2392 and others added 3 commits November 3, 2023 12:40

Apply suggestions from code review

07ae417

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Apply suggestions from code review

bf13e56

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Address glemaitre suggestions

ba0db3e

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 commented Nov 3, 2023

Merge branch 'main' into est_samples

43792e7

Merge branch 'main' into est_samples

dbed673

glemaitre self-requested a review November 3, 2023 18:29

Apply suggestions from code review

ab3be64

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

glemaitre approved these changes Nov 3, 2023

glemaitre merged commit 3737909 into scikit-learn:main Nov 3, 2023

5 participants
