ENH Adding estimators_samples_ attribute to forest models#26736
ENH Adding estimators_samples_ attribute to forest models#26736glemaitre merged 51 commits intoscikit-learn:mainfrom
Conversation
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
|
Kay this is ready for review! |
Signed-off-by: Adam Li <adam2392@gmail.com>
OmarManzoor
left a comment
There was a problem hiding this comment.
Thanks for the PR @adam2392!
… into est_samples
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
OmarManzoor
left a comment
There was a problem hiding this comment.
Thanks for the updates @adam2392. A few additional comments.
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
|
This attribute is already public on The goal of this attribute is to make it easier to inspect the result of the learning process via public API. +1 for exposing it as a public attribute. |
|
@adam2392 maybe you can clarify what kind of use you make of this attribute? |
@ogrisel I plan on using this attribute for inspecting the training/testing samples used in fitting each individual tree. Right now, this is really hard to do as the fitting the forest process is entirely internal. More broadly, this would allow someone to implement "honest trees" (ref: #19710) very easily.
|
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
|
Hi I resolved the few docstring comments, but left #26736 (comment) and #26736 (comment) as those seem like they would require changing the behavior in Happy to do so though if you think it's warranted. |
|
Hi @ogrisel just wanted to gently follow-up here to prevent it getting lost. Is there anything related to the current feature that you think should be still changed? And a follow-up, do you think the deprecation and renaming of the |
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
|
|
||
| assert isinstance(estimator_samples, list) | ||
| assert len(estimators_samples) == len(estimators) | ||
| assert estimators_samples[0].dtype == np.int32 |
There was a problem hiding this comment.
Good catch @glemaitre . One part of the if statement led to it being np.int64 instead of np.int32
Thanks for the review! I've addressed the comments left by you. Lmk if there's anything I missed. |
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
glemaitre
left a comment
There was a problem hiding this comment.
LGTM. I will merge once the doc build passed.
|
The failures were transient due to GitHub. |
…rn#26736) Signed-off-by: Adam Li <adam2392@gmail.com> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Fixes: #26716
What does this implement/fix? Explain your changes.
estimators_samples_property to theBaseForestclass, which allows any forest-method to recompute the indices of the training samples per tree in the forestAny other comments?
Note this is essentially very similar to
Bagging*except we only care about the samples considered rather than samples and feature indices.This is useful for example for potentially allowing the trees to get closer to enabling things like i) honesty and ii) analysis of the trained samples used for the tree.