Skip to content

[MRG] ENH iForest - expose warm_start (#13451)#13496

Merged
adrinjalali merged 10 commits intoscikit-learn:masterfrom
pmarko1711:iforest_warmstart
Mar 27, 2019
Merged

[MRG] ENH iForest - expose warm_start (#13451)#13496
adrinjalali merged 10 commits intoscikit-learn:masterfrom
pmarko1711:iforest_warmstart

Conversation

@pmarko1711
Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Fixes #13451

What does this implement/fix? Explain your changes.

This PR exposes warm_start in the sklearn.ensemble.IsolationForest.
It also adds a test sklearn/ensemble/tests/test_iforest.py to check whether iterative addition of iTrees to an iForest works as expected.

Any other comments?

Shouldn't there be a .. versionadded:: 0.?? added, please? If so, what number?

@adrinjalali
Copy link
Copy Markdown
Member

.. versionadded:: 0.21, and if it doesn't get merged in time, then 0.22.

Also needs an entry in the related whats_new file, like the others.

Copy link
Copy Markdown
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

Yes I think this needs a versionadded for 0.21, as well as an entry in whatsnew linking to this PR Already mentioned by Adrin

* versionadded=0.21

* adition in whatsnew

* test using iris dataset
Copy link
Copy Markdown
Contributor

@albertcthomas albertcthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @petibear

smaller dataset for testing

Co-Authored-By: petibear <40757147+petibear@users.noreply.github.com>
@pmarko1711
Copy link
Copy Markdown
Contributor Author

I don't understand why the two test pipelines fail. Does it have something with my commits?

@NicolasHug
Copy link
Copy Markdown
Member

Try to pull and merge master? I think I had the same CI fail a few days ago

@pmarko1711
Copy link
Copy Markdown
Contributor Author

@NicolasHug Sorry, I'm new to this and don't want to mess things up. Where shall I try to pull and merge master, pls?

@adrinjalali
Copy link
Copy Markdown
Member

add the original repo as a separate remote:

git remote add upstream https://github.com/scikit-learn/scikit-learn.git

fetch the remote

git fetch upstream

merge upstream's branch into your branch (while being in your branch)

git merge upstream/master

@pmarko1711
Copy link
Copy Markdown
Contributor Author

Many thanks, I understand now. There is nothing to be merged with though. For now.

@NicolasHug
Copy link
Copy Markdown
Member

@petibear You're not responsible for the CI fails, I opened #13506

Copy link
Copy Markdown
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I'm not sure if we should parametrize the test_warm_start_* tests in test_bagging.py with IForest now, or replicate them in test_iforest.py, or consider that the current test_bagging.py (which uses BaggingClassifier) is enough anyway. @adrinjalali ?

@NicolasHug
Copy link
Copy Markdown
Member

@petibear Since #13506 is merged now you can try committing an empty commit to trigger the CI again (or just address the review above of course)

git commit --allow-empty -m "Trigger CI"

Checks should go green

@adrinjalali
Copy link
Copy Markdown
Member

IMO it's pretty clear from the docstring that warm_start affects n_estimators, and the link to the docstring is also helpful

It's more like if you know what it's supposed to do, and you read the docstring, then it's very clear that it conveys the message. But it's cryptic enough that if you don't know, then you may not understand how it works. Maybe add it to doc/modules/ensemble.rst where we only have gradient_boosting_warm_start would help?

As for the tests, I'm okay with what the PR has, since for the rest tests are done in the parent class. No strong feelings there.

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @adrinjalali that the parameter description could be improved

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise lgtm

@pmarko1711
Copy link
Copy Markdown
Contributor Author

I could then draft an addition to doc/modules/ensemble.rst (i.e. to its Bagging meta-estimator section) to say that classes derived from BaseBagging (BaggingClassifier, BaggingRegressor as well as IsolationForest and I guess that's all) support incremental addition of estimators via warm_start. It could have helped me when I encountered the issue. I would also add a short snippet for iForest (?).

Furthermore, since the text would then specifically refer to IsolationForest, I think the sentence "As they provide a way to reduce overfitting, bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g., shallow decision trees)." should be also changed or a further clarification should be added. Isolation forest deliberately uses shallower trees.

@NicolasHug
Copy link
Copy Markdown
Member

I don't think it's relevant to add anything to ensemble.rst since this section of the doc belongs to supervised learning while IForest belongs to unsupervised learning. There is currently no mention of IForest in ensemble.rst, rightfully so.

(For the same reason the sentence about shallow trees doesn't need an update IMO).

I would suggest to update instead the isolation forest section of outlier_detection.rst and write a small example here using warm_start, pretty much like what ensemble.rst/gradient_boosting is already doing.

@pmarko1711
Copy link
Copy Markdown
Contributor Author

pmarko1711 commented Mar 25, 2019

Fair enough, I've added an entry to outlier_detection.rst. I struggled a bit where to place it (and whether to create a subsection). I decided for a subsection under the reference to the paper as not to make it too prominent, but you might want it different.

Is there anything else to be done here?

Copy link
Copy Markdown
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if all goes green

Copy link
Copy Markdown
Contributor

@albertcthomas albertcthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @petibear

@adrinjalali
Copy link
Copy Markdown
Member

could you please merge master again and fix the merge conflicts? You're getting unlucky on this one.

@pmarko1711
Copy link
Copy Markdown
Contributor Author

That's not a problem, at least I learn the process.
I'm now in the office, will do it later this night. Will also move comments in the example in outlier_detection.rst.

@pmarko1711
Copy link
Copy Markdown
Contributor Author

ok, merged, pushed, hopefully all good now

@adrinjalali adrinjalali merged commit 49cdee6 into scikit-learn:master Mar 27, 2019
@adrinjalali
Copy link
Copy Markdown
Member

Thanks @petibear

@pmarko1711
Copy link
Copy Markdown
Contributor Author

pmarko1711 commented Mar 27, 2019

Many thanks too for all the help, @adrinjalali & the rest of you guys. Happy to have contributed and to have learnt something here

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
)

* ENH iForest - expose warm_start (scikit-learn#13451)

* Incorporates comments from PR scikit-learn#13496

* versionadded=0.21

* adition in whatsnew

* test using iris dataset

* Update sklearn/ensemble/tests/test_iforest.py

smaller dataset for testing

Co-Authored-By: petibear <40757147+petibear@users.noreply.github.com>

* Trigger CI

* Corrected the PR reference

* doc entry on warm_start + renamed the test

* Corrections in the doc example

* comments made inline in the doc example
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
)

* ENH iForest - expose warm_start (scikit-learn#13451)

* Incorporates comments from PR scikit-learn#13496

* versionadded=0.21

* adition in whatsnew

* test using iris dataset

* Update sklearn/ensemble/tests/test_iforest.py

smaller dataset for testing

Co-Authored-By: petibear <40757147+petibear@users.noreply.github.com>

* Trigger CI

* Corrected the PR reference

* doc entry on warm_start + renamed the test

* Corrections in the doc example

* comments made inline in the doc example
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose warm_start in Isolation forest

5 participants