Closed
Description
I think we could improve the consistency of the decision_function of the outlier detection algorithms implemented in scikit-learn.
The `decision_function` for OCSVM is such that a positive value means the sample is an inlier and a negative value means it is an outlier. It takes into account the parameter `nu`, which can be seen as a contamination parameter. The `decision_function` of IsolationForest does not take into account the `contamination` parameter; it just returns the score of the samples. For LOF, it is the private `_decision_function` and does not take into account the contamination parameter. For EllipticEnvelope, `decision_function` takes into account the contamination parameter, and the documentation says that it is meant to "ensure a compatibility with other outlier detection tools such as the One-Class SVM".
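To make the OCSVM convention concrete, here is a minimal sketch that checks the sign of `decision_function` against `predict`. The synthetic Gaussian data and the `nu` and `gamma` values are arbitrary assumptions chosen for illustration, not settings taken from this issue.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic data: an assumption made only for this illustration.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(300, 2))
X_test = rng.normal(size=(200, 2))

# nu acts roughly like a contamination parameter for the One-Class SVM.
ocsvm = OneClassSVM(nu=0.1, gamma="scale").fit(X_train)

scores = ocsvm.decision_function(X_test)  # signed distance to the boundary
labels = ocsvm.predict(X_test)            # +1 for inliers, -1 for outliers

# Positive score <=> predicted inlier, negative score <=> predicted outlier.
sign_matches_label = np.array_equal(labels == 1, scores > 0)

# Fraction of training samples flagged as outliers, expected to be near nu.
train_outlier_fraction = np.mean(ocsvm.decision_function(X_train) < 0)
print(sign_matches_label, train_outlier_fraction)
```

Because the threshold is already folded into the score, zero is the natural cut-off for this estimator, which is the behaviour the other detectors do not share.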
Perhaps decision_function should stick with the OCSVM convention, and we could add a score_samples method, as for kernel density estimation, that would return the scores of the algorithms as defined in their original papers. This would be useful when running benchmarks with ROC curves, for instance. When I benchmarked the sklearn anomaly detection algorithms, I defined a subclass for each algorithm, each with a score method.
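The proposed split between the two methods could look roughly like the sketch below. `ToyDistanceDetector`, its distance-based score, and the `threshold_` attribute are hypothetical names invented for this illustration; they are not existing scikit-learn classes or attributes, and the percentile rule is just one possible way to derive a threshold from `contamination`.

```python
import numpy as np


class ToyDistanceDetector:
    """Hypothetical detector, used only to illustrate the proposed convention."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.center_ = X.mean(axis=0)
        # Threshold chosen so that about a `contamination` fraction of the
        # training samples get a raw score below it (assumed rule).
        self.threshold_ = np.percentile(
            self.score_samples(X), 100.0 * self.contamination
        )
        return self

    def score_samples(self, X):
        # Raw score, independent of contamination: higher means more normal.
        X = np.asarray(X, dtype=float)
        return -np.linalg.norm(X - self.center_, axis=1)

    def decision_function(self, X):
        # OCSVM-style convention: positive for inliers, negative for outliers.
        return self.score_samples(X) - self.threshold_

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1, -1)


rng = np.random.RandomState(0)
X_train = rng.normal(size=(500, 2))
detector = ToyDistanceDetector(contamination=0.1).fit(X_train)

raw = detector.score_samples(X_train)
shifted = detector.decision_function(X_train)
outlier_fraction = np.mean(detector.predict(X_train) == -1)
print(outlier_fraction)
```

Since `decision_function` is just `score_samples` shifted by a constant, both rank the samples in essentially the same way, so a ROC curve can be computed from the raw scores without any per-algorithm subclass.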
If you think this should be addressed, I can submit a PR.
See also #8677.