Explanation of nu parameter in One-class SVM #3466
Description
The latest documentation for Outlier Detection (http://scikit-learn.org/stable/modules/outlier_detection.html) draws an important distinction between novelty detection and outlier detection: in novelty detection, "the training data is not polluted by outliers", while in outlier detection, "the training data contains outliers". In the example, One-class SVM is used to demonstrate novelty detection.
However, One-class SVM can still tolerate outliers in the training data. In particular, the parameter nu tunes an upper bound on the fraction of outliers in the training set (as explained in Proposition 4 of the original paper - Estimating the support of a high-dimensional distribution, by B. Schölkopf et al.).
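To illustrate the point, here is a small sketch (my own, not from the documentation) that fits `OneClassSVM` on training data and checks empirically that the fraction of training points flagged as outliers stays below nu, consistent with the bound from the paper. The data and kernel settings are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(500, 2)  # synthetic training data for illustration

for nu in (0.05, 0.2, 0.5):
    clf = OneClassSVM(nu=nu, kernel="rbf", gamma=0.1).fit(X)
    pred = clf.predict(X)  # +1 for inliers, -1 for outliers
    frac_out = np.mean(pred == -1)
    # Per Proposition 4 of Schölkopf et al., the fraction of training
    # points treated as outliers is upper-bounded by nu (and nu is a
    # lower bound on the fraction of support vectors).
    print(f"nu={nu:.2f}  fraction flagged as outliers={frac_out:.3f}")
```

So even when the model is presented as a novelty detector, its training objective already budgets for a nu-fraction of polluted training points.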
So I think the distinction between outlier detection and novelty detection is not well illustrated in the current documentation (by the use of One-class SVM). In fact, I would argue we should not differentiate between the two cases at all.
Besides, the current explanation of the nu parameter (i.e., "The \nu parameter, also known as the margin of the One-Class SVM, corresponds to the probability of finding a new, but regular, observation outside the frontier.") should be rewritten based on the explanation from the original paper to make things clearer.