
Explanation of nu parameter in One-class SVM #3466

@pvnguyen

Description


The latest documentation for Outlier Detection (http://scikit-learn.org/stable/modules/outlier_detection.html) mentions that an important distinction between novelty detection and outlier detection is the following: in novelty detection, "the training data is not polluted by outliers", while in outlier detection, "the training data contains outliers". In the example, One-class SVM is used to demonstrate novelty detection.

However, a One-class SVM can still tolerate outliers in the training data. In particular, the parameter nu tunes an upper bound on the fraction of outliers (training errors) in the training set, as explained in Proposition 4 of the original paper, "Estimating the Support of a High-Dimensional Distribution" by B. Schölkopf et al.

So I think that the distinction between outlier detection and novelty detection is not well illustrated in the current documentation by the use of One-class SVM. In fact, I would argue that we should not differentiate between the two cases at all.

Besides, the current explanation of the nu parameter (i.e., "The \nu parameter, also known as the margin of the One-Class SVM, corresponds to the probability of finding a new, but regular, observation outside the frontier.") should be rewritten based on the explanation from the original paper to make things clearer.
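To make the point concrete, here is a minimal sketch of the nu property using scikit-learn's `OneClassSVM`. The dataset and hyperparameters (`gamma=0.1`, a standard-normal sample) are illustrative choices, not from the documentation; the claim being demonstrated is the one from Proposition 4 of the paper: nu upper-bounds the fraction of training errors and lower-bounds the fraction of support vectors.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.randn(1000, 2)  # illustrative training sample

nu = 0.1
clf = OneClassSVM(nu=nu, kernel="rbf", gamma=0.1)
clf.fit(X)

# Fraction of training points flagged as outliers (predict returns -1).
train_outlier_frac = np.mean(clf.predict(X) == -1)

# Fraction of training points that are support vectors.
sv_frac = len(clf.support_) / len(X)

# Per Proposition 4 of Schölkopf et al., approximately:
#   train_outlier_frac <= nu <= sv_frac
print(train_outlier_frac, nu, sv_frac)
```

Note that even on "clean" training data, roughly a nu-fraction of points ends up outside the learned frontier, which is why nu behaves more like a contamination parameter than a pure novelty-detection margin.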
