Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear classification, regression and even outlier detection tasks. SVMs can be used for a variety of tasks, such as text classification, image classification, spam detection, handwriting identification, face detection and anomaly detection. SVMs are adaptable and efficient in a variety of applications because they can manage high-dimensional data and nonlinear relationships.
- The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that can separate the data points in different classes in the feature space.
- The hyperplane is chosen so that the margin between the closest points of the different classes is as large as possible.
- The dimension of the hyperplane depends upon the number of features.
- If the number of input features is two, then the hyperplane is just a line.
- If the number of input features is three, then the hyperplane becomes a 2-D plane. It becomes difficult to visualize the hyperplane when the number of features exceeds three.
- SVMs are sensitive to the feature scales.
- SVMs are particularly well suited for classification of complex small- or medium-sized datasets.
- SVM is a non-probabilistic binary linear classifier (Wikipedia).
- Although methods such as Platt scaling exist to use SVMs in a probabilistic classification setting (Wikipedia).
- Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.
- It is one of the most popular models in Machine Learning; before neural networks took over image and video tasks, SVMs were used intensively.
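The sensitivity to feature scales mentioned above is why SVMs are usually wrapped in a pipeline with a scaler. A minimal sketch, assuming scikit-learn is available; the toy data below is made up purely for illustration, with one feature on a much larger scale than the other:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Made-up 2-D data where feature 1 has a ~100x larger scale than feature 0.
rng = np.random.RandomState(42)
X = np.vstack([rng.normal([0.0, 0.0], [1.0, 100.0], size=(50, 2)),
               rng.normal([3.0, 300.0], [1.0, 100.0], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Standardizing first puts both features on a comparable footing, which
# matters because the margin is measured in the (scaled) feature space.
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10_000))
svm_clf.fit(X, y)
print("training accuracy:", svm_clf.score(X, y))
```

Without the `StandardScaler`, the large-scale feature dominates the margin computation and the learned boundary can change substantially.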
Let’s consider a dataset with two independent variables (features).
From the figure above it is clear that there are multiple lines (our hyperplane here is a line because we are considering only two input features) that could separate the two classes of data points.
One reasonable choice as the best hyperplane is the one that represents the largest separation or margin between the two classes. So we choose the hyperplane whose distance from it to the nearest data point on each side is maximized. If such a hyperplane exists it is known as the maximum-margin hyperplane/hard margin.
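With two input features, the separating hyperplane is just a line and can be read directly off a fitted linear SVM. A sketch assuming scikit-learn; using the two petal features of Iris is my choice for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, 2:4]                # petal length, petal width: 2 features
y = (iris.target == 2).astype(int)   # Iris virginica vs the rest

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# With two features, the boundary w0*x0 + w1*x1 + b = 0 is a line in 2-D.
w = clf.coef_[0]
b = clf.intercept_[0]
print("boundary: x1 =", -w[0] / w[1], "* x0 +", -b / w[1])
```

The coefficients `coef_` and `intercept_` fully describe the hyperplane; in higher dimensions the same attributes describe a plane or hyperplane instead of a line.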
- The following picture demonstrates why we need a linear SVM classifier.
- On the left are scenarios where linear classification is done poorly.
- On the right is how a linear SVM classifies correctly; both plots use the Iris dataset.
- The two classes can clearly be separated easily with a straight line (they are linearly separable).
- The left plot shows the decision boundaries of three possible linear classifiers.
- The model whose decision boundary is represented by the green dashed line is so bad that it does not even separate the classes properly.
- The other two models work perfectly on this training set, but their decision boundaries come so close to the instances that these models will probably not perform as well on new instances.
- In contrast, the solid line in the plot on the right represents the decision boundary of an SVM classifier; this line not only separates the two classes but also stays as far away from the closest training instances as possible.
- You can think of an SVM classifier as fitting the widest possible street (represented by the parallel dashed lines) between the classes.
- This is called large margin classification; the resulting boundary is the maximum-margin hyperplane.
- Notice that adding more training instances “off the street” will not affect the decision boundary at all: it is fully determined (or “supported”) by the instances located on the edge of the street.
- These instances are called the support vectors (they are circled in the above picture); equivalently, the samples on the margin are the support vectors.
- If the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible.
- The region bounded by these two hyperplanes is called the "margin" (dotted lines), and the maximum-margin hyperplane is the hyperplane that lies halfway between them.
- With a normalized or standardized dataset, these hyperplanes can be described by the equations $\mathbf{w} \cdot \mathbf{x} - b = 1$ and $\mathbf{w} \cdot \mathbf{x} - b = -1$.

Geometrically, the distance between these two hyperplanes is $\frac{2}{\|\mathbf{w}\|}$, so to maximize the distance between the planes we want to minimize $\|\mathbf{w}\|$. The distance is computed using the distance-from-a-point-to-a-plane equation. We also have to prevent data points from falling into the margin, so we add the following constraint: for each $i$, either $\mathbf{w} \cdot \mathbf{x}_i - b \ge 1$ if $y_i = 1$, or $\mathbf{w} \cdot \mathbf{x}_i - b \le -1$ if $y_i = -1$.
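The margin width $2/\|\mathbf{w}\|$ can be checked numerically on a fitted linear SVM. A sketch with synthetic, well-separated blobs (my choice, so that a near-hard margin is feasible); note scikit-learn's decision function uses $\mathbf{w} \cdot \mathbf{x} + b$ rather than $-b$:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic blobs, so a (near-)hard margin is feasible.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=0)
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # huge C ~ hard margin

w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)
print("||w|| =", np.linalg.norm(w))
print("margin width 2/||w|| =", margin_width)

# The support vectors sit (approximately) on the planes w.x + b = +/-1.
sv_scores = X[clf.support_] @ w + clf.intercept_[0]
print("decision scores at the support vectors:", sv_scores)
```

Minimizing $\|\mathbf{w}\|$ is exactly what widens this margin; the printed support-vector scores sit near $\pm 1$ when the data are separable.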
- When we strictly impose that all instances must be off the street and on the right side, this is called hard margin classification.
- There are two main issues with hard margin classification. First, it only works if the data is linearly separable.
- Second, it is sensitive to outliers.
- The objective of soft margin classification is to find a good balance between keeping the street as large as possible and limiting the margin violations (i.e., instances that end up in the middle of the street or even on the wrong side).
- To extend SVM to cases in which the data are not linearly separable, the hinge loss function is helpful (the trade-off between margin width and violations is controlled by the hyperparameter C).
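The hinge loss itself is simple to state: for a signed target $t \in \{-1, +1\}$ and raw decision score $s$, the loss is $\max(0, 1 - t\,s)$. A minimal sketch:

```python
import numpy as np

def hinge_loss(t, s):
    """Hinge loss max(0, 1 - t*s) for a signed target t in {-1, +1} and
    raw decision score s: zero when the instance is on the correct side
    of the margin, growing linearly with the violation otherwise."""
    return np.maximum(0.0, 1.0 - t * s)

print(hinge_loss(1, 2.0))    # correct side, outside the margin -> 0.0
print(hinge_loss(1, 0.5))    # correct side but inside the margin -> 0.5
print(hinge_loss(-1, 0.5))   # wrong side of the boundary -> 1.5
```

The loss is zero for instances safely outside the street, which is why adding more such instances does not move the decision boundary.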
- When creating an SVM model using Scikit-Learn, we can specify a number of hyperparameters. C is one of those hyperparameters.
- If we set it to a low value, then we end up with the model on the left.
- With a high value, we get the model on the right.
- Margin violations are bad.
- It’s usually better to have few of them.
- However, in this case the model on the left has a lot of margin violations but will probably generalize better.
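The effect of C described above can be reproduced directly by counting margin violations at two settings. A sketch assuming scikit-learn; the Iris-based setup mirrors the kind of figure discussed, though the exact counts depend on the data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]                # petal length, petal width
y = (iris.target == 2).astype(int)   # Iris virginica vs the rest
t = np.where(y == 1, 1, -1)          # signed targets for the margin check

violations, accuracies = {}, {}
for C in (0.01, 100.0):
    clf = make_pipeline(StandardScaler(), LinearSVC(C=C, max_iter=10_000))
    clf.fit(X, y)
    scores = clf.decision_function(X)
    # An instance violates the margin when it is inside the street
    # (or on the wrong side), i.e. when t * score < 1.
    violations[C] = int(np.sum(t * scores < 1))
    accuracies[C] = clf.score(X, y)
    print(f"C={C}: {violations[C]} margin violations, "
          f"accuracy={accuracies[C]:.2f}")
```

A low C produces a wider street with more violations; a high C narrows the street and allows fewer of them, at the risk of poorer generalization.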



