Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear classification, regression and even outlier detection tasks. SVMs can be used for a variety of tasks, such as text classification, image classification, spam detection, handwriting identification, face detection and anomaly detection. SVMs are adaptable and efficient in a variety of applications because they can manage high-dimensional data and nonlinear relationships.
- The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that can separate the data points in different classes in the feature space.
- The hyperplane is chosen so that the margin between the closest points of the different classes is as large as possible.
- The dimension of the hyperplane depends upon the number of features.
- If the number of input features is two, then the hyperplane is just a line.
- If the number of input features is three, then the hyperplane becomes a 2-D plane. It becomes difficult to visualize the hyperplane when the number of features exceeds three.
- SVMs are sensitive to the feature scales.
- SVMs are particularly well suited for classification of complex small- or medium-sized datasets.
- SVM is a non-probabilistic binary linear classifier (Wikipedia).
- Although methods such as Platt scaling exist to use SVMs in a probabilistic classification setting (Wikipedia).
- Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class.
- It is one of the most popular models in Machine Learning; before neural networks took over image and video tasks, SVMs were used intensively.
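The sensitivity to feature scales mentioned above is why SVMs are usually wrapped in a pipeline with a scaler. A minimal sketch, assuming scikit-learn is available; the toy data below is made up purely for illustration, with one feature on a much larger scale than the other:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Made-up 2-D data where feature 1 has a ~100x larger scale than feature 0.
rng = np.random.RandomState(42)
X = np.vstack([rng.normal([0.0, 0.0], [1.0, 100.0], size=(50, 2)),
               rng.normal([3.0, 300.0], [1.0, 100.0], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Standardizing first puts both features on a comparable footing, which
# matters because the margin is measured in the (scaled) feature space.
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10_000))
svm_clf.fit(X, y)
print("training accuracy:", svm_clf.score(X, y))
```

Without the `StandardScaler`, the large-scale feature dominates the margin computation and the learned boundary can change substantially.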
Let’s consider a dataset with two independent variables (features).
From the figure above it is clear that there are multiple lines (our hyperplane here is a line because we are considering only two input features) that could separate the two classes of data points.
One reasonable choice as the best hyperplane is the one that represents the largest separation or margin between the two classes. So we choose the hyperplane whose distance from it to the nearest data point on each side is maximized. If such a hyperplane exists it is known as the maximum-margin hyperplane/hard margin.
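With two input features, the separating hyperplane is just a line and can be read directly off a fitted linear SVM. A sketch assuming scikit-learn; using the two petal features of Iris is my choice for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, 2:4]                # petal length, petal width: 2 features
y = (iris.target == 2).astype(int)   # Iris virginica vs the rest

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# With two features, the boundary w0*x0 + w1*x1 + b = 0 is a line in 2-D.
w = clf.coef_[0]
b = clf.intercept_[0]
print("boundary: x1 =", -w[0] / w[1], "* x0 +", -b / w[1])
```

The coefficients `coef_` and `intercept_` fully describe the hyperplane; in higher dimensions the same attributes describe a plane or hyperplane instead of a line.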
- The following picture demonstrates why we need a linear SVM classifier.
- On the left are scenarios where linear classification is done poorly.
- On the right is how a linear SVM classifies correctly; both plots use the Iris dataset.
- The two classes can clearly be separated easily with a straight line (they are linearly separable).
- The left plot shows the decision boundaries of three possible linear classifiers.
- The model whose decision boundary is represented by the green dashed line is so bad that it does not even separate the classes properly.
- The other two models work perfectly on this training set, but their decision boundaries come so close to the instances that these models will probably not perform as well on new instances.
- In contrast, the solid line in the plot on the right represents the decision boundary of an SVM classifier; this line not only separates the two classes but also stays as far away from the closest training instances as possible.
- You can think of an SVM classifier as fitting the widest possible street (represented by the parallel dashed lines) between the classes.
- This is called large margin classification; the resulting boundary is the maximum-margin hyperplane.
- Notice that adding more training instances “off the street” will not affect the decision boundary at all: it is fully determined (or “supported”) by the instances located on the edge of the street.
- These instances are called the support vectors (they are circled in the above picture); equivalently, the samples on the margin are the support vectors.
- If the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data, so that the distance between them is as large as possible.
- The region bounded by these two hyperplanes is called the "margin" (dotted lines), and the maximum-margin hyperplane is the hyperplane that lies halfway between them.
- With a normalized or standardized dataset, these hyperplanes can be described by the equations $\mathbf{w} \cdot \mathbf{x} - b = 1$ and $\mathbf{w} \cdot \mathbf{x} - b = -1$.

Geometrically, the distance between these two hyperplanes is $\frac{2}{\|\mathbf{w}\|}$, so to maximize the distance between the planes we want to minimize $\|\mathbf{w}\|$. The distance is computed using the distance-from-a-point-to-a-plane equation. We also have to prevent data points from falling into the margin, so we add the following constraint: for each $i$, either $\mathbf{w} \cdot \mathbf{x}_i - b \ge 1$ if $y_i = 1$, or $\mathbf{w} \cdot \mathbf{x}_i - b \le -1$ if $y_i = -1$.
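The margin width $2/\|\mathbf{w}\|$ can be checked numerically on a fitted linear SVM. A sketch with synthetic, well-separated blobs (my choice, so that a near-hard margin is feasible); note scikit-learn's decision function uses $\mathbf{w} \cdot \mathbf{x} + b$ rather than $-b$:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic blobs, so a (near-)hard margin is feasible.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=0)
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # huge C ~ hard margin

w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)
print("||w|| =", np.linalg.norm(w))
print("margin width 2/||w|| =", margin_width)

# The support vectors sit (approximately) on the planes w.x + b = +/-1.
sv_scores = X[clf.support_] @ w + clf.intercept_[0]
print("decision scores at the support vectors:", sv_scores)
```

Minimizing $\|\mathbf{w}\|$ is exactly what widens this margin; the printed support-vector scores sit near $\pm 1$ when the data are separable.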
- When we strictly impose that all instances must be off the street and on the right side, this is called hard margin classification.
- There are two main issues with hard margin classification. First, it only works if the data is linearly separable.
- Second, it is sensitive to outliers.
- The objective of soft margin classification is to find a good balance between keeping the street as large as possible and limiting the margin violations (i.e., instances that end up in the middle of the street or even on the wrong side).
- To extend SVM to cases in which the data are not linearly separable, the hinge loss function is helpful (the trade-off between margin width and violations is controlled by the hyperparameter C).
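The hinge loss itself is simple to state: for a signed target $t \in \{-1, +1\}$ and raw decision score $s$, the loss is $\max(0, 1 - t\,s)$. A minimal sketch:

```python
import numpy as np

def hinge_loss(t, s):
    """Hinge loss max(0, 1 - t*s) for a signed target t in {-1, +1} and
    raw decision score s: zero when the instance is on the correct side
    of the margin, growing linearly with the violation otherwise."""
    return np.maximum(0.0, 1.0 - t * s)

print(hinge_loss(1, 2.0))    # correct side, outside the margin -> 0.0
print(hinge_loss(1, 0.5))    # correct side but inside the margin -> 0.5
print(hinge_loss(-1, 0.5))   # wrong side of the boundary -> 1.5
```

The loss is zero for instances safely outside the street, which is why adding more such instances does not move the decision boundary.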
- When creating an SVM model using Scikit-Learn, we can specify a number of hyperparameters. C is one of those hyperparameters.
- If we set it to a low value, then we end up with the model on the left.
- With a high value, we get the model on the right.
- Margin violations are bad.
- It’s usually better to have few of them.
- However, in this case the model on the left has a lot of margin violations but will probably generalize better.
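The effect of C described above can be reproduced directly by counting margin violations at two settings. A sketch assuming scikit-learn; the Iris-based setup mirrors the kind of figure discussed, though the exact counts depend on the data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]                # petal length, petal width
y = (iris.target == 2).astype(int)   # Iris virginica vs the rest
t = np.where(y == 1, 1, -1)          # signed targets for the margin check

violations, accuracies = {}, {}
for C in (0.01, 100.0):
    clf = make_pipeline(StandardScaler(), LinearSVC(C=C, max_iter=10_000))
    clf.fit(X, y)
    scores = clf.decision_function(X)
    # An instance violates the margin when it is inside the street
    # (or on the wrong side), i.e. when t * score < 1.
    violations[C] = int(np.sum(t * scores < 1))
    accuracies[C] = clf.score(X, y)
    print(f"C={C}: {violations[C]} margin violations, "
          f"accuracy={accuracies[C]:.2f}")
```

A low C produces a wider street with more violations; a high C narrows the street and allows fewer of them, at the risk of poorer generalization.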



