- Supervised Learning
- Supervised Learning Categories
Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label.
The goal of supervised learning algorithms is learning a function that maps feature vectors (inputs) to labels (output), based on example input-output pairs. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).
A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
An optimal scenario will allow the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias). This statistical quality of an algorithm is measured through the so-called generalization error.
Let's try fitting the data with polynomial regression. We'll use the MATLAB polyfit function to get the coefficients.
We will see later how this relates perfectly with our linear regression procedure.
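The MATLAB `polyfit` call mentioned above has a close analogue in NumPy. Here is a minimal sketch in Python; the data values are invented purely for illustration:

```python
import numpy as np

# Invented sample data for illustration: points near the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Fit a degree-1 polynomial (a straight line), analogous to
# MATLAB's polyfit(x, y, 1); coefficients come back highest degree first
w1, w0 = np.polyfit(x, y, 1)
print(f"slope w1 = {w1:.3f}, intercept w0 = {w0:.3f}")
```

A degree-1 polynomial fit is exactly a straight-line fit, which is why this connects directly to the linear regression procedure developed below.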
We start with the case where there is a single input and a single output.

- $\large{\color{Purple} (x^{(i)}, y^{(i)})}$ : the $i^{th}$ example of the (input, output) set.
- $\large{\color{Purple}m}$ : the number of examples or data points. We assume that there are 51 pairs of data points.
- The General Univariate Linear Regression Problem

We now introduce our model hypothesis, the linear model:

$\large{\color{Purple} \hat{y}^{(i)} = w_0 + w_1 x^{(i)}}$

- $\large{\color{Purple} w_0, w_1}$ are parameters, and there are infinite possibilities. Which do we choose? For this we define a cost function:
$\large{\color{Purple} J=\frac{1}{2m} \sum_{i}(y^{(i)}- \hat{y}^{(i)})^2}$

- Notice that no line is going to fit all of these points perfectly, so the difference between a point $\large{\color{Purple}y^{(i)}}$ and the corresponding point $\large{\color{Purple}\hat{y}^{(i)}}$ on the regression line is the loss.
- So we say that the optimal $\large{\color{Purple} w}$ (you can now notice it has become an optimization problem) is the one which minimizes this net cost function.
- The $\large{\color{Purple} w}$ that we get at the end of the process is called the least-squares coefficient.
- This fit is called the least-squares fit, and the cost function is called least mean squares (LMS).
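The cost $J$ and the least-squares coefficients can be sketched in a few lines of NumPy. The data here is the same invented example, and the closed-form normal-equations solution is used in place of an iterative optimizer:

```python
import numpy as np

# Invented sample data for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
m = len(x)

def cost(w0, w1):
    """J = 1/(2m) * sum_i (y_i - y_hat_i)^2 for the line y_hat = w0 + w1*x."""
    y_hat = w0 + w1 * x
    return np.sum((y - y_hat) ** 2) / (2 * m)

# Closed-form least-squares solution via the normal equations:
# X is the design matrix [1, x]; solve (X^T X) w = X^T y
X = np.column_stack([np.ones(m), x])
w0, w1 = np.linalg.solve(X.T @ X, X.T @ y)

print(f"least-squares w0 = {w0:.3f}, w1 = {w1:.3f}")
print(f"minimized cost J = {cost(w0, w1):.4f}")
```

Perturbing the coefficients away from the least-squares solution (e.g. `cost(w0 + 0.1, w1)`) can only increase $J$, which is what "minimizes this net cost function" means in practice.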
Mean Square Error:
So this is one measure of how good the fit is. Sometimes it is not good enough: we just get a large value of $\large{\color{Purple}J}$ and we don't know whether this is a good fit or not.
What does total variance mean? Before we even had a model there was some amount of variation in the data, and this term calculates that total amount of variance:

$\large{\color{Purple} \sum_{i}(y^{(i)}- \bar{y})^2}$ : amount of variance present in the data.
Here $\large{\color{Purple}y^{(i)}}$ is the ground truth and $\large{\color{Purple}\hat{y}^{(i)}}$ is the prediction (the hypothesis or model).
Amount of variance captured by the model.
Typically we would like one number which lies between 0 and 1, so we want to normalize this. You have a number; you would like to non-dimensionalize it, normalizing it with respect to some denominator, so that you get a value between 0 and 1.
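The normalized number the text is building toward is commonly called the coefficient of determination, $R^2$. A minimal sketch on the same invented data, assuming the standard definition $R^2 = 1 - \frac{\sum_i (y^{(i)}-\hat{y}^{(i)})^2}{\sum_i (y^{(i)}-\bar{y})^2}$:

```python
import numpy as np

# Invented sample data and a least-squares line fit
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
w1, w0 = np.polyfit(x, y, 1)
y_hat = w0 + w1 * x

# Total variance in the data (before any model) and residual variance
ss_tot = np.sum((y - y.mean()) ** 2)   # variance present in the data
ss_res = np.sum((y - y_hat) ** 2)      # variance NOT captured by the model

# R^2: fraction of the total variance captured by the model;
# for a least-squares line fit with intercept it lies in [0, 1]
r_squared = 1.0 - ss_res / ss_tot
print(f"R^2 = {r_squared:.4f}")
```

Unlike a raw value of $J$, this number is dimensionless: an $R^2$ near 1 means the model captures almost all of the variation in the data, and an $R^2$ near 0 means it captures almost none.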









