
Index


Supervised Learning


What is Supervised Learning?

Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label.

The goal of supervised learning algorithms is to learn a function that maps feature vectors (inputs) to labels (outputs), based on example input-output pairs. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).

A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.

An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias). This statistical quality of an algorithm is measured through the so-called generalization error.


Train-test Split of data
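The split itself can be sketched in plain Python. The 80/20 ratio, the fixed shuffle seed, and the toy data below are illustrative assumptions, not part of the notes above:

```python
import random

def train_test_split(data, test_ratio=0.2, seed=0):
    """Shuffle the data and split it into train and test sets."""
    rng = random.Random(seed)
    shuffled = data[:]           # copy, so the original order is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]   # train, test

data = [(x, 2 * x + 1) for x in range(10)]  # toy (feature, label) pairs
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters when the data are ordered (e.g. sorted by label); otherwise the test set would not be representative of the training set.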

Classification and Regression

Regression


Fitting the Data


Let's try fitting the data with polynomial regression. We'll use the MATLAB polyfit function to get the coefficients.

We will see later how this relates to our linear regression procedure.

The fit equations are:

$$\large{\color{Purple} \begin{cases} linear & y = w_1x+w_0 \\ quadratic & y = w_2x^2+w_1x+w_0 \\ cubic & y = w_3x^3+w_2x^2+w_1x+w_0 \\ \end{cases} } $$
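A polynomial fit like MATLAB's polyfit can be sketched in plain Python via the normal equations. The data below are made up so that the quadratic fit recovers the true coefficients; note this sketch returns coefficients lowest power first ($w_0, w_1, \dots$), matching the equations above, whereas MATLAB's polyfit returns them highest power first:

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations,
    solved with plain Gaussian elimination.

    Returns [w_0, w_1, ..., w_degree], lowest power first.
    """
    n = degree + 1
    # Normal equations A w = b with A[j][k] = sum_i x_i^(j+k),
    # b[j] = sum_i y_i * x_i^j.
    A = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for row in range(n - 1, -1, -1):
        w[row] = (b[row] - sum(A[row][k] * w[k] for k in range(row + 1, n))) / A[row][row]
    return w

# Toy data generated exactly from y = 3x^2 + 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x ** 2 + 2 * x + 1 for x in xs]
w = polyfit(xs, ys, degree=2)  # [w0, w1, w2] ≈ [1, 2, 3]
```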

Example:

  • Supervised learning is where you have an input variable ($x$) and an output variable ($y$), and you use a mapping function from the input to the output.
  • It is called supervised because the process of an algorithm training from the dataset can be thought of as a teacher supervising the learning process.

Data point


We start with the case where there is a single input and a single output.

  • $\large{\color{Purple} (x^{(i)}, y^{(i)})} : i^{th} \textit{ example of (input, output) set.}$
  • $\large{\color{Purple}m}$ : Number of examples or data points.
  • We assume that there are 51 pairs of data points.

Hypothesis

For the general univariate linear regression problem, we now introduce our model hypothesis, the linear model:

$$ \large{\color{Purple} \hat{y}= h(x)=w_{0}+ w_{1}x} $$

  • $\large{\color{Purple} w_0, w_1}$ are parameters, and there are infinitely many possibilities. Which do we choose?
  • For this we define a cost function $\large{\color{Purple} J=\frac{1}{2m} \sum_{i}(y^{(i)}- \hat{y}^{(i)})^2}$
    • Notice that no line is going to fit all of the data perfectly, so the difference between an observed point $\large{\color{Purple}y}$ and the corresponding point on the regression line $\large{\color{Purple}\hat{y}}$ is the loss.
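Evaluating this cost for any candidate pair $(w_0, w_1)$ is a direct translation of the formula; the toy data below are an assumption for illustration:

```python
def cost(w0, w1, xs, ys):
    """Least-mean-squares cost J = (1/2m) * sum_i (y_i - yhat_i)^2
    for the hypothesis yhat = w0 + w1*x."""
    m = len(xs)
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # exactly y = 2x
print(cost(0.0, 2.0, xs, ys))  # perfect fit: J = 0.0
print(cost(0.0, 1.0, xs, ys))  # worse parameters give a larger J
```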

How do we find the optimal $\large{\color{Purple} w}$?

  • So we say that the optimal $\large{\color{Purple} w}$ (notice that this has now become an optimization problem) is the one which minimizes this cost function.
  • The $\large{\color{Purple} w}$ that we get at the end of the process is called the least-squares coefficient.
  • This fit is called the least-squares fit, and the cost function is called least mean squares (LMS).
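For the univariate case, minimizing $J$ has a well-known closed form: $w_1 = \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2$ and $w_0 = \bar{y} - w_1\bar{x}$. A minimal sketch, with made-up noisy data:

```python
def least_squares(xs, ys):
    """Closed-form minimizer of J for the model yhat = w0 + w1*x."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
         / sum((x - x_bar) ** 2 for x in xs)
    w0 = y_bar - w1 * x_bar
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 6.9]      # roughly y = 2x + 1 with small noise
w0, w1 = least_squares(xs, ys) # w0 ≈ 1.06, w1 ≈ 1.96
```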

Measuring the Fit


Mean Square Error:

$$\large{\color{Purple} J=\frac{1}{2m} \sum_{i}(y^{(i)}- \hat{y}^{(i)})^2}$$

So this is one measure of how good the fit is. However, it is sometimes not enough on its own: we may get a large value of $J$ without knowing whether the fit is good or not.

Variance

$$\large{\color{Purple}\sum^{m}_{i=1} (y_i-\bar{y})^2} {\color{Cyan}\textrm{(SST) Sum Square Total}}$$

What does total variance mean? Before we even had a model there was some amount of variation in the data, and this term actually calculates the total amount of variance in the data before we even had a model.

Amount of variance present in the data.

Error

$$\large{\color{Purple}\sum^{m}_{i=1} (y_i-\hat{y_i})^2} {\color{Cyan}\textrm{(SSE) Sum Square Error}}$$

Here $y_i$ is the ground truth and $\hat{y}_i$ is the prediction (the hypothesis or model output).

Variance in Prediction

$$\large{\color{Purple}\sum^{m}_{i=1} (\hat{y_i} - \bar{y})^2} {\color{Cyan}\textrm{(SSR) Sum Square Regression}}$$

Amount of variance captured by the model.

$\large R^2$ Error

$$\large{\color{Purple}\mathbf{R^2} = \frac{\textrm{SSR}}{\textrm{SST}} = \frac{\textit{Amount of variance captured by the Model}}{\textit{Amount of variance present in Data}} }$$

Typically we would like a single number that lies between 0 and 1, so we normalize: we non-dimensionalize SSR with respect to the denominator SST, so that the result falls between 0 and 1.

$$\large{\color{Purple} \mathbf{R^2 \in \Bigl[0, 1 \Bigl] } } \normalsize{\color{Cyan} \begin{cases} 0 &= Very\ Bad\ fit \\ 1 &= Very\ Good \ fit \end{cases}} $$

Conclusion

$$ \large{\color{Purple} \begin{align*} \because & \textbf{SST} = \textbf{SSE + SSR} \\ \Rightarrow & \textbf{SSR} = \textbf{SST-SSE} \\ \Rightarrow & \mathbf{R^2} = \mathbf{1- \frac{SSE}{SST}} \end{align*} } $$
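All of these quantities can be checked numerically. The sketch below fits the line by the closed-form least-squares formulas and computes SST, SSE, SSR, and $R^2$; the data are made up for illustration. For a least-squares fit with an intercept, the identity SST = SSE + SSR holds exactly:

```python
def r_squared(xs, ys):
    """Fit yhat = w0 + w1*x by least squares, then compute
    SST, SSE, SSR, and R^2 = 1 - SSE/SST."""
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
         / sum((x - x_bar) ** 2 for x in xs)
    w0 = y_bar - w1 * x_bar
    yhat = [w0 + w1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)              # variance in the data
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # unexplained error
    ssr = sum((yh - y_bar) ** 2 for yh in yhat)          # variance captured by model
    return sst, sse, ssr, 1 - sse / sst

sst, sse, ssr, r2 = r_squared([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.1, 6.9])
# sst == sse + ssr (up to floating-point error), and r2 is close to 1
# because the data are nearly linear.
```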

Supervised Learning Categories - Based on Types


Machine Learning Categories - Based on Outlier Handling
