
🎓 Courses

🔴 YouTube

💻 Web Sites

:octocat: GitHub Repositories

Title Description, Information
Statistics/ Mathematical Computing Notebooks General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python

🛠️ Frameworks

Title Description
TensorFlow Probability TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It's for data scientists, statisticians, ML researchers, and practitioners who want to encode domain knowledge to understand data and make predictions.
Pyro Deep Universal Probabilistic Programming
ArviZ: Exploratory analysis of Bayesian models ArviZ is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, sample diagnostics, model checking, and comparison.
PyStan PyStan is a Python interface to Stan, a package for Bayesian inference.

💠 Terminology and Methods

🔹 Sampling


🔹 Errors

Keep in mind that there is no silver bullet, no single best error metric. The fundamental challenge is that every statistical measure condenses a large amount of data into a single value, so it provides only one projection of the model errors, emphasizing a certain aspect of the model's error characteristics (Chai and Draxler 2014).

Therefore it is better to take a practical, pragmatic view and work with a selection of metrics that fit your use case or project.

Title Description, Information
Standard deviation
  • The standard deviation (SD) is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set. In contrast, a high standard deviation indicates that the values are spread out over a broader range. The SD of predicted values helps in understanding the dispersion of values in different models.
Coefficient of determination, R squared -> R2
Standard Error
  • The standard error of the regression (S) provides an absolute measure of the typical distance that the data points fall from the regression line. S is in the units of the dependent variable.
  • It tells you straight up how precise the model’s predictions are using the units of the dependent variable. You want lower values of S because it signifies that the distances between the data points and the fitted values are smaller.
  • How to Calculate the Standard Error of the Mean in Python
Relative Standard Deviation (RSD) / Coefficient of Variation (CV)
  • The coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of the dispersion of a probability distribution or frequency distribution. It is the ratio of the standard deviation to the mean, which makes it possible to compare the spread of the data in two different tests even when their means differ.
Relative Squared Error (RSE)
  • The relative squared error (RSE) is relative to what the error would have been if a simple predictor had been used. More specifically, this simple predictor is just the average of the actual values. Thus, the relative squared error takes the total squared error and normalizes it by dividing it by the total squared error of the simple predictor. It can be compared between models whose errors are measured in different units.
Approximation error
  • The approximation error in a data value is the discrepancy between an exact value and some approximation to it. This error can be expressed as an absolute error (the numerical amount of the discrepancy) or as a relative error (the absolute error divided by the data value).
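A minimal NumPy sketch of several entries in the table above. The function names are hypothetical; the coefficient of variation assumes the sample standard deviation (`ddof=1`) and a non-zero mean, and the standard error of the regression assumes `n_params` counts every fitted coefficient, including the intercept.

```python
import numpy as np


def coefficient_of_variation(x, ddof=1):
    """CV / RSD in percent: standard deviation divided by the mean.

    Assumes a sample standard deviation (ddof=1) and a non-zero mean.
    """
    x = np.asarray(x, dtype=float)
    return np.std(x, ddof=ddof) / np.mean(x) * 100


def relative_squared_error(y, y_hat):
    """Total squared error divided by that of the mean-of-actuals predictor.

    Note: 1 - RSE equals the coefficient of determination R^2.
    """
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def regression_standard_error(y, y_hat, n_params):
    """Standard error of the regression S = sqrt(SSE / (n - p)), in units of y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    return np.sqrt(sse / (len(y) - n_params))


def approximation_error(exact, approx):
    """Return (absolute error, relative error) of an approximation."""
    abs_err = abs(exact - approx)
    return abs_err, abs_err / abs(exact)


# Illustrative usage with made-up numbers
print(coefficient_of_variation([2, 4, 6]))
print(relative_squared_error([3, 5, 7], [3.5, 5, 6.5]))
print(regression_standard_error([1, 2, 3, 4], [1, 2, 3, 5], n_params=2))
print(approximation_error(2.0, 1.9))
```

Because RSE compares against the mean predictor, a value of 1 means the model is no better than always predicting the average.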

💠 Scale Dependent Metrics:

Title Description, Information
Mean Absolute Error, MAE
  • A measure of errors between paired observations expressing the same phenomenon. Examples of Y versus X include comparisons of predicted versus observed, subsequent time versus initial time, and one technique of measurement versus an alternative technique of measurement.
  • How to Calculate Mean Absolute Error in Python
Mean Squared Error (MSE) or Mean Squared Deviation (MSD)
  • The MSE of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate.
  • How to Calculate Mean Squared Error (MSE) in Python
Root-Mean-Square Deviation (RMSD) or Root-Mean-Square error (RMSE)
  • The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed.
  • The RMSD represents the square root of the second sample moment of the differences between predicted values and observed values or the quadratic mean of these differences. These deviations are called residuals when the calculations are performed over the data sample that was used for estimation and are called errors (or prediction errors) when computed out-of-sample.
  • The RMSD serves to aggregate the magnitudes of the errors in predictions for various data points into a single measure of predictive power.
  • RMSD is a measure of accuracy used to compare the forecasting errors of different models for a particular dataset, not between datasets, because it is scale-dependent.
  • Variations:
    • Normalized Root Mean Squared Error (Norm RMSEP)
Mean Squared Prediction Error (MSPE) or Mean Squared Error of the Predictions (MSEP)
  • In statistics, the mean squared prediction error or mean squared error of the predictions of a smoothing or curve fitting procedure is the expected value of the squared difference between the fitted values implied by the predictive function and the values of the (unobservable) function g. It is an inverse measure of the explanatory power of the fitted predictive function and can be used in the process of cross-validation of an estimated model.
  • Mean Squared Error (MSE) vs. Mean Squared Prediction Error (MSPE)
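The table mentions normalized RMSE without a formula. Normalization conventions differ between sources; the sketch below assumes the two common choices of dividing by the range or by the mean of the observed values. `nrmse` and its `normalization` parameter are hypothetical names.

```python
import numpy as np


def nrmse(y, y_hat, normalization="range"):
    """RMSE divided by a scale of the observed values, giving a unitless number.

    normalization="range": divide by max(y) - min(y)
    normalization="mean":  divide by mean(y), also called CV(RMSD)
    """
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    if normalization == "range":
        return rmse / (y.max() - y.min())
    if normalization == "mean":
        return rmse / y.mean()
    raise ValueError(f"unknown normalization: {normalization!r}")


print(nrmse([1, 2, 3, 4, 5], [1, 2, 3, 4, 7]))
print(nrmse([1, 2, 3, 4, 5], [1, 2, 3, 4, 7], normalization="mean"))
```

Dividing by a scale of the data removes the unit, which is what makes NRMSE usable across datasets where plain RMSD is not.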

💠 Percentage Error Metrics:

Title Description, Information
Mean Absolute Percentage Error (MAPE)
  • The mean absolute percentage error (MAPE) is one of the most popular error metrics in time series forecasting. It is calculated by taking the average (mean) of the absolute differences between actual and predicted values, each divided by the corresponding actual value.
  • Detailed explanation of MAPE
Symmetric Mean Absolute Percentage Error (sMAPE)
  • To avoid the asymmetry of MAPE, a new error metric was proposed: the Symmetric Mean Absolute Percentage Error (sMAPE). sMAPE is probably one of the most controversial error metrics, not only because different definitions and formulas exist, but also because critics claim that it is not as symmetric as its name suggests.
  • Detailed explanation of sMAPE
Weighted Mean Absolute Percentage Error (wMAPE)
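The table lists wMAPE without a definition. A common definition (assumed here; variants with other weights exist) divides the total absolute error by the total absolute actuals, which equals weighting each absolute percentage error by that actual's share of the total. The function name is hypothetical.

```python
import numpy as np


def wmape(y, y_hat):
    """Weighted MAPE in percent: sum of absolute errors over sum of absolute actuals."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sum(np.abs(y - y_hat)) / np.sum(np.abs(y)) * 100


print(wmape([100, 200], [110, 180]))
print(wmape([0, 100], [5, 95]))  # a single zero actual is fine here
```

Because the denominator is a sum, a single zero actual no longer makes the metric infinite, unlike MAPE; it only breaks when all actuals are zero.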

💠 Relative Error Metrics:

Title Description, Information
Relative Absolute Error (RAE)
  • Relative Absolute Error (RAE) is a way to measure the performance of a predictive model. RAE is not to be confused with relative error, which is a general measure of precision or accuracy for instruments like clocks, rulers, or scales. It is expressed as a ratio, comparing a mean error (residual) to the errors produced by a trivial or naive model. A good forecasting model will produce a ratio close to zero; a poor model (one that is worse than the naive model) will produce a ratio greater than one.
  • It is very similar to the relative squared error in the sense that it is also relative to a simple predictor, which is just the average of the actual values. In this case, though, the error is just the total absolute error instead of the total squared error. Thus, the relative absolute error takes the total absolute error and normalizes it by dividing by the total absolute error of the simple predictor.
Median Relative Absolute Error (MdRAE)
  • The Median Relative Absolute Error (MdRAE) calculates the median of the ratios between the absolute error of our forecast and the absolute error of a benchmark model.
Geometric Mean Relative Absolute Error (GMRAE)
  • Geometric Mean Relative Absolute Error (GMRAE) compares the errors of our forecast with those of a defined baseline model. However, instead of calculating the median, the GMRAE, as the name implies, calculates the geometric mean of our relative errors.
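A sketch of RAE, MdRAE and GMRAE, assuming the relative error at each point is the ratio of the forecast's absolute error to the benchmark's absolute error at the same point. The benchmark forecast `y_bench` (often a naive forecast) must be supplied by the caller; all helper names are hypothetical.

```python
import numpy as np


def rae(y, y_hat):
    """Total absolute error relative to that of the mean-of-actuals predictor."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sum(np.abs(y - y_hat)) / np.sum(np.abs(y - y.mean()))


def _relative_abs_errors(y, y_hat, y_bench):
    """|forecast error| / |benchmark error| for each point."""
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)
    e_bench = y - np.asarray(y_bench, dtype=float)
    return np.abs(e / e_bench)


def mdrae(y, y_hat, y_bench):
    """Median of the relative absolute errors."""
    return np.median(_relative_abs_errors(y, y_hat, y_bench))


def gmrae(y, y_hat, y_bench):
    """Geometric mean of the relative absolute errors."""
    return np.exp(np.mean(np.log(_relative_abs_errors(y, y_hat, y_bench))))


y = [10, 20, 30, 40]
y_hat = [12, 19, 33, 38]    # our forecast (made-up numbers)
y_bench = [8, 24, 24, 44]   # benchmark forecast (made-up numbers)
print(rae(y, y_hat), mdrae(y, y_hat, y_bench), gmrae(y, y_hat, y_bench))
```

A benchmark error of exactly zero makes a ratio infinite, and a zero forecast error collapses the geometric mean to zero, so real data may need filtering before using these metrics.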

💠 Scale-Free Error Metrics:

Title Description, Information
Mean Absolute Scaled Error (MASE)
  • In statistics, the mean absolute scaled error (MASE) is a measure of the accuracy of forecasts. It is the mean absolute error of the forecast values, divided by the mean absolute error of the in-sample one-step naive forecast.
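A sketch of MASE as described above: the scaling factor is the in-sample MAE of the one-step naive forecast (each training value predicted by the previous one). The seasonal lag `m` is an added assumption; `m=1` gives the definition in the table. The function name is hypothetical.

```python
import numpy as np


def mase(y_train, y_test, y_pred, m=1):
    """MAE of the forecast divided by the in-sample MAE of the naive forecast."""
    y_train = np.asarray(y_train, dtype=float)
    # In-sample naive forecast error: y[t] - y[t - m]
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    forecast_mae = np.mean(np.abs(np.asarray(y_test, dtype=float) - np.asarray(y_pred, dtype=float)))
    return forecast_mae / scale


print(mase(y_train=[1, 3, 2, 4], y_test=[5, 6], y_pred=[4, 6]))
```

Values below 1 mean the forecast beats the in-sample naive forecast on average; a constant training series gives a zero scale and an undefined result.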

💠 Other Errors

Title Description, Information
Forecast Error (or Residual Forecast Error) The forecast error is calculated as the expected (actual, observed) value minus the predicted value. This is called the residual error of the prediction.
Mean Forecast Error (or Forecast Bias) Mean forecast error is calculated as the average of the forecast error values.
Tracking signal Monitors any forecasts that have been made in comparison with actuals, and warns when there are unexpected departures of the outcomes from the forecasts.
Compounding Error

Compounding error occurs when a deviation in one feature, or in the process used to measure that feature, directly affects the measurement of another feature.

When a language model is trained to predict the next token, it never gets a chance to see compounding errors. The model is trained to predict the next token based on a human-generated context. If it gets one token wrong by generating a bad distribution, the next token is still predicted from the "correct" human-generated context, independent of the last prediction. During generation, however, the model is forced to complete its own automatically generated context, a setting it never encountered during training.
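The first three entries of the table above can be sketched as follows, assuming errors are computed as actual minus predicted (so a positive mean forecast error indicates under-forecasting). The tracking signal uses one common formulation, assumed here rather than taken from the source: the running sum of forecast errors divided by the running mean absolute error. Function names are hypothetical.

```python
import numpy as np


def forecast_errors(y, y_hat):
    """Forecast (residual) errors: actual minus predicted."""
    return np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)


def mean_forecast_error(y, y_hat):
    """Forecast bias: the average forecast error (0 means unbiased on average)."""
    return np.mean(forecast_errors(y, y_hat))


def tracking_signal(y, y_hat):
    """Running sum of errors divided by the running mean absolute error."""
    errors = forecast_errors(y, y_hat)
    running_sum = np.cumsum(errors)
    running_mad = np.cumsum(np.abs(errors)) / np.arange(1, len(errors) + 1)
    return running_sum / running_mad


y = [10, 12, 11]
y_hat = [9, 11, 12]
print(forecast_errors(y, y_hat), mean_forecast_error(y, y_hat), tracking_signal(y, y_hat))
```

A tracking signal drifting steadily away from zero suggests a persistent bias; note that it is undefined while all errors so far are zero.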


🔹 Metrics

Time Series Forecast Error Metrics

Title Description, Information
Mean Reciprocal Rank

The mean reciprocal rank is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness.

This metric can help distinguish between answers that were close to being right or far from being right (e.g., a score of 1 if the correct document is rank 1, a score of ½ if rank 2, a score of ⅓ if rank 3, etc.)
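A plain-Python sketch of mean reciprocal rank, assuming each query has exactly one correct answer and that a query whose answer does not appear in the list contributes 0. The function name and inputs are hypothetical.

```python
def mean_reciprocal_rank(ranked_results, correct_answers):
    """Average of 1/rank of the first correct answer over all queries."""
    reciprocal_ranks = []
    for results, target in zip(ranked_results, correct_answers):
        rr = 0.0  # stays 0 if the correct answer is missing from the list
        for rank, item in enumerate(results, start=1):
            if item == target:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


queries = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
print(mean_reciprocal_rank(queries, ["a", "z", "s"]))  # (1 + 1/3 + 0) / 3
```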

📰 Articles

:octocat: GitHub

📄 Papers

🔹 Scale Dependent Metrics

Many popular metrics are referred to as scale-dependent (Hyndman, 2006). Scale-dependent means that the error metrics are expressed in the units (e.g., dollars, inches) of the underlying data.

The main advantage of scale-dependent metrics is that they are usually easy to calculate and interpret. However, they cannot be used to compare different series because of their scale dependency (Hyndman, 2006).

Please note that Hyndman (2006) includes Mean Squared Error in the scale-dependent group (claiming that the error is "on the same scale as the data"). However, Mean Squared Error is expressed in the squared unit of the data. To bring MSE back to the data's unit we need to take the square root, which leads to another metric, the RMSE (Shcherbakov et al., 2013).

🔸 Mean Absolute Error (MAE)

The Mean Absolute Error (MAE) is calculated by taking the mean of the absolute differences between the actual values (also called y) and the predicted values (y_hat).

It is easy to understand (even for business users) and to compute. It is recommended for assessing accuracy on a single series (Hyndman, 2006).

However, if you want to compare different series (with different units), it is not suitable. Also, you should not use it if you want to penalize outliers.

import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

🔸 Mean Squared Error (MSE)

If you want to put more attention on outliers (huge errors), you can consider the Mean Squared Error (MSE). As its name implies, it takes the mean of the squared errors (differences between y and y_hat).

Because the errors are squared, large errors are weighted much more heavily than small ones, which can be a disadvantage in some situations. Therefore, MSE is suitable for situations where you really want to focus on large errors.

Also keep in mind that, due to the squaring, the metric is expressed in squared units rather than in the data's own unit.

import numpy as np

def mse(y, y_hat):
    return np.mean(np.square(y - y_hat))

🔸 Root Mean Squared Error (RMSE)

To avoid the MSE’s loss of its unit we can take the square root of it. The outcome is then a new error metric called the Root Mean Squared Error (RMSE).

It comes with the same advantages as its siblings MAE and MSE. However, like MSE, it is also sensitive to outliers.

Some authors like Willmott and Matsuura (2005) argue that the RMSE is an inappropriate and misinterpreted measure of an average error and recommend MAE over RMSE.

However, Chai and Draxler (2014) partially refuted their arguments and recommend RMSE over MAE for model optimization, as well as for evaluating different models when the error distribution is expected to be Gaussian.
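RMSE has no code snippet of its own here, so below is a sketch in the same style as `mae` and `mse`. The small example uses made-up numbers to show the outlier sensitivity: both forecasts have the same MAE, but concentrating the error in one point doubles the RMSE.

```python
import numpy as np


def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))


def rmse(y, y_hat):
    # Square root of the MSE brings the metric back to the unit of y
    return np.sqrt(np.mean(np.square(y - y_hat)))


y = np.array([10.0, 10.0, 10.0, 10.0])
spread_out = np.array([11.0, 9.0, 11.0, 9.0])    # four errors of size 1
one_outlier = np.array([10.0, 10.0, 10.0, 14.0])  # one error of size 4

print(mae(y, spread_out), rmse(y, spread_out))    # 1.0 1.0
print(mae(y, one_outlier), rmse(y, one_outlier))  # 1.0 2.0
```

RMSE is always greater than or equal to MAE; the gap between the two grows as the errors become more unequal.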

🔹 Percentage Error Metrics

As we know from the previous chapter, scale dependent metrics are not suitable for comparing different time series.

Percentage error metrics solve this problem. They are scale-independent and are used to compare forecast performance between different time series. Their weak spot, however, is zero values in a time series: the metrics then become infinite or undefined, which makes them uninterpretable (Hyndman 2006).

🔸 Mean Absolute Percentage Error (MAPE)

The mean absolute percentage error (MAPE) is one of the most popular error metrics in time series forecasting. It is calculated by taking the average (mean) of the absolute differences between actual and predicted values, each divided by the corresponding actual value.

Please note that some MAPE formulas do not multiply the result by 100. However, MAPE is presented as a percentage, so I added the multiplication.

MAPE's advantages are its scale independence and easy interpretability. As said at the beginning, percentage error metrics can be used to compare the outcomes of multiple time series models with different scales.

However, MAPE also comes with some disadvantages. First, it generates infinite or undefined values for zero or close-to-zero actual values (Kim and Kim 2016).

Second, it also puts a heavier penalty on negative than on positive errors which leads to an asymmetry (Hyndman 2014).

And last but not least, MAPE cannot be used when percentages make no sense. This is the case, for example, when measuring temperatures: the Fahrenheit and Celsius scales have relatively arbitrary zero points, so it makes no sense to talk about percentages (Hyndman and Koehler, 2006).

import numpy as np

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat)/y)*100)
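The first two disadvantages can be illustrated with made-up numbers: swapping the actual and the forecast while keeping the same absolute error changes MAPE, and a single zero actual makes it infinite. The `mape` below repeats the formula above, only converting the inputs to float arrays so plain lists also work.

```python
import numpy as np


def mape(y, y_hat):
    # Same formula as above; inputs converted to float arrays
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs((y - y_hat) / y) * 100)


# Same absolute error (50), actual and forecast swapped
over_forecast = mape([100.0], [150.0])   # error y - y_hat is negative
under_forecast = mape([150.0], [100.0])  # error y - y_hat is positive
print(over_forecast, under_forecast)

# A zero actual makes the metric infinite (division-by-zero warning suppressed)
with np.errstate(divide="ignore"):
    with_zero = mape([0.0, 100.0], [5.0, 110.0])
print(with_zero)
```

The negative error is penalized with 50%, the positive one with about 33%, which is the asymmetry described above.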

🔸 Symmetric Mean Absolute Percentage Error (sMAPE)

To avoid the asymmetry of MAPE, a new error metric was proposed: the Symmetric Mean Absolute Percentage Error (sMAPE). sMAPE is probably one of the most controversial error metrics, not only because different definitions and formulas exist, but also because critics claim that it is not as symmetric as its name suggests (Goodwin and Lawton, 1999).

The original idea of an "adjusted MAPE" was proposed by Armstrong (1985). However, by his definition the error metric can be negative or infinite, since the values in the denominator are not taken as absolute values (which some later articles that follow his definition correctly mention as a disadvantage).

Makridakis (1993) proposed a similar metric and called it SMAPE. His formula, $\text{sMAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\frac{2\,|F_t - A_t|}{|A_t| + |F_t|}$ with actual values $A_t$ and forecasts $F_t$, avoids the problems of Armstrong's formula by taking absolute values in the denominator (Hyndman, 2014).

Note: Makridakis (1993) proposed the formula above in his paper "Accuracy measures: theoretical and practical concerns". Later, in Makridakis and Hibon (2000), "The M3-Competition: results, conclusions and implications", he used Armstrong's formula (Hyndman, 2014). This has probably also contributed to the confusion about sMAPE's different definitions.

The sMAPE is the average across all forecasts made for a given horizon. Its advantages are that it avoids MAPE's problem of large errors when y values are close to zero, as well as the large difference between the absolute percentage errors when y is greater than y_hat and vice versa. Unlike MAPE, which has no upper limit, it ranges between 0% and 200% (Makridakis and Hibon, 2000).

For the sake of interpretation there is also a slightly modified version of sMAPE that ensures the metric's results are always between 0% and 100%: $\text{sMAPE}_{\text{adj}} = \frac{100\%}{n}\sum_{t=1}^{n}\frac{|F_t - A_t|}{|A_t| + |F_t|}$.

The following code snippet contains the sMAPE metric proposed by Makridakis (1993) and the modified version.

import numpy as np

# SMAPE proposed by Makridakis (1993): 0%-200%
def smape_original(a, f):
    return 1/len(a) * np.sum(2 * np.abs(f-a) / (np.abs(a) + np.abs(f))*100)


# adjusted SMAPE version to scale metric from 0%-100%
def smape_adjusted(a, f):
    return (1/a.size * np.sum(np.abs(f-a) / (np.abs(a) + np.abs(f))*100))

As mentioned at the beginning, there are controversies around sMAPE, and they are justified. Goodwin and Lawton (1999) pointed out that sMAPE penalizes under-estimates more than over-estimates (Chen et al., 2017). Cánovas (2009) demonstrates this with a simple example.

  • Table 1: Example with a symmetric sMAPE:

  • Table 2: Example with an asymmetric sMAPE:

Starting with Table 1, we have two cases. In case 1 the actual value y is 100 and the prediction y_hat is 150. This leads to an sMAPE value of 20%. Case 2 is the opposite: the actual value y is 150 and the prediction y_hat is 100. This also leads to an sMAPE of 20%.

Let us now look at Table 2. Here we also have two cases, and as you can see, the sMAPE values are no longer the same: the second case leads to a different sMAPE value of 33%.

Modifying the forecast while holding the actual value and the absolute deviation fixed does not produce the same sMAPE value. Simply biasing the model without improving its accuracy should never produce different error values (Cánovas, 2009).
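The Cánovas-style comparison can be reproduced with the two sMAPE variants from the snippet above (repeated here, with inputs converted to float arrays). The numbers are illustrative: actual 100 with forecast 150, the swapped pair, and actual 100 with forecast 50, which keeps the actual and the absolute deviation (50) fixed. The 20% and 33% figures in the text match the 0%-100% version; the original formula gives exactly twice these values.

```python
import numpy as np


def smape_original(a, f):
    # Makridakis (1993): range 0%-200%
    a, f = np.asarray(a, dtype=float), np.asarray(f, dtype=float)
    return np.mean(2 * np.abs(f - a) / (np.abs(a) + np.abs(f))) * 100


def smape_adjusted(a, f):
    # Rescaled version: range 0%-100%
    a, f = np.asarray(a, dtype=float), np.asarray(f, dtype=float)
    return np.mean(np.abs(f - a) / (np.abs(a) + np.abs(f))) * 100


# Swapping actual and forecast gives the same value (symmetric case)
print(round(smape_adjusted([100], [150]), 1), round(smape_adjusted([150], [100]), 1))  # 20.0 20.0

# Same actual, same absolute deviation, forecast above vs. below the actual
print(round(smape_adjusted([100], [150]), 1), round(smape_adjusted([100], [50]), 1))   # 20.0 33.3
```

The under-forecast (50) scores worse than the over-forecast (150) although both miss by 50, which is exactly the asymmetry Goodwin and Lawton describe.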

🔹 Metric/Error choice

  • As you have seen there is no silver bullet, no single best error metric. Each category or metric has its advantages and weaknesses. So it always depends on your individual use case or purpose and your underlying data. It is important not to just look at one single error metric when evaluating your model’s performance. It is necessary to measure several of the main metrics described above in order to analyze several parameters such as deviation, symmetrical deviation and largest outliers.
  • If all series are on the same scale, the data preprocessing procedures were performed (data cleaning, anomaly detection) and the task is to evaluate the forecast performance then the MAE can be preferred because it is simpler to explain (Hyndman and Koehler, 2006; Shcherbakov et al., 2013).
  • Chai and Draxler (2014) recommend preferring RMSE over MAE when the error distribution is expected to be Gaussian.
  • In case the data contain outliers, it is advisable to apply scaled measures like MASE. In this situation the forecast horizon should be large enough, the series should not consist of identical values, and the normalization factor must not be zero (Shcherbakov et al., 2013).

📰 Articles

🔹 Accuracy (Error) Rate

📰 Articles

🔹 Margin of Error

The margin of error is defined as the range of values below and above the sample statistic in a confidence interval. The confidence interval is a way to show the uncertainty of a certain statistic (e.g., from a poll or survey).

Also called the marginal (or limiting) sampling error, it is a statistical value that determines, with a certain degree of probability, the maximum amount by which the results of the sample differ from the results of the general population. It equals half the length of the confidence interval.

Examples:

  • For example, a survey indicates that 72% of respondents favor Brand A over Brand B with a 3% margin of error. In this case, the actual population percentage that prefers Brand A likely falls within the range of 72% ± 3%, or 69 – 75%.
  • A margin of error tells you by how many percentage points your results are expected to differ from the real population value. For example, a 95% confidence interval with a 4 percentage point margin of error means that, if the survey were repeated many times, about 95% of the intervals constructed this way (statistic ± 4 points) would contain the real population value.

A smaller margin of error suggests that the survey’s results will tend to be close to the correct values. Conversely, larger MOEs indicate that the survey’s estimates can be further away from the population values.

The margin of error is influenced by several factors, including the sample size, the variability in the data, and the desired level of confidence. A larger sample size generally results in a smaller margin of error, indicating a more precise estimate. By contrast, a higher level of confidence requires a larger margin of error, because the interval must be wider to achieve the increased certainty.
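A sketch of how sample size and confidence level drive the margin of error for a survey proportion, using the normal approximation z * sqrt(p(1 - p) / n). The function name and the sample sizes are hypothetical; p = 0.5 is used because it gives the widest margin for a given n.

```python
import math

from scipy import stats


def margin_of_error_proportion(p, n, confidence_level=0.95):
    """Normal-approximation margin of error for a sample proportion p from n responses."""
    z = stats.norm.ppf((1 + confidence_level) / 2)  # two-sided critical value
    return z * math.sqrt(p * (1 - p) / n)


# Larger samples shrink the margin of error
for n in (250, 1000, 4000):
    print(n, round(100 * margin_of_error_proportion(0.5, n), 1), "percentage points")

# Higher confidence widens it
for cl in (0.90, 0.95, 0.99):
    print(cl, round(100 * margin_of_error_proportion(0.5, 1000, cl), 1), "percentage points")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.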

The margin of error provides a measure of the precision and reliability of a sample-based estimate. It helps researchers and analysts interpret and communicate the level of confidence and uncertainty associated with the estimated values.

The code calculates the error (residual) between the actual and predicted values and adds it as a new column 'Error' in the DataFrame. Then, it calculates the mean and standard deviation of the error.

Next, you specify the desired confidence level (e.g., 95% confidence level) and use the stats.norm.ppf function from the scipy.stats module to calculate the critical value based on the confidence level. Finally, the margin of error is computed by multiplying the critical value by the standard error, which is the standard deviation divided by the square root of the number of observations.

import numpy as np
import pandas as pd
import scipy.stats as stats

# Example DataFrame with actual and predicted values
df = pd.DataFrame({'Actual': [10, 15, 20, 25, 30],
                   'Predicted': [12, 18, 22, 28, 32]})

# Calculate the error (residual) between actual and predicted values
df['Error'] = df['Actual'] - df['Predicted']

# Calculate the mean and standard deviation of the error
error_mean = df['Error'].mean()
error_std = df['Error'].std()

# Define the desired confidence level (e.g., 95%)
confidence_level = 0.95

# Calculate the critical value based on the confidence level
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Calculate the margin of error
margin_of_error = z_score * (error_std / np.sqrt(len(df)))

print('Margin of Error:', margin_of_error)

The formula to calculate the margin of error is: Margin of Error = Critical Value * Standard Error

Here's an example code that calculates the margin of error given a sample size, standard deviation, and confidence level:

import scipy.stats as stats
import math

# Example variables
sample_size = 500
standard_deviation = 0.05
confidence_level = 0.95

# Calculate the critical value (z-score) for a two-tailed interval
z_score = stats.norm.ppf((1 + confidence_level) / 2)

# Standard error = standard deviation / sqrt(sample size)
standard_error = standard_deviation / math.sqrt(sample_size)

# Margin of error = critical value * standard error
margin_of_error = z_score * standard_error

print('Margin of Error:', margin_of_error)

In this example, sample_size represents the size of the sample, standard_deviation represents the standard deviation of the population (or an estimate if it is unknown), and confidence_level represents the desired level of confidence (e.g., 0.95 for 95% confidence). The code uses the stats.norm.ppf function from the scipy.stats module to calculate the critical value based on the confidence level. It then multiplies the critical value by the standard deviation divided by the square root of the sample size to calculate the margin of error.

📰 Articles

🔹 Confidence interval

🔺 Confidence limits

🔺 Using the Empirical Rule (95-68-34 or 50-34-14)

📰 Articles


🔹 Standard score

📰 Articles


🔹 Confusion matrix

🛠️ Tools

Title Description, Information
Confusion Matrix in Python Confusion Matrix in Python: plot a pretty confusion matrix (like Matlab) in python using seaborn and matplotlib

🔹 T-test


🔹 Estimation

Title Description
Least squares (method of least squares)

    The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals made in the results of every single equation.

    The most important application is in data fitting (curve fitting). The best fit in the least-squares sense minimizes the sum of squared residuals (a residual being: the difference between an observed value, and the fitted value provided by a model). When the problem has substantial uncertainties in the independent variable (the x variable), then simple regression and least-squares methods have problems; in such cases, the methodology required for fitting errors-in-variables models may be considered instead of that for least squares.

    Least-squares problems fall into two categories: linear or ordinary least squares and nonlinear least squares, depending on whether or not the residuals are linear in all unknowns. The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. The nonlinear problem is usually solved by iterative refinement; at each iteration the system is approximated by a linear one, and thus the core calculation is similar in both cases.

Least Absolute Deviations (LAD)

    Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute value (LAV), least absolute residual (LAR), sum of absolute deviations, or the L1 norm condition, is a statistical optimality criterion and the statistical optimization technique that relies on it. Similar to the least squares technique, it attempts to find a function which closely approximates a set of data.
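Both estimation methods above can be compared in a short sketch, assuming a straight-line model with an intercept. Least squares uses NumPy's `np.linalg.lstsq`, which agrees with the closed-form normal-equations solution (X^T X)^(-1) X^T y; LAD has no closed form, so here it is posed as a linear program and solved with SciPy's `linprog` (one of several possible approaches; iteratively reweighted least squares is another). The data are made up, with one outlier, and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ols_fit(X, y):
    """Least squares: minimize the sum of squared residuals ||y - X b||^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def lad_fit(X, y):
    """Least absolute deviations: minimize sum |y - X b| as a linear program.

    Variables are [b (p values), u (n values)] with u_i >= |y_i - x_i b|,
    so minimizing sum(u) minimizes the sum of absolute residuals.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(n)])
    identity = np.eye(n)
    A_ub = np.block([[X, -identity], [-X, -identity]])  # X b - u <= y and -X b - u <= -y
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * n
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return result.x[:p]


# Five observations on the line y = 1 + 2x, except the last one is an outlier
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 100.0])
X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept and slope columns

print("OLS (intercept, slope):", ols_fit(X, y))
print("LAD (intercept, slope):", lad_fit(X, y))
```

On this data, OLS is pulled toward the outlier (slope about 20), while LAD recovers the line through the four clean points, which illustrates why LAD is considered more robust to outliers.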


💠 Models

🔹 Generalized additive model

📰 Articles


💠 Bayesian Statistics

📚 Bayesian modeling and related books:

  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
  • C.P. Robert: The Bayesian choice (advanced)
  • Gelman, Carlin, Stern, Rubin: Bayesian data analysis (nice easy older book)
  • Congdon: Applied Bayesian modelling; Bayesian statistical modelling (relatively nice books for references)
  • Casella, Robert: Introducing Monte Carlo methods with R (nice book about MCMC)
  • Robert, Casella: Monte Carlo Statistical Methods
  • some parts of Bishop: Pattern recognition and machine learning (very nice book for engineers)
  • Puppy book from Kruschke
  • Mathematical Statistics
  • Think Stats 2e

📰 Articles


💠 Causal Inference

Correlation does not imply causation

🎓 More online lectures, courses, papers, books, etc. on Causality:

📄 Causal Machine Learning (Papers)

🔹 Experimental designs for causal learning:

  • Matching
  • Incident user design
  • Active comparator
  • Instrumental variables estimation
  • Difference-in-differences
  • Regression discontinuity design
  • Modeling