Weighted Least Squares Method

To understand the weighted least squares method, let us consider the model in matrix format with the basic assumption:

$$Y= X \beta + \varepsilon$$

where $\varepsilon \sim (0, \sigma^2 I)$; that is, the errors have mean zero and common variance $\sigma^2$, and the observations are independent of each other.

The variance function $Var(Y|X)$ is assumed to be the same for all values of $X$. This assumption can be relaxed in a number of ways.

Consider the simplest multiple regression case

$$E(Y|X = x_i) = \beta' x_i$$

Assume the error variance is

$$Var(Y|X = x_i) = Var(e_i) = \frac{\sigma^2}{w_i}$$

where $w_1, w_2, \cdots, w_n$ are known positive numbers.

The variance function is still characterized by only one unknown positive number $\sigma^2$, but variances can be different for each case. This will lead to Weighted Least Squares instead of Ordinary Least Squares.

In a standard OLS model, we assume homoscedasticity: the idea that the “noise” or error term is constant across all observations. In the real world, this is often false.

Example: If you are modeling household spending vs. income, wealthier families tend to have much higher variability in their spending than lower-income families. Weighted least squares allows you to “down-weight” the high-variance observations so they do not disproportionately pull the regression line.

Formally,

$$Y=X\beta + e, \qquad X: n\times p', \qquad \operatorname{rank}(X) = p'$$

$$Var(e) = \sigma^2 \Sigma$$

where $\Sigma$ is known and $\sigma^2>0$ not necessarily known.

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{2}^2 & \sigma_{23} & \cdots & \sigma_{2n} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \cdots & \sigma_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \sigma_{n3} &\cdots & \sigma_n^2\end{bmatrix}$$

Once $\hat{\beta}$ is determined, the residuals $\hat{e}$ are given by $$\hat{e} = Y - \hat{Y} = Y - X\hat{\beta}$$

The estimator $\hat{\beta}$ is chosen to minimize the generalized residual sum of squares

\begin{align*}
RSS(\beta) &= (Y-X\beta)'\Sigma^{-1}(Y-X\beta)\\
&= (Y-X\beta)'W(Y-X\beta)\\
&= \sum_i w_i (y_i - x_i'\beta)^2
\end{align*}

where the last two lines hold in the weighted case, with $W = \Sigma^{-1} = \operatorname{diag}(w_1, \ldots, w_n)$.

The generalized least squares estimator is

$$\hat{\beta} = (X^t \Sigma ^{-1} X)^{-1} X^t \Sigma^{-1}Y$$
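As a minimal sketch, the generalized least squares estimator can be computed directly in NumPy. The design matrix, response, and covariance values below are hypothetical; a diagonal $\Sigma$ makes this the WLS special case.

```python
import numpy as np

# Hypothetical design matrix (intercept + one predictor) and response
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Known (up to sigma^2) covariance structure; diagonal here, i.e. the WLS case
Sigma = np.diag([1.0, 1.0, 2.0, 4.0, 4.0])
Sigma_inv = np.linalg.inv(Sigma)

# beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y
beta_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(beta_hat)
```

Because $\Sigma$ is diagonal here, the same estimate could be obtained by running OLS on the data rescaled by $\sqrt{w_i}$.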

Now suppose that we can find an $n\times n$ matrix $C$ such that $C$ is symmetric and $C'C = CC' = \Sigma^{-1}$ (and $C^{-1}C^{-t} = \Sigma$). Such a matrix $C$ will be called the square root of $\Sigma^{-1}$.

\begin{align*}
Var(Ce) &= C\, Var(e)\, C' \tag*{as $Var(e) = \sigma^2 \Sigma$}\\
&= \sigma^2 C \Sigma C'\\
&= \sigma^2 [C C^{-1} C^{-t} C'] = \sigma^2 I_n
\end{align*}

Pre-multiplying both sides of $Y=X\beta+e$ by $C$ gives

\begin{align*}
CY &= CX\beta + Ce\\
Z &= M\beta + d
\end{align*}

where $Z=CY$, $M=CX$, and $d=Ce$, with $Var(d) = \sigma^2 I_n$. Ordinary least squares can therefore be applied to the transformed model.
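A numerical sketch of this transformation, using hypothetical data: a symmetric square root $C$ of $\Sigma^{-1}$ is computed by eigendecomposition, and plain OLS on the transformed data reproduces the direct GLS estimator.

```python
import numpy as np

# Hypothetical setup: diagonal Sigma (uncorrelated errors, unequal variances)
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
Sigma = np.diag([1.0, 1.0, 2.0, 4.0, 4.0])

# Symmetric square root C of Sigma^{-1} via eigendecomposition: C'C = Sigma^{-1}
vals, vecs = np.linalg.eigh(np.linalg.inv(Sigma))
C = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

# Transformed model: Var(Ce) = sigma^2 I, so plain OLS applies
beta_transformed, *_ = np.linalg.lstsq(C @ X, C @ Y, rcond=None)

# Direct GLS estimator for comparison
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)
print(np.allclose(beta_transformed, beta_gls))  # True
```

The two estimates agree because minimizing $\|CY - CX\beta\|^2$ is algebraically the same as minimizing $(Y-X\beta)'\Sigma^{-1}(Y-X\beta)$.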

Comparison of OLS vs WLS

The following table summarizes the differences between OLS and WLS:

| Feature | Ordinary Least Squares | Weighted Least Squares |
|---|---|---|
| Variance assumption | Constant ($\sigma^2 I$) | Not constant ($\sigma^2 \Sigma$) |
| Efficiency | Best linear unbiased estimator (BLUE) if homoscedastic | More efficient than OLS when heteroscedasticity is present |
| Weights | All observations have equal weight ($w_i=1$) | Observations weighted by $\frac{1}{\sigma_i^2}$ |

WLS Practical Implementation Steps

The weights are rarely “known” in practice; one can estimate them by following these steps:

  1. Residual Analysis: Run an OLS regression first and plot the residuals.
  2. Model the Variance: Regress the absolute residuals (or squared residuals) against the predictors to estimate the variance function.
  3. Calculate the Weights: Set $w_i = \frac{1}{\hat{\sigma}_i^2}$.
  4. Re-run the Regression: Perform WLS using the weights computed in Step 3.
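These steps can be sketched in Python. The data below is simulated and purely hypothetical: the error standard deviation is made proportional to the predictor, so OLS residuals fan out and the variance function can be estimated from them.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate hypothetical data whose noise grows with x (heteroscedastic)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5 * x)   # error sd proportional to x
X = np.column_stack([np.ones(n), x])

# Step 1: initial OLS fit and its residuals
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Step 2: model the variance -- regress |residuals| on the predictor
gamma, *_ = np.linalg.lstsq(X, np.abs(resid), rcond=None)
sigma_hat = np.clip(X @ gamma, 1e-6, None)     # fitted standard deviations

# Step 3: weights are the inverse estimated variances
w = 1.0 / sigma_hat**2

# Step 4: re-run as WLS, i.e. OLS on sqrt(w)-scaled data
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta_wls)
```

The estimated coefficients should land close to the true values (2.0, 3.0), with tighter standard errors than the initial OLS fit.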

Note that WLS is a special case of Generalized Least Squares (GLS). It is a powerful tool because it restores the “Best” in BLUE (Best Linear Unbiased Estimator) when the standard OLS assumptions fail.

Weighted Least Squares FAQs

What is the main difference between OLS and WLS?

The primary difference lies in the assumption of variance. Ordinary Least Squares (OLS) assumes that every observation has the same variance (homoscedasticity). Weighted Least Squares (WLS) is used when observations have different variances (heteroscedasticity). WLS assigns a “weight” to each data point, typically $w_i = \frac{1}{\sigma_i^2}$, giving more influence to more precise observations.

When to use the Weighted Least Squares Method instead of the Ordinary Least Squares Method?

One should use WLS when a residual analysis of your OLS model reveals a non-constant variance pattern (e.g., a “fan” or “funnel” shape in a residual plot). If the variance of your errors increases or decreases with the independent variable, OLS is no longer the most efficient estimator, and WLS should be applied.

How do I determine the weights ($w_i$) for my model?

In theoretical exercises, weights are often provided. In practice, one must estimate the weights. The common methods to compute weights include:

  • Prior Knowledge: Using known measurement errors.
  • Residual Modeling: Regressing the absolute or squared residuals from an initial OLS model against the predictor variables to find a variance function.
  • Subgrouping: If data is grouped, using the inverse of the variance within each group.

Is the Weighted Least Squares Method a type of Generalized Least Squares Method?

Yes. The Weighted Least Squares Method (WLS) is a special case of Generalized Least Squares (GLS). While GLS handles cases where errors are both heteroscedastic and correlated (the $\Sigma$ matrix has non-zero off-diagonal elements), WLS specifically deals with cases where errors are uncorrelated but have unequal variances (the $\Sigma$ matrix is diagonal).

Does the Weighted Least Squares Method change the coefficients compared to the Ordinary Least Squares Method?

Yes, the estimated coefficients $\hat{\beta}$ will likely change. Because WLS prioritizes data points with lower variance, the resulting regression line will “tilt” to better fit the most reliable data points. This typically results in smaller standard errors for your coefficients, making your t-tests and p-values more reliable.

Can the Weighted Least Squares Method Handle Outliers?

While WLS can down-weight observations with high variance, it is not inherently a “robust regression” technique for outliers in the way that M-estimation is. If an outlier has a small variance (and thus a high weight), it can actually pull the WLS line even more aggressively than OLS.

Weighted Least Squares

Weighted Least Squares is primarily a technique within Regression Analysis, which serves as a major part of Statistics and Econometrics. It is an enhancement of the fundamental Ordinary Least Squares (OLS) method. OLS assumes that all data points are equally reliable, but WLS is designed for situations where this is not true. It is also considered a special case of a more general method called Generalized Least Squares (GLS).

Importance of Weighted Least Squares Techniques

The core assumption of OLS is homoscedasticity, meaning the variance of the errors is constant across all levels (or fixed values) of the independent variables. In reality, data often exhibit heteroscedasticity, where the error variance changes. This is where WLS becomes invaluable.

Its primary importance lies in its ability to handle data of varying quality. Instead of treating a precise, low-variance measurement the same as an imprecise, high-variance one, WLS gives each data point a weight that reflects its reliability. This approach offers several key advantages:

  • Increased Efficiency: By giving proper influence to more precise data, WLS produces parameter estimates with the smallest possible variance, making it the Best Linear Unbiased Estimator (BLUE) under heteroscedasticity.
  • More Accurate Estimates: It prevents less reliable data points from skewing results, leading to a model that better represents the true relationship, especially in critical areas such as low-concentration measurements.
  • Valid Inferences: Correctly modeling the error structure allows for more reliable confidence intervals and hypothesis tests.

Real-Life Applications and Examples

WLS is applied across a vast range of fields. Here are some compelling examples:

1. Analytical Chemistry (HPLC Calibration)
In pharmaceutical analysis, accurately measuring low concentrations of impurities is critical. Data from instruments like HPLC-UV often have heteroscedastic noise, where the variability increases with concentration. A blog post from LCGC International demonstrates that using WLS for the calibration curve of the drug carbamazepine reduced the average back-calculation error from 27.7% (with OLS) to just 4.02% (with WLS). This ensures that drug impurities are quantified accurately, which is vital for patient safety.

2. Economics (Consumer Expenditure Surveys)
When studying household consumption, economists often find that spending patterns are more variable for high-income households than for low-income ones. For instance, a high-income family’s spending might fluctuate wildly, while a low-income family’s spending is more consistent. An OLS regression of consumption on income would be inefficient. WLS can be used to give less weight to the highly variable, high-income data points, resulting in a more stable and accurate model of the overall consumption trend.

3. Engineering (Satellite Positioning)
A cutting-edge application from an arXiv research paper involves positioning user terminals using signals from Low Earth Orbit (LEO) satellites. The quality of the signal from each satellite can vary due to interference or changing geometry. The researchers propose a hybrid system where a Deep Reinforcement Learning (DRL) model learns to assign optimal weights to each satellite’s measurement before feeding them into a Weighted Least Squares estimator. This approach achieves sub-meter accuracy (as low as 0.395m RMSE) while keeping computational demands low for the satellite’s onboard systems.

4. Political Science (Analyzing Policy Outcomes)
Political scientists use WLS when the uncertainty of data varies across observations. A classic example is analyzing the proportion of felons incarcerated across different states. The variance of this proportion might be smaller in states with higher average education levels. WLS can account for this, giving more weight to data from states where the measurement (or the underlying process) is more precise.

The Basic Formula and Logic of Weighted Least Squares Techniques

Let us formalize the idea of “Listening to some data more than others.”

  1. The Weights ($w_i$)
    In Weighted Least Squares, every single observation (data point) is given a weight (say $w_i$).
    The weight is inversely related to the variance of an observation (and so directly related to its reliability):
    • If a data point has low variance (it is precise and reliable), we give it a high weight
    • If a data point has high variance (it is noisy and unpredictable), we give it a low weight

      Mathematically, the weight $w_i$ is often calculated as
      $$w_i = \frac{1}{\sigma_i^2}$$
      where $\sigma_i^2$ is the variance of the error for that observation. A low $\sigma_i^2$ leads to a high $w_i$.
  2. The Core Weighted Least Squares Goal
    The standard “Least Squares” method minimizes the sum of the squared errors (the distance between the actual data point and the prediction line).
    OLS minimizes $\sum e_i^2$, while WLS minimizes $\sum w_i e_i^2$.
  3. Weighted Least Squares multiplies each squared error by its specific weight before adding them up. This means that the model works extra hard to make the errors small for the data points with large weights.
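A tiny numerical illustration of the two objectives, using hypothetical points and weights: for a line through the origin, $y = b\,x$, both objectives have the closed-form minimizer $b = \sum w_i x_i y_i / \sum w_i x_i^2$ (OLS is the case $w_i = 1$), and down-weighting the noisy point visibly changes the fitted slope.

```python
import numpy as np

# Three hypothetical points; the third is noisy, so it gets a low weight
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.1, 2.0, 4.5])
w = np.array([4.0, 4.0, 0.25])   # w_i = 1 / variance_i

# Minimizing sum w_i (y_i - b x_i)^2 gives b = sum(w x y) / sum(w x^2)
b_wls = np.sum(w * x * y) / np.sum(w * x * x)
b_ols = np.sum(x * y) / np.sum(x * x)   # same formula with all w_i = 1

print(b_ols, b_wls)   # the WLS slope is pulled less by the noisy third point
```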

Mean Deviation from Mean

Mean Deviation from mean (also known as Mean Absolute Deviation or MAD) is a statistical measure that tells you, on average, how far each data point in a set is from the center (usually the mean).

The mean deviation is used to characterize the dispersion among the measures in a given population. To calculate the mean deviation of a set of scores, first compute their mean, then find the distance between each score and the mean, without regard to whether the score is above or below it. The mean deviation is defined as the mean of these absolute distances.

Definition of Mean Deviation from Mean

Given a set of numbers and their mean, one can find the absolute difference between each number and the mean. The mean of these absolute differences is called the mean deviation of the numbers.

Unlike standard deviation, which squares the differences, mean deviation uses absolute values, making it more intuitive and less sensitive to extreme outliers.

Real-Life Uses and Applications of Mean Deviation

Mean deviation is preferred in fields where “average error” needs to be understood in the same units as the data itself.

  • Quality Control: In manufacturing, if a machine fills bottles with liquid, the mean deviation helps technicians understand the average “miss” from the target volume.
  • Supply Chain & Inventory: Companies use MAD to track forecast accuracy. If a warehouse predicts they will sell 100 units but the MAD is 15, they know to keep extra stock to cover that average fluctuation.
  • Climate Science: To describe how much daily temperatures vary from the monthly average without letting one heatwave skew the results too heavily.
  • Finance: It is used to measure investment risk. A high mean deviation in stock prices indicates high volatility, while a low one suggests a “boring” but stable investment.

How to Compute Mean Deviation Manually

To calculate the Mean Deviation ($MD$), follow these four steps:

  1. Find the Mean ($\overline{x}$) of the data.
  2. Subtract the mean from each data point ($x – \overline{x}$).
  3. Take the Absolute Value of those differences ($|x – \overline{x}|$).
  4. Find the Average of those absolute differences.

$$MD = \frac{\sum |x – \overline{x}|}{n}$$

Numerical Example of Calculating Mean Deviation

Consider the data for the computation of Mean Deviation from the mean: 3, 6, 9. The following is the step-by-step computational procedure for calculating mean deviation.

  • Mean: $\frac{3+6+9}{3}= 6$
  • Absolute Differences: $|3-6|=3$, $|6-6|=0$, $|9-6|=3$
  • Sum of Differences: $3 + 0 + 3 = 6$
  • Mean Deviation: $\frac{6}{3} = 2$

Calculating Mean Deviation in Python

In Python, one can calculate mean deviation using the pandas library (most common for data science) or numpy.

Using pandas:

import pandas as pd

data = [3, 6, 9, 12, 15]
series = pd.Series(data)

# Calculate Mean Absolute Deviation
mad = (series - series.mean()).abs().mean()

print(f"The Mean Deviation is: {mad}")

Using numpy:

import numpy as np

data = np.array([3, 6, 9, 12, 15])
mean = np.mean(data)

# Apply the formula: average of absolute differences
mad = np.mean(np.absolute(data - mean))

print(f"The Mean Deviation is: {mad}")

Frequently Asked Questions about Mean Deviation

Why do we use absolute values in Mean Deviation?

If we did not use absolute values, the sum of the deviations from the mean would always be zero. This is because the positive differences (values above the mean) and negative differences (values below the mean) perfectly cancel each other out. Absolute values ensure we are measuring the distance from the mean, regardless of direction.
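This cancellation is easy to verify numerically (the data values are hypothetical):

```python
import numpy as np

data = np.array([3.0, 6.0, 9.0, 12.0, 15.0])
deviations = data - data.mean()

# The signed deviations always cancel to zero...
print(deviations.sum())            # 0.0
# ...so taking absolute values is what makes the measure informative
print(np.abs(deviations).mean())   # 3.6
```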

What is the main difference between Mean Deviation and Standard Deviation?

  • Mean Deviation: Uses the absolute value of differences ($|x – \overline{x}|$). It is more intuitive and less affected by extreme outliers.
  • Standard Deviation: Squares the differences ($(x – \overline{x})^2$). This “penalizes” outliers more heavily, making it better for advanced statistical modeling and normal distributions.

Can Mean Deviation ever be negative?

No. Because we use absolute values, each deviation is either positive or zero. Therefore, the average of those deviations (the Mean Deviation) must also be zero or a positive number.

When is Mean Deviation preferred over Standard Deviation?

Mean Deviation is often preferred in Real-World Operations (like supply chain or retail) because it represents the “average error” in the same units as the original data. It is also more robust when dealing with datasets that have a few extreme outliers whose influence you do not want to exaggerate.

Does Mean Deviation change if we add a constant to every data point?

No. This is a common “trick” question. If you add 10 to every number in a dataset, the Mean also increases by 10. The distance between the points and the mean remains the same, so the Mean Deviation stays constant.
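A quick numerical check of this shift invariance (using the hypothetical dataset from the earlier examples):

```python
import numpy as np

def mean_deviation(a):
    """Mean of absolute deviations from the mean."""
    a = np.asarray(a, dtype=float)
    return np.mean(np.abs(a - a.mean()))

data = np.array([3, 6, 9, 12, 15])
shifted = data + 10   # the mean shifts by 10 as well, so distances are unchanged

print(mean_deviation(data), mean_deviation(shifted))  # both 3.6
```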
