To understand the weighted least squares method, let us consider the model in matrix format with the basic assumption:
$$Y= X \beta + \varepsilon$$
where $\varepsilon \sim (0, \sigma^2 I)$; that is, the observations are independent of each other and share a common variance.
The variance function $Var(Y|X)$ is assumed to be the same for all values of $X$. This assumption can be relaxed in a number of ways.
Weighted Least Squares Method
Consider the simplest multiple regression case
$$E(Y|X = x_i) = \beta' x_i$$
Assume the error variance takes the form
$$Var(Y|X = x_i) = Var(e_i) = \frac{\sigma^2}{w_i}$$
where $w_1, w_2, \cdots, w_n$ are known positive numbers.
The variance function is still characterized by only one unknown positive number $\sigma^2$, but variances can be different for each case. This will lead to Weighted Least Squares instead of Ordinary Least Squares.
In a standard OLS model, we assume homoscedasticity: the idea that the “noise” or error term is constant across all observations. In the real world, this is often false.
Example: If you are modeling household spending vs. income, wealthier families tend to have much higher variability in their spending than lower-income families. Weighted least squares allows you to “down-weight” the high-variance observations so they do not disproportionately pull the regression line.
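This down-weighting can be seen numerically. The sketch below uses simulated data, and the weighting rule $w_i = 1/x_i^2$ is an assumption chosen for illustration (it matches noise whose standard deviation grows in proportion to $x_i$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "income" (x) vs "spending" (y): the error spread grows with x,
# i.e. Var(e_i) = sigma^2 / w_i with w_i = 1 / x_i^2 (illustrative choice).
n = 200
x = rng.uniform(1, 10, n)
y = 2.0 + 0.5 * x + x * rng.normal(0, 0.3, n)  # noise scales with x

X = np.column_stack([np.ones(n), x])

# OLS: beta = (X'X)^{-1} X'y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# WLS: beta = (X'WX)^{-1} X'Wy with W = diag(w_i), w_i = 1 / x_i^2
W = np.diag(1.0 / x**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(beta_ols, beta_wls)  # both should land near the true [2.0, 0.5]
```

Both estimators are unbiased here, but the WLS slope has smaller sampling variance because the noisy high-$x$ observations no longer dominate the fit.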
Formally,
$$Y = X\beta + e, \qquad X:\ n\times p', \qquad \operatorname{rank}(X) = p'$$
$$Var(e) = \sigma^2 \Sigma$$
where $\Sigma$ is known and $\sigma^2>0$ not necessarily known.
$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{2}^2 & \sigma_{23} & \ddots & \vdots \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \ddots & \vdots \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots &\cdots & \sigma_n^2\end{bmatrix}$$
Once $\hat{\beta}$ is determined, the residuals $\hat{e}$ are given by $$\hat{e} = Y - \hat{Y} = Y - X\hat{\beta}$$
The estimator $\hat{\beta}$ is chosen to minimize the generalized residual sum of squares
$$RSS(\beta) = (Y-X\beta)'\Sigma^{-1}(Y-X\beta)$$
In the weighted case, where $\Sigma^{-1} = W = \operatorname{diag}(w_1, \ldots, w_n)$, this reduces to
\begin{align*}
RSS(\beta) &= (Y-X\beta)'W(Y-X\beta)\\
&= \sum_i w_i (y_i - x_i'\beta)^2
\end{align*}
The generalized least squares estimator is
$$\hat{\beta} = (X^t \Sigma ^{-1} X)^{-1} X^t \Sigma^{-1}Y$$
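A minimal numerical check of this formula, using a made-up diagonal $\Sigma$ (the WLS special case) and noise-free data, so the estimator recovers $\beta$ exactly:

```python
import numpy as np

# Small check of beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y
# with a known diagonal Sigma (unequal variances, uncorrelated errors).
n = 6
X = np.column_stack([np.ones(n), np.arange(1, n + 1)])
beta_true = np.array([1.0, 2.0])
Sigma = np.diag([1.0, 1.0, 4.0, 4.0, 9.0, 9.0])  # illustrative variances

y = X @ beta_true  # noise-free, so the estimate must equal beta_true

Sigma_inv = np.linalg.inv(Sigma)
beta_hat = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(beta_hat)  # recovers [1.0, 2.0] on noise-free data
```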
Now suppose that we can find an $n\times n$ matrix $C$ such that $C$ is symmetric and $C'C = CC' = \Sigma^{-1}$ (and hence $C^{-1}C^{-t} = \Sigma$). Such a matrix $C$ is called the square root of $\Sigma^{-1}$.
\begin{align*}
Var(Ce) &= C\, Var(e)\, C' \tag*{as $Var(e) = \sigma^2 \Sigma$}\\
&= \sigma^2 C \Sigma C'\\
&= \sigma^2 [C C^{-1}C^{-t}C'] = \sigma^2 I_n
\end{align*}
Multiplying both sides of $Y=X\beta+e$ by $C$ gives
\begin{align*}
CY &= CX\beta + Ce\\
Z &= M\beta + d
\end{align*}
where $Z=CY$, $M=CX$, and $d=Ce$ with $Var(d) = \sigma^2 I_n$ (the letter $M$ is used for the transformed design matrix to avoid confusion with the weight matrix $W$). The transformed model therefore satisfies the ordinary least squares assumptions.
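In practice, one convenient choice is $C = L^{-1}$, where $\Sigma = LL'$ is the Cholesky factorization. This $C$ is not symmetric, but it still satisfies $C'C = \Sigma^{-1}$ and $C\Sigma C' = I$, which is all the argument requires. A NumPy sketch (made-up data) verifying that OLS on the transformed data reproduces the GLS estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Whitening: with C = L^{-1} and Sigma = L L' (Cholesky), C'C = Sigma^{-1},
# so OLS on (CY, CX) must match GLS on (Y, X).
n = 5
X = np.column_stack([np.ones(n), np.arange(n)])
Sigma = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative variances
y = rng.normal(size=n)

L = np.linalg.cholesky(Sigma)
C = np.linalg.inv(L)

Z, M = C @ y, C @ X  # transformed model: Z = M beta + d, Var(d) = I
beta_ols_white = np.linalg.solve(M.T @ M, M.T @ Z)

Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)

assert np.allclose(beta_ols_white, beta_gls)
```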
Comparison of OLS vs WLS
The following table summarizes the differences between OLS and WLS.
| Feature | Ordinary Least Squares | Weighted Least Squares |
|---|---|---|
| Variance Assumption | Constant ($\sigma^2 I$) | Not Constant ($\sigma^2 \Sigma$) |
| Efficiency | Best linear unbiased estimator (BLUE) if homoscedastic | More efficient than OLS when heteroscedasticity is present |
| Weights | All observations have equal weight ($w_i = 1$) | Observations weighted by $\frac{1}{\sigma_i^2}$ |
WLS Practical Implementation Steps
In practice the weights are rarely known; one can estimate them by following these steps:
- Residual Analysis: Run an OLS regression first and plot residuals
- Model the Variance: Regress the absolute residuals (or square residuals) against the predictors to estimate the variance function
- Calculate the weights: set $w_i = \frac{1}{\hat{\sigma}_i^2}$
- Re-run regression: Perform WLS using the weights computed in step 3.
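The four steps above can be sketched in NumPy. The data are simulated, and regressing the absolute residuals on the full design matrix is one simple choice of variance model among several:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated heteroscedastic data: noise spread grows with x.
n = 300
x = rng.uniform(1, 10, n)
y = 1.0 + 3.0 * x + x * rng.normal(0, 0.5, n)
X = np.column_stack([np.ones(n), x])

# Step 1: initial OLS fit and residuals
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_ols

# Step 2: model the variance -- regress |residuals| on the predictors
gamma = np.linalg.solve(X.T @ X, X.T @ np.abs(resid))
sigma_hat = X @ gamma  # estimated error scale per observation

# Step 3: weights w_i = 1 / sigma_hat_i^2
w = 1.0 / sigma_hat**2

# Step 4: re-run the regression as WLS
Wd = np.diag(w)
beta_wls = np.linalg.solve(X.T @ Wd @ X, X.T @ Wd @ y)
print(beta_wls)  # slope should land near the true value 3.0
```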
Note that WLS is a special case of Generalized Least Squares (GLS). It is a powerful tool because it restores the “Best” in BLUE (Best Linear Unbiased Estimator) when the standard OLS assumptions fail.
Weighted Least Squares FAQs
What is the main difference between OLS and WLS?
The primary difference lies in the assumption of variance. Ordinary Least Squares (OLS) assumes that every observation has the same variance (homoscedasticity). Weighted Least Squares (WLS) is used when observations have different variances (heteroscedasticity). WLS assigns a “weight” to each data point, typically $w_i = \frac{1}{\sigma_i^2}$, giving more influence to more precise observations.
When to use the Weighted Least Squares Method instead of the Ordinary Least Squares Method?
One should use WLS when a residual analysis of your OLS model reveals a non-constant variance pattern (e.g., a “fan” or “funnel” shape in a residual plot). If the variance of your errors increases or decreases with the independent variable, OLS is no longer the most efficient estimator, and WLS should be applied.
How do I determine the weights ($w_i$) for many models?
In theoretical exercises, weights are often provided. In practice, one must estimate the weights. The common methods to compute weights include:
- Prior Knowledge: Using known measurement errors.
- Residual Modeling: Regressing the absolute or squared residuals from an initial OLS model against the predictor variables to find a variance function.
- Subgrouping: If data is grouped, using the inverse of the variance within each group.
Is the Weighted Least Squares Method a type of Generalized Least Squares Method?
Yes. The Weighted Least Squares Method (WLS) is a special case of Generalized Least Squares (GLS). While GLS handles cases where errors are both heteroscedastic and correlated (the $\Sigma$ matrix has non-zero off-diagonal elements), WLS specifically deals with cases where errors are uncorrelated but have unequal variances (the $\Sigma$ matrix is diagonal).
Does the Weighted Least Squares Method change the coefficients compared to the Ordinary Least Squares Method?
Yes, the estimated coefficients $\hat{\beta}$ will likely change. Because WLS prioritizes data points with lower variance, the resulting regression line will “tilt” to better fit the most reliable data points. This typically results in smaller standard errors for your coefficients, making your t-tests and p-values more reliable.
Can the Weighted Least Squares Method Handle Outliers?
While WLS can down-weight observations with high variance, it is not inherently a “robust regression” technique for outliers in the way that M-estimation is. If an outlier has a small variance (and thus a high weight), it can actually pull the WLS line even more aggressively than OLS.
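A tiny made-up example of this effect: when an outlier is (wrongly) assigned a large weight, the WLS slope moves further toward it than the OLS slope does.

```python
import numpy as np

# Five points on the line y = x, except the last, which is an outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
X = np.column_stack([np.ones(5), x])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Treat the outlier as if it were very precise (weight 100 vs 1).
w = np.array([1.0, 1.0, 1.0, 1.0, 100.0])
W = np.diag(w)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(beta_ols[1], beta_wls[1])  # WLS slope is pulled further toward the outlier
```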


