How to Perform Grubbs Test in Python

The Grubbs test is a statistical hypothesis test for detecting outliers in a dataset. Outliers are observations that lie far from the rest of the data; they can distort summary statistics such as the mean and standard deviation and degrade models fitted to that data. This article explains what the Grubbs test is and demonstrates how to perform it in Python, first with a third-party library and then with a manual implementation of the formula.

What are Outliers?

Outliers are data points that are numerically distant from other observations in the dataset. For normally distributed data, approximately 68% of records fall within one standard deviation of the mean, 95% within two, and 99.7% within three, so points more than three standard deviations from the mean are often treated as outliers. Another common rule of thumb flags points that lie more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile.
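Two common rules of thumb, a three-standard-deviation cut-off and the 1.5 × IQR fences, can be sketched with NumPy. The sample values and both cut-offs are illustrative assumptions for this sketch and are not part of the Grubbs test itself.

```python
import numpy as np

# Illustrative sample (assumed for this sketch); 40 sits far from the rest
data = np.array([5, 14, 15, 15, 14, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40])

# Rule 1: flag points more than 3 sample standard deviations from the mean
z_scores = (data - data.mean()) / data.std(ddof=1)
z_outliers = data[np.abs(z_scores) > 3]

# Rule 2: flag points more than 1.5 * IQR outside the quartiles
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = data[(data < lower_fence) | (data > upper_fence)]

print("z-score outliers:", z_outliers)
print("IQR fences:", lower_fence, upper_fence)
print("IQR outliers:", iqr_outliers)
```

For this sample the IQR rule flags 40 while the three-standard-deviation rule flags nothing, which is one reason a formal test with a stated significance level can be useful.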

Grubbs Statistical Hypothesis Test

The Grubbs test detects a single outlier at a time by testing statistical hypotheses. It applies to univariate datasets that follow an approximately normal distribution and is generally recommended for samples of at least seven observations. The test is also known as the extreme studentized deviate test or the maximum normalized residual test.
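Because the test assumes approximate normality, it can help to check that assumption first. The sketch below uses SciPy's Shapiro-Wilk test on a hypothetical, randomly generated sample; the sample, the seed, and the 0.05 threshold are assumptions made for illustration. Note that a genuine outlier can itself cause a normality test to reject, so the result is a rough guide at best.

```python
import numpy as np
from scipy import stats

# Hypothetical sample drawn from a normal distribution (seeded for repeatability)
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=5, size=30)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
statistic, p_value = stats.shapiro(sample)
print(f"W = {statistic:.4f}, p-value = {p_value:.4f}")

# A small p-value (for example below 0.05) casts doubt on the normality assumption
looks_normal = p_value > 0.05
print("Consistent with normality:", looks_normal)
```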

The Grubbs test uses the following hypotheses:

  • Null (H0): The dataset has no outliers

  • Alternate (H1): The dataset has exactly one outlier

The test can be performed as either a Two-Sided Test (detecting outliers on both ends) or a One-Sided Test (detecting outliers on one end only).
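The practical difference between the two variants shows up in the critical value that the test statistic is compared against. The sketch below uses the standard t-distribution-based expression for the Grubbs critical value; the helper name grubbs_critical_value is hypothetical, and SciPy is assumed to be installed. The two-sided test splits the significance level across both tails (alpha / (2n)), while the one-sided test uses alpha / n.

```python
import numpy as np
from scipy import stats

def grubbs_critical_value(n, alpha=0.05, two_sided=True):
    """Critical value of the Grubbs statistic for sample size n (hypothetical helper)."""
    # Two-sided tests split alpha across both tails
    tail_prob = alpha / (2 * n) if two_sided else alpha / n
    # Upper critical value of the t distribution with n - 2 degrees of freedom
    t_crit = stats.t.ppf(1 - tail_prob, n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))

for n in (7, 17):
    two = grubbs_critical_value(n, two_sided=True)
    one = grubbs_critical_value(n, two_sided=False)
    print(f"n={n}: two-sided={two:.4f}, one-sided={one:.4f}")
```

For the same n and alpha, the one-sided critical value is smaller than the two-sided one, so a one-sided test flags an extreme value more readily.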

Using the outliers Library

The third-party outliers module, distributed on PyPI as outlier_utils, provides ready-made functions for performing the Grubbs test. First, install the package (the leading ! is only needed inside a notebook cell):

!pip install outlier_utils

Two-Sided Grubbs Test

The two-sided test detects outliers from both the minimum and maximum sides of the dataset:

import numpy as np
from outliers import smirnov_grubbs as grubbs

# Define sample data with an outlier
data = np.array([5, 14, 15, 15, 14, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40])

# Perform two-sided Grubbs test
result = grubbs.test(data, alpha=0.05)
print("Original data:", data)
print("After removing outliers:", result)

Output:

Original data: [ 5 14 15 15 14 19 17 16 20 22  8 21 28 11  9 29 40]
After removing outliers: [ 5 14 15 15 14 19 17 16 20 22  8 21 28 11  9 29]

One-Sided Grubbs Test

The one-sided test detects outliers on either the minimum side, using min_test(), or the maximum side, using max_test():

import numpy as np
from outliers import smirnov_grubbs as grubbs

data = np.array([5, 14, 15, 15, 14, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40])

# Test for minimum outliers
min_result = grubbs.min_test(data, alpha=0.05)
print("Min test result:", min_result)

# Test for maximum outliers  
max_result = grubbs.max_test(data, alpha=0.05)
print("Max test result:", max_result)

Output:

Min test result: [ 5 14 15 15 14 19 17 16 20 22  8 21 28 11  9 29 40]
Max test result: [ 5 14 15 15 14 19 17 16 20 22  8 21 28 11  9 29]
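To see why only the maximum is flagged, the one-sided statistics can be computed by hand. This is a minimal sketch that assumes the sample standard deviation (ddof=1) and the standard one-sided critical value with alpha / n in the tail; it is not the outliers library's own code, and the variable names are chosen for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([5, 14, 15, 15, 14, 19, 17, 16, 20, 22, 8, 21, 28, 11, 9, 29, 40])
n = len(data)
mean_x = data.mean()
sd_x = data.std(ddof=1)  # sample standard deviation (an assumption of this sketch)

# One-sided statistics: distance of each extreme from the mean, in units of s
g_max = (data.max() - mean_x) / sd_x
g_min = (mean_x - data.min()) / sd_x

# One-sided critical value at alpha = 0.05 (alpha / n in the upper tail)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / n, n - 2)
g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))

print(f"G_max = {g_max:.4f}, G_min = {g_min:.4f}, critical = {g_crit:.4f}")
print("Maximum flagged:", g_max > g_crit)
print("Minimum flagged:", g_min > g_crit)
```

Under these assumptions the maximum (40) exceeds the critical value and the minimum (5) does not.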

Manual Formula Implementation

You can also implement the Grubbs test manually using the mathematical formula. The test statistic is calculated as:

G = max|xᵢ − x̄| / s

where x̄ is the sample mean and s is the sample standard deviation.

import numpy as np
import scipy.stats as stats

def grubbs_test(data, alpha=0.05):
    """Two-sided Grubbs test for a single outlier at significance level alpha."""
    n = len(data)
    mean_x = np.mean(data)
    sd_x = np.std(data, ddof=1)  # Sample standard deviation

    # Test statistic: largest absolute deviation from the mean, in units of s
    g_calculated = np.max(np.abs(data - mean_x)) / sd_x

    # Critical value from the t distribution with n - 2 degrees of freedom
    t_value = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_critical = ((n - 1) / np.sqrt(n)) * np.sqrt(t_value**2 / (n - 2 + t_value**2))

    print(f"Grubbs Calculated Value: {g_calculated:.4f}")
    print(f"Grubbs Critical Value: {g_critical:.4f}")

    if g_calculated > g_critical:
        print("Result: Outlier detected (reject null hypothesis)")
    else:
        print("Result: No outlier detected (fail to reject null hypothesis)")
    print()

# Test with data without outliers
data_no_outliers = np.array([12, 13, 14, 19, 21, 23])
print("Testing data without outliers:")
grubbs_test(data_no_outliers)

# Test with data containing outliers
data_with_outliers = np.array([12, 13, 14, 19, 21, 23, 45])
print("Testing data with outliers:")
grubbs_test(data_with_outliers)

Output:

Testing data without outliers:
Grubbs Calculated Value: 1.3031
Grubbs Critical Value: 1.8871
Result: No outlier detected (fail to reject null hypothesis)

Testing data with outliers:
Grubbs Calculated Value: 2.1076
Grubbs Critical Value: 2.0200
Result: Outlier detected (reject null hypothesis)
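
The manual test only reports whether the most extreme point is an outlier. A possible extension, sketched below, repeats the two-sided test and drops one point per round until nothing more is flagged. The function name remove_outliers_grubbs is hypothetical, and this is not the outliers library's implementation, so its results may differ from the library's. Since the Grubbs test is designed for a single outlier, repeated application should be treated with caution.

```python
import numpy as np
from scipy import stats

def remove_outliers_grubbs(data, alpha=0.05):
    """Repeat the two-sided Grubbs test, dropping one point per round (illustrative sketch)."""
    values = np.asarray(data, dtype=float)
    removed = []
    while len(values) > 2:  # the t distribution needs at least 1 degree of freedom
        n = len(values)
        deviations = np.abs(values - values.mean())
        idx = int(np.argmax(deviations))  # most extreme point
        g = deviations[idx] / values.std(ddof=1)

        t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))

        if g <= g_crit:
            break  # most extreme point is not significant, so stop
        removed.append(float(values[idx]))
        values = np.delete(values, idx)
    return values, removed

cleaned, removed = remove_outliers_grubbs([12, 13, 14, 19, 21, 23, 45])
print("Cleaned:", cleaned)
print("Removed:", removed)
```

On the seven-point sample used above, the first round removes 45 and the second round stops because the remaining maximum deviation falls below the critical value.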

Comparison of Methods

Method | Pros | Cons | Best For
Built-in Library | Easy to use, automatic removal | Less control over process | Quick outlier detection
Manual Implementation | Full control, understand statistics | More code required | Learning and customization

Conclusion

The Grubbs test is an effective statistical method for detecting outliers in normally distributed datasets. You can use the outliers library for quick implementation or implement the formula manually for better understanding and control over the process.

Updated on: 2026-03-27T06:04:05+05:30
