
Linear regression is a fundamental, tried-and-true statistical method. Used to model the relationship between a dependent variable and one or more independent variables, linear regression helps in predicting outcomes and understanding relationships within data. This makes it an invaluable analytical tool across a wide range of domains, from economics to healthcare.
The linear_regression() function of Python's built-in statistics module, available since Python 3.10, simplifies the process of fitting a linear model to two-dimensional data. It offers a simple, readily available way to apply this technique without external libraries.
Let’s take a look at how to use the Python statistics module’s linear_regression() function.
Using the statistics.linear_regression() Function
To use the linear_regression() function, you first need to import it from the statistics module. It is also common to use libraries such as NumPy or pandas to handle and manipulate data. We’ll use NumPy in our examples.
import numpy as np
from statistics import linear_regression
The linear_regression() function requires input data in the form of two lists or arrays representing the x and y coordinates of the data points. Both inputs must be numeric and have the same length. Let’s create a sample dataset to use.
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 5, 7, 9, 11])
In this example, x represents the independent variable, while y is the dependent variable.
Once you have your data, you can call the linear_regression() function, which takes the x and y values as parameters. The function returns a named tuple containing two values:
- the slope of the regression line
- the intercept of the regression line
Here is how you can use the function:
import statistics
import numpy as np
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 6, 8, 9, 12])
slope, intercept = statistics.linear_regression(x, y)
print(f"Slope: {slope}, Intercept: {intercept}")
# Output: Slope: 2.1, Intercept: 1.299999999999999
Once known, the slope and intercept of the fitted line can be used to make predictions or to understand the relationship between variables. They can be interpreted as follows:
- The slope indicates the change in the dependent variable (y) for a one-unit increase in the independent variable (x)
  - A positive slope suggests a direct relationship, while a negative slope indicates an inverse relationship
- The intercept represents the predicted value of y when x is zero
  - In practical terms, it may not always have a meaningful interpretation, especially if x=0 is outside the range of observed data
- The linear regression line essentially provides a summary of the data’s trend
Visualizing the regression results can improve both understanding and communication of findings. Matplotlib, a commonly used Python plotting library, can help here. Below is an example of how to use Matplotlib to visualize the original data and the fitted regression line.
import matplotlib.pyplot as plt
# Plotting the data points
plt.scatter(x, y, color='blue', label='Data points')
# Plotting the regression line
plt.plot(x, slope * x + intercept, color='red', label='Regression line')
# Adding labels and title
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Linear Regression Example')
plt.legend()
# Display the plot
plt.show()

Linear regression line visualization using Matplotlib
This code snippet creates a scatter plot of the data points and overlays the regression line, providing a clear visual representation of the relationship between x and y.
Wrapping Up
In summary, the statistics.linear_regression() function in Python provides an accessible way to perform linear regression analysis.
For more information, check out the official documentation for the Python statistics module, as well as the NumPy documentation.
