The average, also known as the arithmetic mean, is a simple yet powerful statistical concept. In R, there are several functions to calculate the average or mean value for numeric data sets.
In this comprehensive guide, we will explore the different types of average, functions to find the mean, median and mode in R, and examples with sample code to master their usage.
Why Calculate Averages?
Finding the averages allows us to:
- Identify the central tendency in a data set
- Reduce large data sets to singular representative values
- Smooth out anomalies and outliers
- Compare different data samples
- Determine growth rates over time
- Reveal patterns and correlations
Comparing the mean, median and mode tells us where a distribution is centered and gives a first hint of its skewness, which makes later statistical analysis simpler.
Types of Averages
There are 3 main types of averages used in statistics:
Mean
The arithmetic mean or average is calculated by adding all values in a set and dividing the sum by the total count of values. It is the most commonly used average.
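As a quick check of this definition, the mean can be computed by hand as the sum of the values divided by their count. The vector x and the name manual_mean are made up for illustration.

```r
# Hypothetical example values
x <- c(2, 4, 6, 9)

# Sum of the values divided by how many there are
manual_mean <- sum(x) / length(x)

manual_mean                        # 5.25
all.equal(manual_mean, mean(x))    # TRUE
```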
Median
The median is the middle value separating the higher half and lower half of the data set. With an even number of values, it is the average of the two middle values.
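A small sketch with made-up values shows how median() handles an odd and an even number of observations. The vectors odd_set and even_set are hypothetical.

```r
odd_set  <- c(7, 1, 5)       # sorted: 1 5 7
even_set <- c(7, 1, 5, 3)    # sorted: 1 3 5 7

median(odd_set)     # 5, the single middle value
median(even_set)    # 4, the average of the two middle values 3 and 5
```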
Mode
The mode is the value that appears most frequently in a data set. A set may have one mode, more than one mode, or no mode.
Calculating the Mean in R
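R provides built-in functions for the mean and the median, but not for the mode.
The key functions are:
- mean() – Calculate the mean of a data set
- median() – Find the median value
- Mode – No built-in function; base R's mode() reports an object's storage type, not the most frequent value, so a custom helper is needed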
mean()
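The mean() function works on numeric vectors and on matrices, where it averages every element. Calling it directly on a data frame returns NA with a warning, so apply it to individual columns or rows instead.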
v <- c(2, 4, 6, 9)
mean(v)
# 5.25
By default, mean() does not skip NA values: if any value is missing, the result is NA. To drop missing values before averaging, set na.rm = TRUE.
v <- c(2, 4, NA, 8)
mean(v)
# NA
mean(v, na.rm = TRUE)
# 4.666667
For data frames, we can take means by rows or columns:
df <- data.frame(
v1 = c(1, 3, 5),
v2 = c(2, 4, 6)
)
mean(df$v1) # Mean of column v1
apply(df, 1, mean) # Row-wise means
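Base R also provides colMeans() and rowMeans(), which compute all column or row means in one call. A minimal sketch, assuming the same df as above:

```r
df <- data.frame(
  v1 = c(1, 3, 5),
  v2 = c(2, 4, 6)
)

colMeans(df)        # v1 = 3, v2 = 4
rowMeans(df)        # 1.5 3.5 5.5
sapply(df, mean)    # same values as colMeans(df), computed one column at a time
```

For plain means these helpers are usually shorter to write than apply(), while apply() remains the more general tool for other summary functions.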
median()
The median can be calculated using the median() function.
v <- c(1, 2, 3, 4, 5)
median(v)
# 3
Like mean(), median() returns NA when the data contain missing values; set na.rm = TRUE to drop them first.
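A short sketch of how median() behaves when a value is missing. The vector is made up for illustration.

```r
v <- c(1, 2, NA, 4, 5)

median(v)                  # NA, because one value is missing
median(v, na.rm = TRUE)    # 3, the median of 1, 2, 4, 5
```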
Finding the Mode
There is no built-in function for the mode in base R. However, we can create custom functions to find modal values.
Here is one way to calculate the mode by leveraging the table() function:
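getmode <- function(v) {
  uniqv <- sort(unique(v))      # sorted, so positions match table(v)
  uniqv[which.max(table(v))]
}
v <- c(5, 7, 3, 7, 5, 3, 5, 9)
getmode(v) # 5
The approach:
- Get the unique values with unique() and sort them, so they line up with the frequency table
- Build a frequency table with table(), which lists the values in sorted order
- Extract the value with the highest frequency using which.max()
So in just three lines we can compute the modal value. When several values tie for the highest frequency, which.max() picks the first one, so getmode() returns the smallest of the tied values.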
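getmode() reports a single value even when several values are tied for the highest frequency. One possible variant, sketched here under the hypothetical name get_all_modes(), returns every tied value instead. It counts occurrences with match() and tabulate() rather than table().

```r
# Hypothetical helper: return every value that shares the highest frequency
get_all_modes <- function(v) {
  v <- v[!is.na(v)]                      # ignore missing values
  uniqv <- unique(v)                     # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))    # how often each distinct value occurs
  uniqv[counts == max(counts)]
}

get_all_modes(c(5, 7, 3, 7, 5, 3, 5, 9))    # 5
get_all_modes(c(1, 1, 2, 2, 3))             # 1 2
```

If every value occurs exactly once, this sketch returns all of them, which is one way to signal that the data have no meaningful mode.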
Comparing the Types of Averages
The mean, median and mode can yield different results, even on the same data set.
For example:
scores <- c(7, 8, 9, 10, 13)
mean(scores) # 9.4
median(scores) # 9
getmode(scores) # 7, but every value occurs once, so there is no real mode
These differences reflect how each measure works:
- Mean – Sensitive to extreme scores, because every value contributes to the sum
- Median – Less affected by outliers than the mean
- Mode – Ignores the magnitude of values and looks only at frequency
Used together, they describe the central tendency and hint at the shape of the distribution.
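The difference in outlier sensitivity is easy to see by adding one extreme value to the scores. The extra score of 100 and the name with_outlier are hypothetical, chosen only for illustration.

```r
scores <- c(7, 8, 9, 10, 13)
with_outlier <- c(scores, 100)    # add one made-up extreme score

mean(scores)            # 9.4
mean(with_outlier)      # 24.5
median(scores)          # 9
median(with_outlier)    # 9.5
```

A single extreme value more than doubles the mean, while the median moves by only half a point.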
Examples by Data Type
Now let us run through some examples of finding averages on different data structures in R.
Numeric Vectors
ages <- c(23, 65, 43, 21, 56)
mean(ages)
# 41.6
median(ages)
# 43
getmode(ages) # No real mode: every age occurs once, so the returned value is arbitrary
Matrices
We can take row-wise or column-wise means:
m <- matrix(c(1:6), nrow = 2, ncol = 3)
# Column Means
apply(m, 2, mean)
# Row Means
apply(m, 1, mean)
For the overall median or mode, flatten the matrix to a vector first:
median(as.numeric(m)) # 3.5
getmode(as.numeric(m)) # 1, but every value occurs once, so there is no real mode
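Base R also ships colMeans() and rowMeans(), which give the same row and column means as the apply() calls above, and apply() can be paired with median() to get one median per column. A minimal sketch on the same matrix:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
# m is filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

colMeans(m)              # 1.5 3.5 5.5
rowMeans(m)              # 3 4
apply(m, 2, median)      # 1.5 3.5 5.5, one median per column
median(as.vector(m))     # 3.5, the median of all six values
```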
Data Frames
Similar concept for data frames, where we target columns or rows:
df <- data.frame(
v1 = c(1, NA, 3),
v2 = c(3, 5, 3)
)
mean(df$v1, na.rm = TRUE) # Mean of v1
apply(df, 1, mean, na.rm = TRUE) # Row-wise means, skipping the NA in row 2
median(df$v2) # Median of v2
And flatten to find overall median or mode:
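values <- unlist(df) # as.numeric() fails on a data frame, so flatten with unlist()
median(values, na.rm = TRUE) # 3
getmode(values[!is.na(values)]) # 3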
This covers the core ways to calculate averages on R data sets. Let's now discuss some practical examples.
Real-World Examples
Estimating averages allows us to benchmark performance and inform decisions.
Here are some business use cases:
Sales Averages
The mean and median of sales data give a baseline against which individual months can be judged:
sales <- c(350, 390, 330, 400, 420)
months <- c("Jan", "Feb", "Mar", "Apr", "May")
data <- data.frame(months, sales)
mean(data$sales) # Average monthly sales
median(data$sales) # Middle value
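A line chart then shows the pattern over time:
library(ggplot2)
# Keep the months in calendar order instead of alphabetical order
data$months <- factor(data$months, levels = months)
# group = 1 is needed to draw a single line across a discrete x axis
ggplot(data, aes(months, sales, group = 1)) +
  geom_line()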
Customer Analytics
We can segment customers by order value brackets:
orders <- c(100, 600, 200, 550, 150, 850)
# Group into buckets
groups <- cut(
  orders,
  breaks = c(0, 200, 500, Inf)
)
# Get statistics for groups
by_group <- tapply(orders, groups, mean)
by_group # (0,200]: 150, (200,500]: NA (no orders fall here), (500,Inf]: 666.67
barplot(by_group)
Monitoring Web Traffic
For web analytics, averages help monitor traffic volumes. Here we compare site visits across one week:
visits <- c(
mon = 983, tues = 1032, wed = 967,
thurs = 890, fri= 1200, sat = 1100, sun = 690
)
mean(visits) # Overall weekly average
# Days below the weekly average (each day has a single observation, so a per-day mean would just repeat the data)
visits[visits < mean(visits)]
# Median visits
median(visits)
We can spot underperforming days to shift marketing budgets.
The same applies for page views, conversion rates and other metrics to drive decisions.
Advanced Usage
We've covered the basics of using averages in R. There are also more advanced functions that add extra capabilities:
Weighted Averages
A weighted average lets each data point count in proportion to its weight, which is useful when combining aggregated figures:
values <- c(500, 200, 800, 100)
weights <- c(0.6, 0.1, 0.2, 0.1)
weighted.mean(values, weights)
# Alternate approach
sum(values * weights) / sum(weights)
Rolling Averages
A rolling average is computed over a sliding window of observations, which helps smooth a time series:
library(zoo)
rollapply(
data = visits,
width = 3,
FUN = mean
)
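To see what a rolling mean actually computes, the window can also be slid by hand in base R. This is only an illustrative sketch: the visits values are copied from the web traffic example without their day names, and width, starts and rolling are names chosen here. With its default settings, rollapply() should return the same five values, since it only evaluates complete windows.

```r
visits <- c(983, 1032, 967, 890, 1200, 1100, 690)
width <- 3

# One window starts at each position that still leaves `width` values
starts <- seq_len(length(visits) - width + 1)

# Mean of each consecutive window of three days
rolling <- sapply(starts, function(i) mean(visits[i:(i + width - 1)]))

round(rolling, 1)    # 994.0 963.0 1019.0 1063.3 996.7
```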
Trimmed Means
Drop a fraction of the observations from each end of the sorted data before averaging, which limits the influence of outliers:
y <- c(1, 2, 10, 101, 150)
mean(y) # 52.8
trimmed <- mean(y, trim = 0.2) # 37.67 (rounded) - drops the lowest and highest 20% of y, leaving 2, 10, 101
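The trimming step can be reproduced by hand to make clear which observations are discarded. This is a sketch of the idea, not the internal code of mean(); the names trim, sorted, k and kept are chosen here for illustration.

```r
y <- c(1, 2, 10, 101, 150)
trim <- 0.2

sorted <- sort(y)
k <- floor(length(y) * trim)                     # observations to drop from each end (1 here)
kept <- sorted[(k + 1):(length(sorted) - k)]     # 2 10 101

mean(kept)              # about 37.67
mean(y, trim = trim)    # same value
```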
This covers some common special cases when working with business data in the real world. The same concepts apply for medians and modes too.
Conclusion
Calculating averages is an essential skill for data analysis in R. Using the built-in mean() and median() functions together with a custom getmode() helper allows us to characterize the central tendency of a data set.
By finding the averages, we can:
- Identify representative values
- Smooth anomalies and reveal patterns
- Segment samples into brackets
- Monitor trends over time
- Inform decisions using analytics
The examples and use cases provided demonstrate practical applications across sales, marketing, web analytics and business intelligence.
Combining the mean, median and mode yields additional insight into the shape of a distribution, helping us make smarter data-driven decisions.


