The average, also known as the arithmetic mean, is a simple yet powerful statistical concept. In R, there are several functions to calculate the average or mean value for numeric data sets.

In this comprehensive guide, we will explore the different types of average, functions to find the mean, median and mode in R, and examples with sample code to master their usage.

Why Calculate Averages?

Finding the averages allows us to:

  • Identify the central tendency in a data set
  • Reduce large data sets to single representative values
  • Smooth out anomalies and outliers
  • Compare different data samples
  • Determine growth rates over time
  • Reveal patterns and correlations

Comparing the mean, median and mode shows where a data set is centered and hints at its skewness. These summaries make further statistical analysis simpler.

Types of Averages

There are 3 main types of averages used in statistics:

Mean

The arithmetic mean or average is calculated by adding all values in a set and dividing the sum by the total count of values. It is the most commonly used average.
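As a quick check of this definition, the sketch below computes the mean by hand with sum() and length() and compares it with mean(). The vector x and the name manual_mean are made up for illustration.

```r
# Illustrative values (not from a real data set)
x <- c(2, 4, 6, 9)

# Sum of the values divided by how many values there are
manual_mean <- sum(x) / length(x)

manual_mean                               # 5.25
isTRUE(all.equal(manual_mean, mean(x)))   # TRUE
```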

Median

The median is the middle value of the sorted data, separating the higher half from the lower half. With an even number of values, it is the mean of the two middle values.
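A small sketch of how the median behaves with an odd and an even number of values. The vectors odd_set and even_set are illustrative only.

```r
odd_set  <- c(7, 1, 5)      # sorted: 1 5 7
even_set <- c(7, 1, 5, 3)   # sorted: 1 3 5 7

median(odd_set)    # 5, the single middle value
median(even_set)   # 4, the mean of the two middle values 3 and 5
```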

Mode

The mode is the value that appears most frequently in a data set. A set may have one mode, more than one mode, or no mode.

Calculating the Mean in R

R provides built-in functions to calculate the mean and median.

The key functions are:

  • mean() – Calculate the mean of a data set
  • median() – Find the median value
  • table() – Count how often each value occurs, the basis for finding the mode

mean()

The mean() function works on numeric vectors and matrices (for a matrix it returns the mean of all elements). For data frames, apply it to individual columns or rows.

v <- c(2, 4, 6, 9) 

mean(v)

# 5.25

By default, mean() does not skip NA values: na.rm defaults to FALSE, so a single NA makes the result NA. To drop NA values before averaging, set na.rm = TRUE.

v <- c(2, 4, NA, 8)

mean(v)
# NA

mean(v, na.rm = TRUE)
# 4.666667

For data frames, we can take means by rows or columns:

df <- data.frame(
  v1 = c(1, 3, 5),
  v2 = c(2, 4, 6)  
)

mean(df$v1) # Mean of column v1  

apply(df, 1, mean) # Row-wise means
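Base R also ships dedicated helpers for this. The sketch below rebuilds the same small data frame and uses colMeans(), rowMeans() and sapply(), which are often more convenient than apply() when every column is numeric.

```r
df <- data.frame(
  v1 = c(1, 3, 5),
  v2 = c(2, 4, 6)
)

colMeans(df)       # v1 = 3, v2 = 4
rowMeans(df)       # 1.5 3.5 5.5
sapply(df, mean)   # same column means, computed one column at a time
```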

median()

The median can be calculated using the median() function.

v <- c(1, 2, 3, 4, 5)  

median(v)
#  3

Like mean(), median() returns NA when the data contain NA values; set na.rm = TRUE to drop them.
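A short illustration of how median() treats missing values. The vector v_na is a made-up example.

```r
v_na <- c(1, 2, NA, 4, 5)

median(v_na)                 # NA, because na.rm defaults to FALSE
median(v_na, na.rm = TRUE)   # 3, the median of 1, 2, 4, 5
```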

Finding the Mode

There is no built-in function for the mode in base R. However, we can create custom functions to find modal values.

Here is one way to calculate the mode by leveraging the table() function:

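getmode <- function(v) {
  uniqv <- sort(unique(v))  # sorted, so positions line up with table(v)
  uniqv[which.max(table(v))]
}

v <- c(5, 7, 3, 7, 5, 3, 5, 9)

getmode(v) # 5

The approach:

  1. Get the unique values with unique() and sort them, matching the order table() uses
  2. Build a frequency table with table()
  3. Find the position of the highest count with which.max() and return the value at that position

The function body takes only two lines. When several values tie for the highest count, which.max() picks the first one, so getmode() returns the smallest of the tied values.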
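Since a data set can have more than one mode, a variant that returns every tied value can be useful. The helper below, getmodes(), is a hypothetical name introduced here, and the approach with match() and tabulate() is one possible sketch rather than a standard function.

```r
# Hypothetical helper: returns every value tied for the highest count
getmodes <- function(v) {
  uniqv  <- unique(v)
  counts <- tabulate(match(v, uniqv))  # occurrences of each unique value
  uniqv[counts == max(counts)]
}

getmodes(c(5, 7, 3, 7, 5, 3, 5, 9))   # 5
getmodes(c(1, 1, 2, 2, 3))            # 1 2
getmodes(c(7, 8, 9))                  # 7 8 9 (all tied, so no meaningful mode)
```

If the result contains every distinct value, as in the last call, the data have no meaningful mode.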

Comparing the Types of Averages

The mean, median and mode can yield different results, even on the same data set.

For example:

scores <- c(7, 8, 9, 10, 13)  

mean(scores) # 9.4
median(scores) # 9  
getmode(scores) # 7, but every score occurs once, so there is no meaningful mode

The three measures respond to the data in different ways:

  • Mean – Sensitive to extreme scores due to equal weighting
  • Median – Less affected by outliers compared to mean
  • Mode – Ignores magnitude of values, only frequency

Used together, they describe central tendency more fully and hint at the shape of the distribution. Here the mean (9.4) sits above the median (9) because the score of 13 pulls it upward.
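To see the difference in sensitivity, the sketch below appends one extreme value to the scores and recomputes both measures. The outlier of 100 is invented for illustration.

```r
scores       <- c(7, 8, 9, 10, 13)
with_outlier <- c(scores, 100)   # add one extreme score

mean(scores)           # 9.4
mean(with_outlier)     # 24.5, pulled far upward by the outlier

median(scores)         # 9
median(with_outlier)   # 9.5, barely changed
```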

Examples by Data Type

Now let us run through some examples of finding averages on different data structures in R.

Numeric Vectors

ages <- c(23, 65, 43, 21, 56)   

mean(ages)  
# 41.6

median(ages)
# 43   

getmode(ages) # every age occurs once, so the result is not a meaningful mode

Matrices

We can take row-wise or column-wise means:

m <- matrix(c(1:6), nrow = 2, ncol = 3) 

# Column Means
apply(m, 2, mean)  

# Row Means 
apply(m, 1, mean)  

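For the overall median or mode, flatten the matrix to a vector first. median() also accepts a matrix directly, but getmode() relies on unique(), which treats a matrix row by row: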

median(as.numeric(m)) 
getmode(as.numeric(m))
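Medians can also be taken per row or per column with the same apply() pattern used for means. A minimal sketch on the same 2 x 3 matrix:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
# Columns are (1, 2), (3, 4), (5, 6); rows are (1, 3, 5), (2, 4, 6)

apply(m, 2, median)   # column medians: 1.5 3.5 5.5
apply(m, 1, median)   # row medians: 3 4
```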

Data Frames

Similar concept for data frames, where we target columns or rows:

df <- data.frame(
  v1 = c(1, NA, 3),
  v2 = c(3, 5, 3)  
)  

mean(df$v1, na.rm = TRUE) # Mean of v1
apply(df, 1, mean, na.rm = TRUE) # Row-wise means, ignoring NA

median(df$v2) # Median of v2  

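To find the overall median or mode, flatten the data frame with unlist() (as.numeric() does not work on a data frame):

all_values <- unlist(df) # every column combined into one vector
median(all_values, na.rm = TRUE)
getmode(all_values[!is.na(all_values)])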

This covers the core ways to calculate averages on R data sets. Let's now discuss some practical examples.

Real-World Examples

Estimating averages allows us to benchmark performance and inform decisions using R analytics.

Here are some business use cases:

Sales Averages

Taking the mean and median of sales data summarizes a typical month and gives a baseline for spotting growth trends:

sales <- c(350, 390, 330, 400, 420)
months <- c("Jan", "Feb", "Mar", "Apr", "May")  

data <- data.frame(months, sales)

mean(data$sales) # Average monthly sales  
median(data$sales) # Middle value  

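Time series charts also visualize patterns:

library(ggplot2)

# Keep the months in calendar order instead of alphabetical order
data$months <- factor(data$months, levels = data$months)

# group = 1 joins the points into one line on a discrete x-axis
ggplot(data) +
  geom_line(aes(months, sales, group = 1))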

Customer Analytics

We can segment customers by order value brackets:

orders <- c(100, 600, 200, 550, 150, 850) 

# Group into buckets
groups <- cut(
  orders,
  breaks = c(0, 200, 500, Inf)
)

# Mean order value per bucket (an empty bucket gives NA)
by_group <- tapply(orders, groups, mean)

barplot(by_group)
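It can help to check how many orders fall into each bracket before comparing averages. The sketch below reuses the same order values; the labels "low", "mid" and "high" and the name bracket are assumptions added for readability.

```r
orders <- c(100, 600, 200, 550, 150, 850)

# Same breaks as above, with illustrative labels
bracket <- cut(
  orders,
  breaks = c(0, 200, 500, Inf),
  labels = c("low", "mid", "high")
)

table(bracket)                    # low: 3, mid: 0, high: 3
tapply(orders, bracket, median)   # low: 150, mid: NA, high: 600
```

The empty "mid" bracket shows up as a count of 0 and a median of NA, which is worth handling before plotting or reporting.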

Monitoring Web Traffic

For web analytics, averages help monitor traffic volumes. For example, we can compare daily site visits across one week:

visits <- c(
  mon = 983, tues = 1032, wed = 967, 
  thurs = 890, fri= 1200, sat = 1100, sun = 690
)

mean(visits) # Average daily visits for the week

# Each day's difference from that average
visits - mean(visits)

# Median visits 
median(visits) 

We can spot underperforming days to shift marketing budgets.

The same applies for page views, conversion rates and other metrics to drive decisions.

Advanced Usage

We've covered the basics of using averages in R. There are also more advanced functions that add extra capabilities:

Weighted Averages

Weighted averages account for different weights per data point. Useful for aggregated figures:

values <- c(500, 200, 800, 100)
weights <- c(0.6, 0.1, 0.2, 0.1)  

weighted.mean(values, weights) 

# Alternate approach  
sum(values * weights) / sum(weights)
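The weights do not need to sum to 1, since weighted.mean() divides by their total. A minimal sketch with made-up prices weighted by units sold:

```r
# Hypothetical prices and the number of units sold at each price
prices <- c(10, 20, 30)
units  <- c(5, 3, 2)

mean(prices)                   # 20, treats every price equally
weighted.mean(prices, units)   # 17, i.e. (10*5 + 20*3 + 30*2) / 10
```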

Rolling Averages

Calculate averages over a rolling window of observations. Helps smooth time series:

library(zoo)  

rollapply(
  data = visits,
  width = 3,
  FUN = mean
) 
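If installing zoo is not an option, a centered rolling mean can be sketched in base R with stats::filter(). This is an alternative under stated assumptions, not a drop-in replacement: it pads the ends with NA where the window is incomplete.

```r
visits <- c(
  mon = 983, tues = 1032, wed = 967,
  thurs = 890, fri = 1200, sat = 1100, sun = 690
)

# Equal weights of 1/3 over a centered 3-day window
rolling <- stats::filter(visits, rep(1 / 3, 3), sides = 2)

as.numeric(rolling)
# NA 994 963 1019 1063.333 996.6667 NA
```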

Trimmed Means

Remove top and bottom percentiles before taking average to exclude outliers:

y <- c(1, 2, 10, 101, 150)

mean(y) # 52.8  
trimmed <- mean(y, trim = 0.2) # 37.67 - drops the lowest and highest 20% of y (one value from each end here)
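To make the trimming step explicit, the sketch below reproduces the calculation by hand. The names n, k and kept are illustrative.

```r
y <- c(1, 2, 10, 101, 150)

n <- length(y)
k <- floor(n * 0.2)                # values dropped from each end: 1
kept <- sort(y)[(k + 1):(n - k)]   # 2 10 101

mean(kept)                                            # about 37.67
isTRUE(all.equal(mean(kept), mean(y, trim = 0.2)))    # TRUE
```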

This covers some common special cases when working with business data in the real world. Rolling windows work for medians as well, for example by passing FUN = median to rollapply().

Conclusion

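Calculating averages is an essential skill for data analysis in R. The built-in mean() and median() functions, together with a custom function such as getmode(), allow us to characterize the central tendency of a data set. Note that base R's own mode() reports an object's storage type, not its most frequent value.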

By finding the averages, we can:

  • Identify representative values
  • Smooth anomalies and reveal patterns
  • Segment samples into brackets
  • Monitor trends over time
  • Inform decisions using analytics

The examples and use cases provided demonstrate practical applications across sales, marketing, web analytics and business intelligence.

Combining the arithmetic mean, median and mode yields additional insight into the shape of a distribution, which helps in making better data-driven decisions.
