Exploratory Data Analysis - R Programming FAQs

corrgram Function in R

Updated February 8, 2026; published January 25, 2026, by Muhammad Imdad Ullah

The corrgram function in R from the corrgram package is a powerful tool for creating correlation matrix visualizations. It combines numerical correlation values with graphical representations to help identify patterns, relationships, and outliers in multivariate data.

corrgram Package Installation

The corrgram package must first be installed. The following commands install and load the package.

# Install and load the package
install.packages("corrgram")
library(corrgram)

# Load additional packages for examples
library(datasets)
library(corrplot)  # For comparison

The corrplot package can also be used to visualize correlation matrices.

Syntax of the corrgram Function

The general syntax of the corrgram function in the R Language is:

corrgram(x, order = FALSE, panel = panel.shade, lower.panel = panel, 
         upper.panel = panel, diag.panel = NULL, text.panel = textPanel,
         label.pos = c(0.5, 0.5), label.srt = 0, cex.labels = NULL,
         font.labels = 1, row1attop = TRUE, dir = "", gap = 0, abs = FALSE, ...)

corrgram Examples

The following example uses the mtcars data set to draw a correlation matrix visualization. A numerical correlation matrix is also computed with the cor() function and stored in the object cor_matrix.

# Load mtcars dataset
data(mtcars)

# Basic corrgram
corrgram(mtcars, 
         main = "Correlation Matrix of mtcars Dataset",
         cex.main = 1.2)

# Calculate numerical correlations
cor_matrix <- cor(mtcars)
round(cor_matrix, 3)

## OUTPUT
        mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
mpg   1.000 -0.852 -0.848 -0.776  0.681 -0.868  0.419  0.664  0.600  0.480 -0.551
cyl  -0.852  1.000  0.902  0.832 -0.700  0.782 -0.591 -0.811 -0.523 -0.493  0.527
disp -0.848  0.902  1.000  0.791 -0.710  0.888 -0.434 -0.710 -0.591 -0.556  0.395
hp   -0.776  0.832  0.791  1.000 -0.449  0.659 -0.708 -0.723 -0.243 -0.126  0.750
drat  0.681 -0.700 -0.710 -0.449  1.000 -0.712  0.091  0.440  0.713  0.700 -0.091
wt   -0.868  0.782  0.888  0.659 -0.712  1.000 -0.175 -0.555 -0.692 -0.583  0.428
qsec  0.419 -0.591 -0.434 -0.708  0.091 -0.175  1.000  0.745 -0.230 -0.213 -0.656
vs    0.664 -0.811 -0.710 -0.723  0.440 -0.555  0.745  1.000  0.168  0.206 -0.570
am    0.600 -0.523 -0.591 -0.243  0.713 -0.692 -0.230  0.168  1.000  0.794  0.058
gear  0.480 -0.493 -0.556 -0.126  0.700 -0.583 -0.213  0.206  0.794  1.000  0.274
carb -0.551  0.527  0.395  0.750 -0.091  0.428 -0.656 -0.570  0.058  0.274  1.000

Note that the diagonal shows the variable names. Because this call does not set any panel arguments, both the upper and lower triangles use the default panel.shade, which draws shaded cells whose color and intensity reflect the direction and strength of each correlation. Correlation coefficients, ellipses, or pies appear only when the corresponding panel functions (for example, panel.cor, panel.ellipse, or panel.pie) are requested.

The visualization and numerical results show the following:

- Dark blue cells indicate strong positive correlations
- Dark red cells indicate strong negative correlations
- Lighter colors indicate weaker correlations
- For example, mpg and wt show a strong negative correlation (-0.87)
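To complement the plot with numbers, the correlation matrix can be flattened into a ranked list of variable pairs. The following is a minimal sketch using base R only; the object names idx and pairs_df are illustrative and do not come from the corrgram package.

```r
# Rank variable pairs in mtcars by absolute Pearson correlation (base R only)
cor_matrix <- cor(mtcars)

# Keep each pair once by taking only the upper triangle of the matrix
idx <- which(upper.tri(cor_matrix), arr.ind = TRUE)

pairs_df <- data.frame(
  var1 = rownames(cor_matrix)[idx[, "row"]],
  var2 = colnames(cor_matrix)[idx[, "col"]],
  r    = cor_matrix[idx],
  row.names = NULL
)

# Sort from strongest to weakest relationship, ignoring the sign
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
head(pairs_df, 5)
```

For mtcars, the top of this list agrees with the matrix printed above: cyl and disp (0.902) come first, followed by disp and wt (0.888) and mpg and wt (-0.868).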

Customizing Panel and Ordering

One can easily customize the panels and the ordering of the variables. For example:

# Custom panel functions
corrgram(mtcars, 
         order = TRUE,  # PCA ordering
         lower.panel = panel.pie,  # Pies in lower triangle
         upper.panel = panel.conf, # Confidence intervals in upper
         diag.panel = panel.density, # Density plots on diagonal
         main = "Customized Corrgram")

# Alternative with different panels
corrgram(mtcars,
         lower.panel = panel.shade,
         upper.panel = panel.pts,  # Scatter plots
         diag.panel = panel.minmax, # Min-max values
         cex.labels = 1.2)

- PCA ordering groups highly correlated variables together
- Pies show the magnitude of the correlation (filled portion = |r|)
- Shading intensity indicates correlation strength
- Scatter plots in the upper triangle show the actual data relationships
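The idea behind PCA ordering can be sketched in base R: each variable is placed by its angle in the plane of the first two eigenvectors of the correlation matrix, and the variables are then sorted by that angle. This is an illustration of the angular-ordering idea under that assumption, not necessarily the exact code that corrgram runs internally; the names ev, alpha, and ord are illustrative.

```r
# Sketch of PCA-based (angular) ordering of variables, using base R only
R <- cor(mtcars)

# Loadings of each variable on the first two principal components
ev <- eigen(R)$vectors[, 1:2]
e1 <- ev[, 1]
e2 <- ev[, 2]

# Angle of each variable in the plane spanned by the first two components
alpha <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)

# Sorting by angle places variables with similar loadings next to each other
ord <- order(alpha)
colnames(R)[ord]        # variable names in angular order
round(R[ord, ord], 2)   # correlation matrix with rows and columns reordered
```

Printing the reordered matrix makes the blocks of similarly signed correlations easier to spot than in the original column order.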

Best Practices and Interpretation Guidelines

The following are best practices and interpretation guidelines when using the corrgram function in R:

Color Interpretation:
- Blue = Positive correlation
- Red = Negative correlation
- Saturation intensity = Strength of correlation
- White = No correlation
Pattern Recognition:
- Blocks of similar colors indicate variable clusters
- Check for multicollinearity (high correlations among predictors)
- Look for unexpected correlations that might indicate data issues
Statistical Considerations:
- Correlation does not imply causation
- Check assumptions (such as linearity, outliers)
- Consider sample size and p-values
- Use an appropriate correlation method (such as Pearson’s Correlation and Spearman’s Rank Correlation) for your data type
When to Use Different Panels:
- panel.shade: Quick overview of correlation structure
- panel.pie: Emphasize correlation magnitude
- panel.ellipse: Show confidence and data spread
- panel.pts: Identify outliers and nonlinear patterns
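The statistical considerations above can be checked numerically with base R. The sketch below compares Pearson's and Spearman's coefficients for one pair of mtcars variables and uses cor.test() to obtain a p-value and confidence interval. The choice of mpg and wt is only for illustration.

```r
# Compare correlation methods and test significance for one pair of variables
x <- mtcars$mpg
y <- mtcars$wt

r_pearson  <- cor(x, y, method = "pearson")   # linear association
r_spearman <- cor(x, y, method = "spearman")  # monotonic (rank-based) association

# Test H0: the true Pearson correlation is zero
test <- cor.test(x, y, method = "pearson")

round(c(pearson = r_pearson, spearman = r_spearman), 3)
test$p.value    # p-value of the test
test$conf.int   # 95% confidence interval for the correlation
```

A large gap between the Pearson and Spearman values is a hint that the relationship may be nonlinear or influenced by outliers, which is worth confirming with a scatter plot panel such as panel.pts.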

Summary of using corrgram Function in R

The corrgram function in R is an excellent tool for exploratory data analysis, providing both visual and numerical insights into correlation structures. The key takeaways:

Start with basic plots and add customization as needed
Always complement visual analysis with numerical correlation values
Consider statistical significance when interpreting patterns
Use appropriate correlation methods for your data type
Combine corrgram() with other EDA tools for comprehensive analysis

The corrgram function in R is particularly valuable in the early stages of data analysis, helping to identify relationships, potential problems, and directions for further investigation.

Computing Z Scores in R

October 17, 2025 by Muhammad Imdad Ullah

Learn how to calculate z scores in R with this step-by-step tutorial. Use R’s powerful functions to standardize your data and analyze its distribution.

Given a distribution with mean $\overline{x}$ and standard deviation $s$, a location-scale transformation known as a Z-score will shift the distribution to have mean 0 and scale the spread to have standard deviation 1:

$$Z= \frac{x - \overline{x}}{s}$$

Computing Z Scores in R

Suppose the variable $x$ has a normal distribution with mean 100 and standard deviation 15, that is, $x\sim N(100, 15^2)$, and $Z$ has a standard normal distribution, that is, $Z\sim N(0, 1)$. One can easily transform the $x$ variable to Z-scores in R and visualize the result.

Z-Score Transformation in R

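# Sample from Normal Distribution
# with mean = 100 and SD = 15
set.seed(123)  # seed added for reproducibility (any value works)
df <- data.frame(x = rnorm(100, mean = 100, sd = 15))

# Z-score Transformation
# scale() returns a one-column matrix, so convert it to a plain numeric vector
df$z <- as.numeric(scale(df$x))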

## Descriptive Statistics
summary(df)
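As a quick check of the transformation, the result of scale() can be compared with the formula applied by hand, and the standardized values should have mean 0 and standard deviation 1 (up to floating-point error). This is a small sketch; the seed value and the names z_scale and z_manual are arbitrary choices for illustration.

```r
# Verify that scale() matches the manual Z-score formula
set.seed(42)
x <- rnorm(100, mean = 100, sd = 15)

z_scale  <- as.numeric(scale(x))     # built-in standardization
z_manual <- (x - mean(x)) / sd(x)    # formula applied directly

all.equal(z_scale, z_manual)         # TRUE: both approaches agree
mean(z_manual)                       # numerically zero
sd(z_manual)                         # 1
```

The sample mean and standard deviation of x itself will be close to, but not exactly, 100 and 15, because x is a random sample.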

Transforming a Variable to Z-Score in R

One can visualize the original variable $x$ and the Z-score variable using a histogram and a density estimation.

##  ggplot for original variable
library(ggplot2)
p1 <- ggplot(df, aes(x = x))

# Histogram with density instead of count on y-axis
p1 <- p1 + geom_histogram(aes(y = after_stat(density)))  # ..density.. is deprecated in recent ggplot2
p1 <- p1 + geom_density(alpha = .2, fill="yellow")
p1 <- p1 + geom_rug()
p1 <- p1 + labs(title = "X ~ N(100, 15)")
p1

Figure: Histogram with density for the original variable

## ggplot for z variable
p2 <- ggplot(df, aes(x = z))

# Histogram with density instead of count on y-axis
p2 <- p2 + geom_histogram(aes(y = after_stat(density)))  # ..density.. is deprecated in recent ggplot2
p2 <- p2 + geom_density(alpha = 0.2, fill = "yellow")
p2 <- p2 + geom_rug()
p2 <- p2 + labs(title = "Z ~ N(0, 1)")
p2

Figure: Histogram with density for the Z variable

One can combine these two graphs using the grid.arrange() function from the gridExtra package.

library(gridExtra)
grid.arrange(grobs = list(p1, p2), ncol = 2)

Figure: Histograms of the original variable and its Z-score transformation, side by side

Calculating Z Scores in R Manually

For full control, or for educational purposes, one can calculate z scores in R manually using the basic functions mean() and sd(). The formula is:

$$z=\frac{x-\mu }{\sigma }$$

where:

- $x$ is a data point
- $\mu$ is the mean of the data
- $\sigma$ is the standard deviation of the data

Note that sd() computes the sample standard deviation (with $n-1$ in the denominator), which is also what scale() uses by default.

Example: Z-score for a data frame column

# Create a sample data frame

df <- data.frame(
  pressure = c(98, 102, 100, 99, 101),
  temperature = c(20, 22, 23, 21, 25)
)

# Calculate z-scores for the 'pressure' column manually
mean_pressure <- mean(df$pressure)
sd_pressure <- sd(df$pressure)
df$pressure_z <- (df$pressure - mean_pressure) / sd_pressure

# Print the data frame with the new z-score column
print(df)

## Output
  pressure temperature pressure_z
1       98          20 -1.2649111
2      102          22  1.2649111
3      100          23  0.0000000
4       99          21 -0.6324555
5      101          25  0.6324555
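The output above can be reproduced by hand, which shows where each number comes from. The worked arithmetic below uses the sample mean and the sample standard deviation (denominator $n-1$), as mean() and sd() do.

```latex
\begin{align*}
\overline{x} &= \frac{98 + 102 + 100 + 99 + 101}{5} = \frac{500}{5} = 100 \\
s &= \sqrt{\frac{(-2)^2 + 2^2 + 0^2 + (-1)^2 + 1^2}{5 - 1}}
   = \sqrt{\frac{10}{4}} \approx 1.5811 \\
z_1 &= \frac{98 - 100}{1.5811} \approx -1.2649 \\
z_4 &= \frac{99 - 100}{1.5811} \approx -0.6325
\end{align*}
```

The remaining z scores follow in the same way, and by symmetry the values for 102 and 101 are the positive counterparts of those for 98 and 99.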


R Language: A Quick Reference Guide – IV

Updated September 5, 2024; published October 17, 2023, by Muhammad Imdad Ullah

R Quick Reference Guide

This Quick Reference Guide to the R language gives short descriptions of widely used commands. It will help beginners and intermediate users of the R Programming Language to quickly find help with different functions. The Quick Reference is divided into groups; this part is R Language: A Quick Reference – IV.

This Quick Reference will help in performing different descriptive statistics on vectors, matrices, lists, data frames, arrays, and factors.

Basic Descriptive Statistics in R Language

The following widely used functions are helpful in computing descriptive statistics. They are not descriptive statistics functions in themselves; however, they serve as building blocks for computing other descriptive statistics.

R Command	Short Description
sum(x1, x2, …, xn)	Computes the sum/total of the $n$ numeric values given as arguments
prod(x1, x2, …, xn)	Computes the product of all $n$ numeric values given as arguments
min(x1, x2, …, xn)	Gives the smallest of all $n$ values given as arguments
max(x1, x2, …, xn)	Gives the largest of all $n$ values given as arguments
range(x1, x2, …, xn)	Gives both the smallest and largest of all $n$ values given as arguments
pmin(x1, x2, …)	Returns the parallel (element-wise) minima of the input vectors
pmax(x1, x2, …)	Returns the parallel (element-wise) maxima of the input vectors
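A short illustration of these functions on two small vectors may help, in particular to show how min() and max() differ from the element-wise pmin() and pmax(). The vectors x and y are made up for this example.

```r
# Two small example vectors
x <- c(4, 9, 2, 7)
y <- c(5, 1, 8, 3)

sum(x)       # 22
prod(x)      # 504
min(x, y)    # 1   (single smallest value across both vectors)
max(x, y)    # 9   (single largest value across both vectors)
range(x)     # 2 9

# pmin() and pmax() compare the vectors position by position
pmin(x, y)   # 4 1 2 3
pmax(x, y)   # 5 9 8 7
```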

Statistical Descriptive Statistics in R Language

The following functions are used to compute measures of central tendency, measures of dispersion, and measures of positions.

R Command	Short Description
mean(x)	Computes the arithmetic mean of all elements in $x$
sd(x)	Computes the standard deviation of all elements in $x$
var(x)	Computes the variance of all elements in $x$
median(x)	Computes the median of all elements in $x$
quantile(x)	Computes the median, quartiles, and extremes in $x$
quantile(x, p)	Computes the quantiles specified by $p$
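The following sketch applies these functions to a small made-up vector. The quantile values shown rely on R's default quantile algorithm (type 7).

```r
# Example data (arbitrary values chosen for easy hand calculation)
x <- c(2, 4, 4, 5, 7, 9, 11)

mean(x)      # 6
var(x)       # 10  (sample variance, denominator n - 1)
sd(x)        # 3.162278, the square root of the variance
median(x)    # 5

# Minimum, quartiles, and maximum (default type 7)
quantile(x)                  # 0%: 2, 25%: 4, 50%: 5, 75%: 8, 100%: 11

# Quantiles at user-specified probabilities
quantile(x, c(0.1, 0.9))     # 10%: 3.2, 90%: 9.8
```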

Cumulative Summaries in R Language

The following functions are also helpful in computing the other descriptive calculations.

R Command	Short Description
cumsum(x)	Computes the cumulative sum of $x$
cumprod(x)	Computes the cumulative product of $x$
cummin(x)	Computes the cumulative minimum of $x$
cummax(x)	Computes the cumulative maximum of $x$
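Each cumulative function returns a vector of the same length as its input, where element $i$ summarizes the first $i$ values. A small illustration with a made-up vector:

```r
x <- c(3, 1, 4, 1, 5)

cumsum(x)    # 3 4 8 9 14
cumprod(x)   # 3 3 12 12 60
cummin(x)    # 3 1 1 1 1
cummax(x)    # 3 3 4 4 5
```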

Sorting and Ordering Elements in R Language

The sorting and ordering functions are especially useful in non-parametric methods.

R Command	Short Description
sort(x)	Sorts all the elements of $x$ in ascending order
sort(x, decreasing = TRUE)	Sorts all the elements of $x$ in descending order
rev(x)	Reverse the elements in $x$
order(x)	Get the ordering permutation of $x$
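The difference between sort() and order() is easiest to see on a small example: sort() returns the sorted values, while order() returns the positions that would sort them. The vector below is made up for illustration.

```r
x <- c(30, 10, 40, 20)

sort(x)                      # 10 20 30 40
sort(x, decreasing = TRUE)   # 40 30 20 10
rev(x)                       # 20 40 10 30  (reverses positions, does not sort)
order(x)                     # 2 4 1 3      (indices of the sorted elements)

# Indexing with order() reproduces sort()
x[order(x)]                  # 10 20 30 40
```

Because order() returns indices, it can also be used to sort the rows of a data frame by one of its columns, for example mtcars[order(mtcars$mpg), ].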

Sequence and Repetition of Elements in R Language

These functions are used to generate a sequence of numbers or repeat the set of numbers $n$ times.

R Command	Short Description
a:b	Generates a sequence of numbers from $a$ to $b$ in steps of size 1
seq(n)	Generates a sequence of numbers from 1 to $n$
seq(a, b)	Generates a sequence of numbers from $a$ to $b$ in steps of size 1; it is the same as a:b
seq(a, b, by=s)	Generates a sequence of numbers from $a$ to $b$ in steps of size $s$.
seq(a, b, length=n)	Generates a sequence of numbers having length $n$ from $a$ to $b$
rep(x, n)	Repeats the whole of $x$ $n$ times
rep(x, each=n)	Repeats the elements of $x$, each element is repeated $n$ times
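A few concrete calls illustrate the table above. The full argument name length.out is used here; the shorter length = n in the table works through R's partial argument matching.

```r
1:5                          # 1 2 3 4 5
seq(5)                       # 1 2 3 4 5
seq(2, 10, by = 2)           # 2 4 6 8 10
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep() with times repeats the whole vector; each repeats element by element
rep(c(1, 2), 3)              # 1 2 1 2 1 2
rep(c(1, 2), each = 3)       # 1 1 1 2 2 2
```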


R Language: A Quick Reference – I

https://gmstat.com
