corrgram Function in R

The corrgram function in R from the corrgram package is a powerful tool for creating correlation matrix visualizations. It combines numerical correlation values with graphical representations to help identify patterns, relationships, and outliers in multivariate data.

corrgram Package Installation

The corrgram package must be installed before it can be used. The following commands install and load the package.

# Install and load the package
install.packages("corrgram")
library(corrgram)

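# Load additional packages for examples
library(datasets)  # Provides mtcars (attached by default in R)
# install.packages("corrplot")  # Run once if corrplot is not installed
library(corrplot)  # Optional, for comparison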

The corrplot package offers an alternative way to visualize correlation matrices.
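For comparison, a minimal sketch of a corrplot call on the same kind of input is given below. It assumes the corrplot package is installed; the requireNamespace() guard and the choice of method = "ellipse" are illustrative assumptions, not something this article prescribes.

```r
# Correlation matrix of the built-in mtcars data set
data(mtcars)
cor_matrix <- cor(mtcars)

# Draw the upper triangle as ellipses, but only if corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_matrix, method = "ellipse", type = "upper")
}
```

Unlike corrgram(), which accepts the raw data frame, corrplot() expects a correlation matrix as its first argument.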

Syntax of the corrgram Function

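The general syntax of the corrgram function in R is shown below. Some arguments, such as type, labels, col.regions, and cor.method, are omitted here for brevity; see ?corrgram for the full list.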

corrgram(x, order = FALSE, panel = panel.shade, lower.panel = panel, 
         upper.panel = panel, diag.panel = NULL, text.panel = textPanel,
         label.pos = c(0.5, 0.5), label.srt = 0, cex.labels = NULL,
         font.labels = 1, row1attop = TRUE, dir = "", gap = 0, abs = FALSE, ...)

corrgram Examples

The following example uses the mtcars data set to draw a correlation matrix visualization. A numerical correlation matrix is also computed with the cor() function and stored in cor_matrix.

# Load mtcars dataset
data(mtcars)

# Basic corrgram
corrgram(mtcars, 
         main = "Correlation Matrix of mtcars Dataset",
         cex.main = 1.2)

# Calculate numerical correlations
cor_matrix <- cor(mtcars)
round(cor_matrix, 3)

## OUTPUT
        mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
mpg   1.000 -0.852 -0.848 -0.776  0.681 -0.868  0.419  0.664  0.600  0.480 -0.551
cyl  -0.852  1.000  0.902  0.832 -0.700  0.782 -0.591 -0.811 -0.523 -0.493  0.527
disp -0.848  0.902  1.000  0.791 -0.710  0.888 -0.434 -0.710 -0.591 -0.556  0.395
hp   -0.776  0.832  0.791  1.000 -0.449  0.659 -0.708 -0.723 -0.243 -0.126  0.750
drat  0.681 -0.700 -0.710 -0.449  1.000 -0.712  0.091  0.440  0.713  0.700 -0.091
wt   -0.868  0.782  0.888  0.659 -0.712  1.000 -0.175 -0.555 -0.692 -0.583  0.428
qsec  0.419 -0.591 -0.434 -0.708  0.091 -0.175  1.000  0.745 -0.230 -0.213 -0.656
vs    0.664 -0.811 -0.710 -0.723  0.440 -0.555  0.745  1.000  0.168  0.206 -0.570
am    0.600 -0.523 -0.591 -0.243  0.713 -0.692 -0.230  0.168  1.000  0.794  0.058
gear  0.480 -0.493 -0.556 -0.126  0.700 -0.583 -0.213  0.206  0.794  1.000  0.274
carb -0.551  0.527  0.395  0.750 -0.091  0.428 -0.656 -0.570  0.058  0.274  1.000

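Note that the diagonal shows the variable names. With the default panel.shade, both the upper and lower triangles show shaded cells: blue cells with diagonal lines running from lower left to upper right indicate positive correlations, and red cells with lines running from upper left to lower right indicate negative correlations. Coefficients, ellipses, or pies appear only when other panel functions (such as panel.cor, panel.ellipse, or panel.pie) are supplied.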

Figure: corrgram of the mtcars data set

The visualization and numerical results show the following:

  • Dark blue cells indicate strong positive correlations
  • Dark red cells indicate strong negative correlations
  • Lighter colors indicate weaker correlations
  • For example, mpg and wt show a strong negative correlation (-0.87)
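To read the strongest relationships off the numerical matrix rather than the plot, one option is to list every variable pair whose absolute correlation exceeds a cutoff. The sketch below uses only base R; the 0.8 threshold and the names pairs_df and strong_pairs are assumptions chosen for illustration.

```r
data(mtcars)
cor_matrix <- cor(mtcars)

# Use the upper triangle so that each pair is listed only once
upper <- upper.tri(cor_matrix)
pairs_df <- data.frame(
  var1 = rownames(cor_matrix)[row(cor_matrix)[upper]],
  var2 = colnames(cor_matrix)[col(cor_matrix)[upper]],
  r    = cor_matrix[upper]
)

# Keep pairs with |r| above an illustrative cutoff, strongest first
strong_pairs <- pairs_df[abs(pairs_df$r) > 0.8, ]
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
print(strong_pairs, row.names = FALSE)
```

The mpg and wt pair mentioned above appears in this list, alongside other pairs that show up as the darkest cells in the corrgram.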

Customizing Panel and Ordering

The panel functions and the variable ordering are easy to customize. For example:

# Custom panel functions
corrgram(mtcars, 
         order = TRUE,  # PCA ordering
         lower.panel = panel.pie,  # Pies in lower triangle
         upper.panel = panel.conf, # Confidence intervals in upper
         diag.panel = panel.density, # Density plots on diagonal
         main = "Customized Corrgram")

# Alternative with different panels
corrgram(mtcars,
         lower.panel = panel.shade,
         upper.panel = panel.pts,  # Scatter plots
         diag.panel = panel.minmax, # Min-max values
         cex.labels = 1.2)

In these examples:

  • PCA ordering groups highly correlated variables together
  • Pies show the magnitude of the correlation (filled portion = |r|)
  • Shading intensity indicates correlation strength
  • Scatter plots in the upper triangle show the actual data relationships
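The idea behind PCA ordering can be sketched in base R. The code below assumes the ordering is derived from the angles of the variables in the plane of the first two eigenvectors of the correlation matrix, which is the approach usually described for corrgrams; the internals of the corrgram package may differ in detail, and the names angle, pca_order, and reordered are illustrative.

```r
data(mtcars)
cor_matrix <- cor(mtcars)

# First two eigenvectors of the correlation matrix
eig_vectors <- eigen(cor_matrix)$vectors[, 1:2]
e1 <- eig_vectors[, 1]
e2 <- eig_vectors[, 2]

# Angle of each variable in the plane spanned by the two eigenvectors
angle <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)

# Sorting by angle places variables with similar loadings next to each other
pca_order <- order(angle)
colnames(cor_matrix)[pca_order]

# Correlation matrix with rows and columns rearranged in that order
reordered <- cor_matrix[pca_order, pca_order]
round(reordered, 2)
```

Because rows and columns are permuted together, the reordered matrix holds the same correlations as before; only their arrangement changes.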

Best Practices and Interpretation Guidelines

The following are best practices and interpretation guidelines for the corrgram function in R:

  1. Color Interpretation:
    • Blue = Positive correlation
    • Red = Negative correlation
    • Saturation intensity = Strength of correlation
    • White = Correlation near zero
  2. Pattern Recognition:
    • Blocks of similar colors indicate variable clusters
    • Check for multicollinearity (high correlations among predictors)
    • Look for unexpected correlations that might indicate data issues
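  3. Statistical Considerations:
    • Consider statistical significance when interpreting patterns
    • Use a correlation method appropriate for your data type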
  4. When to Use Different Panels:
    • panel.shade: Quick overview of correlation structure
    • panel.pie: Emphasize correlation magnitude
    • panel.ellipse: Show a confidence ellipse and a smoothed trend line
    • panel.pts: Identify outliers and nonlinear patterns
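As a small illustration of the panel guidance above, the sketch below combines panel.ellipse and panel.pts on a subset of mtcars. The four variables are an arbitrary choice that keeps the panels readable, and the requireNamespace() guard is an assumption about how a missing package might be handled.

```r
data(mtcars)

# Arbitrary subset for illustration: fewer variables keep each panel legible
vars <- c("mpg", "wt", "hp", "disp")
mtcars_subset <- mtcars[, vars]

# Ellipses below the diagonal, raw scatter plots above it
if (requireNamespace("corrgram", quietly = TRUE)) {
  corrgram::corrgram(mtcars_subset,
                     lower.panel = corrgram::panel.ellipse,
                     upper.panel = corrgram::panel.pts,
                     diag.panel  = corrgram::panel.minmax,
                     main = "Ellipses and Scatter Plots for Selected Variables")
}
```

Showing the same pair once as an ellipse and once as a scatter plot makes it easier to judge whether a single coefficient is a fair summary of the relationship.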

Summary of using corrgram Function in R

The corrgram function in R is an excellent tool for exploratory data analysis, providing both visual and numerical insights into correlation structures. The key takeaways:

  1. Start with basic plots and add customization as needed
  2. Always complement visual analysis with numerical correlation values
  3. Consider statistical significance when interpreting patterns
  4. Use appropriate correlation methods for your data type
  5. Combine corrgram() with other EDA tools for comprehensive analysis
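Takeaways 3 and 4 can be illustrated with base R alone. The sketch below tests one pair with cor.test() and computes a rank-based (Spearman) matrix as an alternative to Pearson; the choice of the mpg and wt pair and the object names are assumptions made for illustration.

```r
data(mtcars)

# Pearson test for a single pair (mpg vs. wt)
pearson_test <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
pearson_test$estimate  # Sample correlation
pearson_test$p.value   # p-value for H0: true correlation is 0
pearson_test$conf.int  # 95% confidence interval for the correlation

# Rank-based alternative, less sensitive to outliers and monotone nonlinearity
spearman_matrix <- cor(mtcars, method = "spearman")
round(spearman_matrix["mpg", "wt"], 3)
```

When many pairs are tested at once, the individual p-values should be read with multiple comparisons in mind.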

The corrgram function in R is particularly valuable in the early stages of data analysis, helping to identify relationships, potential problems, and directions for further investigation.
