
Estimating the number of dimensions with Exploratory Graph Analysis

A summary and tutorial of a powerful technique from the network psychometrics literature

Exploratory Graph Analysis. Image by author

In psychology, education, and the behavioral sciences, we use scales/instruments to measure a given construct (e.g., anxiety, happiness). For that, we usually have a questionnaire with a number of items and wish to know how many latent factors underlie these items. This is usually done with Exploratory Factor Analysis (EFA), where the number of dimensions is typically estimated by examining the pattern of eigenvalues (see my guide on eigenvalues). Two of the most common eigenvalue-based methods are the Kaiser-Guttman rule (retain factors whose eigenvalues are greater than one) and parallel analysis. However, the performance of these methods in estimating dimensionality has been widely criticized.
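To make the two eigenvalue rules concrete, here is a minimal base-R sketch on simulated data. The settings (two uncorrelated factors, three items each, loadings of 0.8, n = 500) are my own assumptions for illustration, not the HSQ data. Parallel analysis is shown in its simple form: observed eigenvalues are compared with the mean eigenvalues of random normal data of the same size, and factors are retained until the first observed eigenvalue falls below its reference.

```r
# Hypothetical data: 6 items, items 1-3 load on factor 1, items 4-6 on factor 2
set.seed(42)
n <- 500
lambda <- 0.8                                   # assumed loading for every item
f <- matrix(rnorm(n * 2), ncol = 2)             # two uncorrelated latent factors
items <- cbind(
  lambda * f[, 1] + sqrt(1 - lambda^2) * matrix(rnorm(n * 3), ncol = 3),
  lambda * f[, 2] + sqrt(1 - lambda^2) * matrix(rnorm(n * 3), ncol = 3)
)

# Eigenvalues of the observed correlation matrix (sorted in decreasing order)
obs_eigen <- eigen(cor(items), symmetric = TRUE, only.values = TRUE)$values

# Kaiser-Guttman rule: count eigenvalues greater than 1
kaiser_k <- sum(obs_eigen > 1)

# Parallel analysis (Horn): mean eigenvalues of random normal data of the same size
n_sims <- 100
rand_eigen <- replicate(n_sims, {
  z <- matrix(rnorm(n * ncol(items)), nrow = n)
  eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values
})
ref <- rowMeans(rand_eigen)

# Retain leading factors until the first observed eigenvalue drops below its reference
pa_k <- sum(cumprod(obs_eigen > ref))

cat("Kaiser-Guttman:", kaiser_k, " Parallel analysis:", pa_k, "\n")
```

With such a clean simulated structure, both rules agree; the settings are chosen only to illustrate the mechanics.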

Because of those limitations, Golino & Epskamp (2017) proposed a new method for estimating the dimensionality of a scale, called Exploratory Graph Analysis (EGA). This article is a brief summary of recent developments in EGA, aimed at disseminating the method.

Exploratory Graph Analysis

Network psychometric methods have recently gained attention in the psychological sciences literature. This may be due to a shift in the theoretical interpretation of the correlations observed in data. Traditionally, as in EFA, psychometric models assume that latent causes explain the observed behavior (i.e., the items). Emerging areas such as network psychometrics offer promising models for psychology research because they support theoretical perspectives on complexity, i.e., they consider psychological attributes as systems of observed behaviors that dynamically and mutually reinforce one another.

There is a relationship between latent variables in traditional psychometric models and clusters in network models. As Golino & Epskamp (2017) put it:

It can directly be seen that if a latent variable model is the true underlying causal model, we would expect indicators in a network model to feature strongly connected clusters for each latent variable. Since edges correspond to partial correlation coefficients between two variables after conditioning on all other variables in the network, and two indicators cannot become independent after conditioning on observed variables given that they are both caused by a latent variable, the edge strength between two indicators should not be zero.
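The link between network edges and partial correlations can be checked numerically. In this hypothetical base-R sketch, three indicators share one latent variable (loadings of 0.7, an assumption) and a fourth variable is unrelated. Partial correlations are obtained from the inverse covariance (precision) matrix P as -P[i, j] / sqrt(P[i, i] * P[j, j]); unlike glasso, no regularization is applied here.

```r
# Hypothetical example: one latent variable with three indicators, plus an unrelated variable
set.seed(1)
n <- 2000
eta <- rnorm(n)                                                        # latent variable
x <- sapply(1:3, function(j) 0.7 * eta + sqrt(1 - 0.7^2) * rnorm(n))  # three indicators
y <- rnorm(n)                                                          # unrelated to eta
dat <- cbind(x, y)
colnames(dat) <- c("x1", "x2", "x3", "y")

# Partial correlations from the precision (inverse covariance) matrix
P <- solve(cov(dat))
pcor <- -P / sqrt(diag(P) %o% diag(P))
diag(pcor) <- 1

round(pcor, 2)
```

The three indicators keep clearly non-zero partial correlations with each other (about 0.33 in the population under these settings), while the edges to the unrelated variable stay close to zero, as the quote predicts.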

EGA is an exploratory method: it does not require the researcher to specify the number of dimensions a priori. In EGA, nodes represent variables (i.e., items) and edges represent the relations (i.e., partial correlations) between pairs of nodes.

In the authors’ first publication, EGA is carried out as follows:

  1. The correlation matrix of the observed variables is estimated.
  2. The graphical least absolute shrinkage and selection operator (glasso) is used to estimate a sparse inverse covariance matrix, with the regularization parameter selected by the extended Bayesian information criterion (EBIC) among 100 candidate values.
  3. The walktrap algorithm is used to find the clusters in the partial correlation network obtained in the previous step.
  4. The number of clusters identified equals the number of latent factors in a given dataset.
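The four steps above can be sketched outside EGAnet with qgraph::EBICglasso and igraph::cluster_walktrap. This is a minimal sketch on simulated data (two factors, three items each, loadings of 0.8, all assumptions), not the EGAnet implementation; in particular, passing absolute edge weights to walktrap is my assumption.

```r
# Sketch of the original EGA steps, assuming qgraph and igraph are installed
library(qgraph)
library(igraph)

set.seed(2017)
n <- 500
eta <- matrix(rnorm(n * 2), ncol = 2)                       # two uncorrelated factors
items <- cbind(
  0.8 * eta[, 1] + 0.6 * matrix(rnorm(n * 3), ncol = 3),    # items 1-3 load on factor 1
  0.8 * eta[, 2] + 0.6 * matrix(rnorm(n * 3), ncol = 3)     # items 4-6 load on factor 2
)
colnames(items) <- paste0("i", 1:6)

# 1. Correlation matrix of the observed variables
S <- cor(items)

# 2. glasso with the penalty chosen by EBIC over 100 candidate values;
#    EBICglasso returns the selected partial correlation network
pcor_net <- EBICglasso(S, n = n, gamma = 0.5, nlambda = 100)

# 3. Walktrap community detection on the network (absolute weights, an assumption)
g <- graph_from_adjacency_matrix(abs(pcor_net), mode = "undirected",
                                 weighted = TRUE, diag = FALSE)
wc <- cluster_walktrap(g, steps = 4)

# 4. Number of clusters = estimated number of dimensions
n_dim <- length(unique(membership(wc)))
n_dim
membership(wc)
```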

Today, EGA can handle both unidimensional and multidimensional structures, glasso can be replaced by the Triangulated Maximally Filtered Graph (TMFG), and community detection algorithms other than walktrap (e.g., Louvain) can be used.
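Swapping the community detection algorithm changes only the last step. The sketch below, assuming igraph is installed, runs walktrap and Louvain on the same network; the weight matrix is a made-up example (two tightly connected groups of items joined by one weak edge), not one estimated from data.

```r
library(igraph)

# Hypothetical weighted network of 6 items
W <- matrix(0, 6, 6, dimnames = list(paste0("i", 1:6), paste0("i", 1:6)))
W[1:3, 1:3] <- 0.35            # strong edges within the first group
W[4:6, 4:6] <- 0.35            # strong edges within the second group
W[3, 4] <- W[4, 3] <- 0.05     # one weak edge bridging the groups
diag(W) <- 0                   # no self-loops

g <- graph_from_adjacency_matrix(W, mode = "undirected", weighted = TRUE)

wt <- cluster_walktrap(g)      # random-walk based community detection
lv <- cluster_louvain(g)       # modularity optimization (Louvain)

membership(wt)
membership(lv)
```

On a structure this clear both algorithms return the same two communities; they can disagree on noisier networks, which is why the choice is exposed as an option.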

Golino et al. (2020) showed that EGA performs as well as the best factor-analytic techniques: EGA with TMFG showed moderately good accuracy for both unidimensional and multidimensional structures, and EGA was among the methods with the highest overall accuracy.

Estimating Item and Dimension Stability

When estimating an EGA, two things are important to consider. First, the number of dimensions identified in one study may differ in other studies with different samples and sample sizes. Second, some items might be clustered in dimension A in one study and in dimension B in another. To address this, Christensen & Golino (2019) developed Bootstrap EGA, which re-estimates EGA across many bootstrap samples to evaluate how stable the dimensions and the item assignments are.
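The resampling idea behind Bootstrap EGA can be sketched in a few lines of base R. To stay self-contained, this sketch re-estimates dimensionality with the Kaiser-Guttman count as a stand-in; bootEGA re-runs EGA itself on each replica. The simulated data (two factors, three items each) and the 500 replicas are assumptions for illustration.

```r
# Hypothetical data: two factors, three items each
set.seed(2019)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)
items <- cbind(
  0.8 * f[, 1] + 0.6 * matrix(rnorm(n * 3), ncol = 3),
  0.8 * f[, 2] + 0.6 * matrix(rnorm(n * 3), ncol = 3)
)

# Stand-in dimensionality estimator (NOT EGA): eigenvalues of the correlation matrix > 1
n_dimensions <- function(x) {
  sum(eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values > 1)
}

iter <- 500
boot_dims <- replicate(iter, {
  rows <- sample(nrow(items), replace = TRUE)   # resample respondents with replacement
  n_dimensions(items[rows, ])
})

table(boot_dims)          # how often each number of dimensions was recovered
table(boot_dims) / iter   # the same, as proportions
median(boot_dims)
```

The output of interest is the distribution of the number of dimensions across replicas, which is what bootEGA summarizes for the real analysis below.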

R Tutorial

Installing EGAnet

To do the analysis, we will use the R package EGAnet (CRAN; GitHub), made by Golino and Christensen.

First, we install the package (we will use the GitHub version since it is the most up to date).

# install.packages("devtools")  # uncomment if devtools is not installed yet
devtools::install_github("hfgolino/EGAnet")

This installs the package on your machine. We then load it with library(EGAnet).

Reading data

We will be using Humor Styles Questionnaire data from the Open-Source Psychometrics Project. I added it to my GitHub so you can download it without much effort.

data <- read.delim("https://raw.githubusercontent.com/rafavsbastos/data/main/HSQ.txt")

EGA

We will run the EGA with the following code:

ega.HSQ <- EGA(
  data,
  uni.method = "LE",
  corr = "cor_auto",
  model = "glasso",
  algorithm = "walktrap",
  plot.EGA = TRUE,
  plot.type = "qgraph"
)

Where the first argument is the dataset (i.e., our items); the second sets the unidimensionality method ("LE", the leading eigenvalue approach); the third is the type of correlation matrix to compute ("cor_auto" chooses Pearson, polychoric, or polyserial correlations according to the type of each variable); the fourth is the network estimation method; the fifth is the community detection algorithm; the sixth indicates whether to plot the EGA; and the seventh sets the type of plot.

We have the following output:

The plot shows a clear "extraction" of four dimensions, as expected, since the HSQ was designed to measure four humor styles.

Dimension and Item Stability

To calculate dimension stability, we will run the following code:

bootdata <- bootEGA(
  data,
  iter = 1000,
  type = "resampling"
)

Where the arguments are:

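  1. the dataset;
  2. iter, an integer with the number of replica samples to generate in the bootstrap analysis;
  3. type, the bootstrap approach: "resampling" (non-parametric, resampling the rows of the data with replacement) or "parametric" (generating new data from a multivariate normal distribution based on the data's correlation structure).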

Take care: several arguments are left at their default values, so they are not specified here. Read the package documentation before running your own analysis!

Now we can see some useful information by typing:

bootdata$summary.table

Where the output is:

We can see the median number of dimensions (median.dim), the standard error (SE.dim), the confidence interval of the number of dimensions (CI.dim), its lower and upper bounds (Lower.CI and Upper.CI), and the lower and upper quantiles of the number of dimensions (Lower.Quantile and Upper.Quantile). Based on this output, the four-dimensional solution is precise (SE = 0.27), and four dimensions is most likely the structure of the scale (CI from 3.47 to 4.53).

Now type bootdata$frequency. It gives you the following output:

This shows that four dimensions were recovered in 924 of the 1,000 bootstrap samples, five dimensions in 75, and six dimensions in only one.
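The summary statistics described earlier can be approximately reproduced from these frequencies alone. This base-R sketch assumes that SE.dim is the standard deviation of the bootstrapped numbers of dimensions and that the CI is the median plus or minus SE.dim times the 0.975 quantile of a t distribution with iter - 1 degrees of freedom. That is my reading of how the reported numbers fit together, not a copy of the EGAnet source.

```r
# Frequencies reported by bootdata$frequency for the HSQ
boot_dims <- rep(c(4, 5, 6), times = c(924, 75, 1))
iter <- length(boot_dims)                       # 1000 replicas

prop <- table(boot_dims) / iter                 # proportion of replicas per solution
median_dim <- median(boot_dims)
se_dim <- sd(boot_dims)                         # assumption: SE.dim = SD of bootstrap counts
ci_half <- se_dim * qt(0.975, df = iter - 1)    # assumption: t-based half-width
lower_ci <- median_dim - ci_half
upper_ci <- median_dim + ci_half
quantiles <- quantile(boot_dims, c(0.025, 0.975))

prop
round(c(median = median_dim, SE = se_dim, lower = lower_ci, upper = upper_ci), 2)
quantiles
```

Under these assumptions, the numbers match the reported SE of 0.27 and CI of 3.47 to 4.53, which makes it clear that the precision of the four-dimensional solution comes from 92.4% of the replicas agreeing on it.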

Now, for item stability, type ic.HSQ <- itemStability(bootdata). The output is a plot:

It shows that each item was placed in its dimension in between 80% and 100% of the bootstrap replicas.
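Item stability is essentially a proportion: in how many replicas does each item land in the same dimension as in the full-sample solution? Because community labels are arbitrary across replicas, they must be aligned first. In this base-R sketch, the membership vectors are hypothetical, and the alignment rule (map each bootstrap community to the empirical community it overlaps with most) is my simplification; it is not necessarily how itemStability aligns labels.

```r
# Hypothetical full-sample (empirical) membership for 6 items
empirical <- c(1, 1, 1, 2, 2, 2)

# Hypothetical memberships from 4 bootstrap replicas (one per row);
# labels are arbitrary, e.g. replica 2 swaps the names of the two dimensions
boot <- rbind(
  c(1, 1, 1, 2, 2, 2),
  c(2, 2, 2, 1, 1, 1),
  c(1, 1, 2, 2, 2, 2),
  c(2, 1, 1, 1, 2, 2)
)

# Relabel a replica: each bootstrap community takes the label of the
# empirical community it overlaps with most (greedy; two communities may share a label)
relabel <- function(replica, reference) {
  out <- integer(length(replica))
  for (k in unique(replica)) {
    members <- replica == k
    overlap <- table(factor(reference[members], levels = unique(reference)))
    out[members] <- as.integer(names(overlap)[which.max(overlap)])
  }
  out
}

aligned <- t(apply(boot, 1, relabel, reference = empirical))

# Proportion of replicas placing each item in its empirical dimension
target <- matrix(empirical, nrow(aligned), length(empirical), byrow = TRUE)
stability <- colMeans(aligned == target)
names(stability) <- paste0("item", seq_along(empirical))
stability
```

Here items 2, 5, and 6 are fully stable, while items 1, 3, and 4 stay in their dimension in three of the four replicas.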

Concluding Remarks

As shown by the analyses above and by recent papers, EGA offers an accurate way to assess the dimensionality of instruments that measure psychological attributes. In addition, the authors have implemented many functions in EGAnet that give useful information about dimensions and items. The package is still being updated: computations are getting faster, and new functions are probably on the way.

Contact

Feel free to contact me by:

LinkedIn, e-mail, or my website (for consulting and partnerships).

References

H. F. Golino and S. Epskamp, Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research, 2017, PLoS ONE, 12(6), e0174035.

H. F. Golino, D. Shi, A. P. Christensen, L. E. Garrido, M. D. Nieto, R. Sadana, … and A. Martinez-Molina, Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial, 2020, Psychological Methods.

A. P. Christensen and H. F. Golino, Estimating the stability of the number of factors via Bootstrap Exploratory Graph Analysis: A tutorial, 2019.


Towards Data Science is a community publication. Submit your insights to reach our global audience and earn through the TDS Author Payment Program.
