R codes to implent PhyloBCG and the quantitative microbiome profiling data (Vandeputte et al., 2017) processed by Yoon, Gaynanova and Muller (2019).
Chung, H.C., Gaynanova, I., & Ni, Y. Phylogenetically informed Bayesian truncated copula graphical models for microbial association networks. arXiv
Yoon, G., Gaynanova, I., & Müller, C. L. (2019). Microbial networks in SPRING-Semi-parametric rank-based correlation and partial correlation estimation for quantitative microbiome data. Frontiers in genetics, 10, 516.
Vandeputte, D., Kathagen, G., D’hoe, K., Vieira-Silva, S., Valles-Colomer, M., Sabino, J., … & Raes, J. (2017). Quantitative microbiome profiling links gut community variation to microbial load. Nature, 551(7681), 507-511.
To implement, (1) clone this repository; (2) open phyloBCG.Rproj; (3)
load packages and function located in the folder ./functions as
follows
sourcePath <- "./functions/"
# Load packages and source functions
files.sources <- list.files(path=sourcePath)
extension <- substring(files.sources,nchar(files.sources)-1,nchar(files.sources))
lets.source <- paste(sourcePath, files.sources[extension==".R"], sep="")
mapply(source, lets.source) This example assumes that you have your own dataset x and
corresponding phylogenetic tree mytree, where mytree should be an
object of phylo
load("./QMPdata/QMPtree.RData")
x <- QMP
n <- nrow(x) # Sample size
p <- ncol(x) # Number of microbes
mytree <- qmptree
class(mytree)## [1] "phylo"
Now, set hyperparmeter values. Based on the tree mytree, the function
getCovfromTree returns corresponding tree “covariance” matrix.
########################################################################
##### Set hyperarameters ####################
########################################################################
## Set tree related hyperparameters
# Get tree covariance matrix of terminal nodes
H <- getCovfromTree(mytree)
isSymmetric(H)
H_t <- cov2cor(H) # Tree correlation matrix
H_tinv <- solve(H_t) # Tree inverse correlation matrix
IGsig2 <- rep(1e-3,2) # Inv-gamma parameters for tree scale parameter (sigma^2)
## Set hyperparameters for concentration matrix sampling
# Note that the parameterization used in the code is slightly different from those in Wang (2014). )
h <- 2500 # (v0 in code) = (v0 in paper)^2
lambda <- 1 # Rate paramter of exponential prior ( Exp(lambda/2) ) for SSVS
IGv0 <- rep(1e-3,2) # Inv-gamma parameters for the spike and slab (v0^2)
## Set latent space dimension
K <- 2
## Save hyperparameter values as a list object
hyperparameters <- list(h=h, lambda=lambda, IGsig2=IGsig2, IGv0=IGv0, H_t=H_t, H_tinv=H_tinv )Using the scaled empirical cdf of each variables in x initialize
Z.
# Get scaled empirical cdfs and zhat
ecdf.scale <- n/(n+1)
eFx <- apply(x,2,ecdf)
eFx <- lapply(eFx, function(x){ function(y) ecdf.scale *x(y) })
eFxx <- Map(function(f,x) do.call(f, list(x)), eFx, alply(x,2) )
zhat <- matrix( unlist( lapply(eFxx,function(pr) qnorm( pr ) ) ), n, p)Initialize remaining parameters.
########################################################################
######## Initial values for gibbs sampling ###########################
########################################################################
R_mc <- cor(zhat) # Initial correlation matrix
zhat_mc <- zhat # Initial truncated data. Observed data will be fixed
delta_mc <- qnorm( colSums( x==0 )/n ) # Initial threshold
v0_mc <- 0.01 # Initial spike variance
tau_mc <- h*v0_mc*matrix(1,p,p) # Matrix representation of spike and slab variances
pijk_mc <- matrix( 2 / (p - 1),p,p) # Initial edge inclusion probability matrix
U_mc <- mvtnorm::rmvnorm(K, rep(0,p), H_t ) # Initial latent positions
sig2_mc <- 1 # Initial tree scale parameterThis code will run 110 Gibbs iterations, then return a posterior sample of size 100 discarding the first 10 burn-in iterations. The number of iterations is only meant to make this a small example, in practice we recommend using around 10,000 iterations, and checking the convergence diagnostics.
burnin <- 10 # Number of burnin-iteration
nmc <- 100 # Number of MCMC-sample that will be kept
gibbsSample <- treeGibbs(x, delta_mc, zhat_mc, R_mc, v0_mc, tau_mc, pijk_mc, U_mc, sig2_mc,
hyperparameters, burnin, nmc, verbose=FALSE, thin=NULL)