Skip to content

heech31/phyloBCG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PhyloBCG

R codes to implent PhyloBCG and the quantitative microbiome profiling data (Vandeputte et al., 2017) processed by Yoon, Gaynanova and Muller (2019).

Reference:

Chung, H.C., Gaynanova, I., & Ni, Y. Phylogenetically informed Bayesian truncated copula graphical models for microbial association networks. arXiv

Yoon, G., Gaynanova, I., & Müller, C. L. (2019). Microbial networks in SPRING-Semi-parametric rank-based correlation and partial correlation estimation for quantitative microbiome data. Frontiers in genetics, 10, 516.

Vandeputte, D., Kathagen, G., D’hoe, K., Vieira-Silva, S., Valles-Colomer, M., Sabino, J., … & Raes, J. (2017). Quantitative microbiome profiling links gut community variation to microbial load. Nature, 551(7681), 507-511.

To run

To implement, (1) clone this repository; (2) open phyloBCG.Rproj; (3) load packages and function located in the folder ./functions as follows

sourcePath <- "./functions/"

# Load packages and source functions
files.sources <-  list.files(path=sourcePath)
extension <- substring(files.sources,nchar(files.sources)-1,nchar(files.sources))
lets.source <- paste(sourcePath, files.sources[extension==".R"], sep="")
mapply(source, lets.source) 

Simple example

This example assumes that you have your own dataset x and corresponding phylogenetic tree mytree, where mytree should be an object of phylo

load("./QMPdata/QMPtree.RData")
x <- QMP
n <- nrow(x)  # Sample size
p <- ncol(x)  # Number of microbes
mytree <- qmptree
class(mytree)
## [1] "phylo"

Now, set hyperparmeter values. Based on the tree mytree, the function getCovfromTree returns corresponding tree “covariance” matrix.

########################################################################
#####             Set hyperarameters                ####################
########################################################################

## Set tree related hyperparameters
# Get tree covariance matrix of terminal nodes
H <- getCovfromTree(mytree)
isSymmetric(H)
H_t <- cov2cor(H)    # Tree correlation matrix
H_tinv <- solve(H_t) # Tree inverse correlation matrix 
IGsig2 <-  rep(1e-3,2) # Inv-gamma parameters for tree scale parameter (sigma^2)

## Set hyperparameters for concentration matrix sampling
# Note that the parameterization used in the code is slightly different from those in Wang (2014).  )
h <-  2500      # (v0 in code) = (v0 in paper)^2
lambda <-  1    # Rate paramter of exponential prior ( Exp(lambda/2) ) for SSVS 
IGv0   <- rep(1e-3,2)  # Inv-gamma parameters for the spike and slab (v0^2)

## Set latent space dimension
K <- 2 

## Save hyperparameter values as a list object
hyperparameters <- list(h=h, lambda=lambda, IGsig2=IGsig2, IGv0=IGv0, H_t=H_t, H_tinv=H_tinv )

Using the scaled empirical cdf of each variables in x initialize Z.

# Get scaled empirical cdfs and zhat
ecdf.scale <- n/(n+1)
eFx  <- apply(x,2,ecdf)
eFx  <- lapply(eFx, function(x){  function(y)  ecdf.scale *x(y) })
eFxx <- Map(function(f,x) do.call(f, list(x)), eFx, alply(x,2)  )
zhat <- matrix( unlist( lapply(eFxx,function(pr) qnorm( pr ) ) ), n, p)

Initialize remaining parameters.

########################################################################
########   Initial values for gibbs sampling ###########################
########################################################################
R_mc     <- cor(zhat)               # Initial correlation matrix
zhat_mc  <- zhat                    # Initial truncated data. Observed data will be fixed
delta_mc <- qnorm( colSums( x==0 )/n ) # Initial threshold
v0_mc   <- 0.01                     # Initial spike variance
tau_mc  <- h*v0_mc*matrix(1,p,p)    # Matrix representation of spike and slab variances
pijk_mc <- matrix( 2 / (p - 1),p,p) # Initial edge inclusion probability matrix
U_mc <- mvtnorm::rmvnorm(K, rep(0,p), H_t )  # Initial latent positions
sig2_mc <- 1                        # Initial tree scale parameter

This code will run 110 Gibbs iterations, then return a posterior sample of size 100 discarding the first 10 burn-in iterations. The number of iterations is only meant to make this a small example, in practice we recommend using around 10,000 iterations, and checking the convergence diagnostics.

burnin <- 10  # Number of burnin-iteration
nmc    <- 100 # Number of MCMC-sample that will be kept

gibbsSample <- treeGibbs(x, delta_mc, zhat_mc, R_mc, v0_mc, tau_mc, pijk_mc, U_mc, sig2_mc,
                         hyperparameters, burnin, nmc, verbose=FALSE, thin=NULL)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages