PhyloBCG

R codes to implent PhyloBCG and the quantitative microbiome profiling data (Vandeputte et al., 2017) processed by Yoon, Gaynanova and Muller (2019).

Reference:

Chung, H.C., Gaynanova, I., & Ni, Y. Phylogenetically informed Bayesian truncated copula graphical models for microbial association networks. arXiv

Yoon, G., Gaynanova, I., & Müller, C. L. (2019). Microbial networks in SPRING-Semi-parametric rank-based correlation and partial correlation estimation for quantitative microbiome data. Frontiers in genetics, 10, 516.

Vandeputte, D., Kathagen, G., D’hoe, K., Vieira-Silva, S., Valles-Colomer, M., Sabino, J., … & Raes, J. (2017). Quantitative microbiome profiling links gut community variation to microbial load. Nature, 551(7681), 507-511.

To run

To implement, (1) clone this repository; (2) open phyloBCG.Rproj; (3) load packages and function located in the folder ./functions as follows

sourcePath <- "./functions/"

# Load packages and source functions
files.sources <-  list.files(path=sourcePath)
extension <- substring(files.sources,nchar(files.sources)-1,nchar(files.sources))
lets.source <- paste(sourcePath, files.sources[extension==".R"], sep="")
mapply(source, lets.source)

Simple example

This example assumes that you have your own dataset x and corresponding phylogenetic tree mytree, where mytree should be an object of phylo

load("./QMPdata/QMPtree.RData")
x <- QMP
n <- nrow(x)  # Sample size
p <- ncol(x)  # Number of microbes
mytree <- qmptree
class(mytree)

## [1] "phylo"

Now, set hyperparmeter values. Based on the tree mytree, the function getCovfromTree returns corresponding tree “covariance” matrix.

########################################################################
#####             Set hyperarameters                ####################
########################################################################

## Set tree related hyperparameters
# Get tree covariance matrix of terminal nodes
H <- getCovfromTree(mytree)
isSymmetric(H)
H_t <- cov2cor(H)    # Tree correlation matrix
H_tinv <- solve(H_t) # Tree inverse correlation matrix 
IGsig2 <-  rep(1e-3,2) # Inv-gamma parameters for tree scale parameter (sigma^2)

## Set hyperparameters for concentration matrix sampling
# Note that the parameterization used in the code is slightly different from those in Wang (2014).  )
h <-  2500      # (v0 in code) = (v0 in paper)^2
lambda <-  1    # Rate paramter of exponential prior ( Exp(lambda/2) ) for SSVS 
IGv0   <- rep(1e-3,2)  # Inv-gamma parameters for the spike and slab (v0^2)

## Set latent space dimension
K <- 2 

## Save hyperparameter values as a list object
hyperparameters <- list(h=h, lambda=lambda, IGsig2=IGsig2, IGv0=IGv0, H_t=H_t, H_tinv=H_tinv )

Using the scaled empirical cdf of each variables in x initialize Z.

# Get scaled empirical cdfs and zhat
ecdf.scale <- n/(n+1)
eFx  <- apply(x,2,ecdf)
eFx  <- lapply(eFx, function(x){  function(y)  ecdf.scale *x(y) })
eFxx <- Map(function(f,x) do.call(f, list(x)), eFx, alply(x,2)  )
zhat <- matrix( unlist( lapply(eFxx,function(pr) qnorm( pr ) ) ), n, p)

Initialize remaining parameters.

########################################################################
########   Initial values for gibbs sampling ###########################
########################################################################
R_mc     <- cor(zhat)               # Initial correlation matrix
zhat_mc  <- zhat                    # Initial truncated data. Observed data will be fixed
delta_mc <- qnorm( colSums( x==0 )/n ) # Initial threshold
v0_mc   <- 0.01                     # Initial spike variance
tau_mc  <- h*v0_mc*matrix(1,p,p)    # Matrix representation of spike and slab variances
pijk_mc <- matrix( 2 / (p - 1),p,p) # Initial edge inclusion probability matrix
U_mc <- mvtnorm::rmvnorm(K, rep(0,p), H_t )  # Initial latent positions
sig2_mc <- 1                        # Initial tree scale parameter

This code will run 110 Gibbs iterations, then return a posterior sample of size 100 discarding the first 10 burn-in iterations. The number of iterations is only meant to make this a small example, in practice we recommend using around 10,000 iterations, and checking the convergence diagnostics.

burnin <- 10  # Number of burnin-iteration
nmc    <- 100 # Number of MCMC-sample that will be kept

gibbsSample <- treeGibbs(x, delta_mc, zhat_mc, R_mc, v0_mc, tau_mc, pijk_mc, U_mc, sig2_mc,
                         hyperparameters, burnin, nmc, verbose=FALSE, thin=NULL)

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
QMPdata		QMPdata
README_files/figure-gfm		README_files/figure-gfm
data		data
functions		functions
.gitignore		.gitignore
README.Rmd		README.Rmd
README.md		README.md
phyloBCG.Rproj		phyloBCG.Rproj
phyloBCG.zip		phyloBCG.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PhyloBCG

Reference:

To run

Simple example

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PhyloBCG

Reference:

To run

Simple example

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages