Posts tagged “metabolomics”
Metabolomics and the greater sphere of ‘Omic analyses are a burgeoning set of tools for the investigation of environmental and organismal mechanisms and interactions. Carrying out data analyses within the context of complex biological systems is rewarding but also difficult. The following presentation considers the components involved in conducting multivariate data analysis, modeling and visualization within biological contexts.
slides: https://www.slideshare.net/dgrapov/complex-systems-biology-informed-data-analysis-and-machine-learning
June 11, 2017 | Categories: Uncategorized | Tags: clustering, data visualization, genomics, lectures, machine learning, metabolomics, network, pathways, proteomics, research, science, software, statistical analysis | Leave a comment
Follow along with the presentation and recreate all the analysis results for yourself.

October 10, 2015 | Categories: Uncategorized | Tags: integration, metabolomics, omics, r-bloggers | 2 Comments

Recently I had the pleasure of giving a lecture for the Metabolomics Society on Challenges and Strategies for Next-gen Omic Analyses. You can check out all of my slides and a video of the lecture below.
September 19, 2015 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, data integration, dgrapov, genomics, metabolomics, proteomics, r-bloggers, west coast metabolomics center | 4 Comments
I recently had the pleasure of giving a presentation on one of my favorite topics, network mapping, and its application to metabolomic and genomic data integration. You can check out the full presentation below.
November 2, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, data analysis, data visualization, DeviumWeb, genomics, metabolomics, MetaMapR, network mapping, topological data analysis | Leave a comment

Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Similar to last year, I’ve posted all the content (lectures, labs and software) for anyone to follow along with at their own pace. I also plan to release videos for all the lectures and labs, including use cases for the freely available data analysis software listed below.
You can check out the introduction lecture to the covered material below.
New additions to the course include a lecture and lab on data normalization, as well as updated and improved software.
Software
Stay tuned for videos of all of the material!

2014 Metabolomics Data Analysis and Visualization Tutorials Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
October 11, 2014 | Categories: Uncategorized | Tags: biochemical network, data analysis, data visualization, demo, DeviumWeb, lectures, metabolomics, MetaMapR, r-bloggers, tutorial, west coast metabolomics center, workshop | Leave a comment
Recently I was lucky enough to publish some of my research findings in the journal Metabolomics. You can check out the full paper, 10.1007/s11306-014-0706-2, or take a look at the abstract and figures below.
ABSTRACT
Non-obese diabetic (NOD) mice are a widely-used model of type 1 diabetes (T1D). However, not all animals develop overt diabetes. This study examined the circulating metabolomic profiles of NOD mice progressing or not progressing to T1D. Total beta-cell mass was quantified in the intact pancreas using transgenic NOD mice expressing green fluorescent protein under the control of mouse insulin I promoter. While both progressor and non-progressor animals displayed lymphocyte infiltration and endoplasmic reticulum stress in the pancreas tissue, overt T1D did not develop until animals lost ~70 % of the total beta-cell mass. Gas chromatography time of flight mass spectrometry was used to measure >470 circulating metabolites in male and female progressor and non-progressor animals (n = 76) across a wide range of ages (neonates to >40-week). Statistical and multivariate analyses were used to identify age and sex independent metabolic markers which best differentiated progressor and non-progressor animals’ metabolic profiles. Key T1D-associated perturbations were related with: (1) increased plasma glucose and reduced 1,5-anhydroglucitol markers of glycemic control; (2) increased allantoin, gluconic acid and nitric acid-derived saccharic acid markers of oxidative stress; (3) reduced lysine, an insulin secretagogue; (4) increased branched-chain amino acids, isoleucine and valine; (5) reduced unsaturated fatty acids including arachidonic acid; and (6) perturbations in urea cycle intermediates suggesting increased arginine-dependent NO synthesis. Together these findings highlight the strength of the unique approach of comparing progressor and non-progressor NOD mice to identify metabolic perturbations involved in T1D progression.

Fig. 1 Immune cell infiltration and beta-cell destruction in prediabetic NOD mice. A Visualization of spatial islet distribution in the context of the vascular network in the intact pancreas. A prediabetic NOD mouse at 27-week. B The body region of the NOD mouse shown in A. Note that substantial beta-cell destruction is observed in the NOD pancreas (i.e. a loss of GFP-expressing beta-cells). C Intraislet capillary network in the body region of a wild-type mouse at 21-week. D Immunohistochemical staining. Insulin (green), glucagon (red), somatostatin (white) and nuclei (blue). E Hypertrophic islet with massive infiltration of T-lymphocytes. (a) Hematoxylin-Eosin (HE) staining of the islet showing peripheral- and intra-islet infiltrating lymphocytes and remaining endocrine islet cells. (b) A serial section stained for CD4-positive lymphocytes by ABC-staining (brown). c A serial section stained for CD8-positive lymphocytes. F Ultrastructural analysis of hypertrophic islets in non-diabetic and diabetic littermates. (a) Non-diabetic male NOD mouse (41-week old, 4-h fasting BG: 136 mg/dL) showing a hyperactive beta-cell with lymphocyte infiltration and vesicles without dense core granules. (b) Beta-cells in diabetic female NOD mouse (40-week old, 4-h fasting BG: 559 mg/dL) appears to be intact despite the presence of ongoing insulitis. G Progressive degradation of endoplasmic reticulum (ER). (a) Well-developed ER (ER) in a beta-cell undergoing insulitis. (b) ER degradation. Ribosomes are detached (shed) from the ER membrane and are aggregated (ER). Nuclear damage is seen with the formation of foam-like structures (N). Immature granules with less dense cores (G) as well as cytoplasmic liquefaction (CL) are observed. (c) ER membrane breakdown. ER membrane breakdown resulted in aggregation of shed ribosomes (ER). An adjacent PP-cell (PP) appears to be intact (identified by characteristic moderately dense cores of pancreatic polypeptide-containing secretory granules). 
(d) Beta-cell degradation. ER swelling (ER), ribosome shedding, amorphous cytoplasmic material (R) and cytoplasmic liquefaction (L) are observed in the same beta-cell

Fig. 2 Progression of autoimmune diabetes in NOD mice. A (a) Virtual slice capture of a whole mouse pancreas from mouse insulin promoter I (MIP)-GFP mice on NOD background. (b) Measured beta-cell/islet distribution. (c) Corresponding 3D scatter plot of islet parameters depicts distribution of islets with various sizes and shapes. Each dot represents a single islet. B (a) Representative data showing islet growth in wild-type mice at 20- and 28-week of age. (b) Examples of beta-cell loss at 20-week (non-diabetic) and 28-week (diabetic) in NOD mice. C Heterogeneous beta-cell loss in NOD mice. Frequency is plotted against islet size. D Three distinct groups in the development of T1D in NOD mice. 3D scatter plot showing the relationship among blood glucose levels (BG), total beta-cell area and age. Three groups of mice are color-coded as diabetic mice (red), young mice with normoglycemia (<25 week; green) and old mice with normoglycemia (25–40 week; blue)

Fig. 3 Biochemical network displaying metabolic differences between diabetic and non-diabetic NOD mice. Metabolites are connected based on biochemical relationships (blue, KEGG RPAIRS) or structural similarity (violet, Tanimoto coefficient ≥0.7). Metabolite size and color represent the importance (O-PLS-DA model loadings, LV 1) and relative change (gray p adj > 0.05; green increase; red decrease) in diabetic compared to non-diabetic NOD mice. Shapes display metabolites’ molecular classes or biochemical sub-domains and top descriptors of T1D-associated metabolic perturbations (Table 1) are highlighted with thick black borders

Fig. 4 Partial correlation network displaying associations between all type 1 diabetes-dependent metabolomic perturbations. All significantly altered metabolites (p adj ≤ 0.05, Supplemental Table S3) are connected based on partial correlations (p adj ≤ 0.05). Edge width displays the absolute magnitude and color the direction (orange positive; blue negative) of the partial-coefficient of correlation. Metabolite size and color represent the importance (O-PLS-DA model loadings, LV 1) and relative change (gray p adj > 0.05; green increase; red decrease) in diabetic compared to non-diabetic NOD mice. Shapes display metabolites’ molecular classes or biochemical sub-domains (see Fig. 3 legend), and top descriptors of T1D-associated metabolic perturbations (Table 1) are highlighted with thick black borders
In conclusion, we identified marked differences in the rates of progression of NOD mice to T1D. Metabolomic analysis was used to identify age and sex independent metabolic markers, which may explain this heterogeneity. Future studies combining metabolic end points (as they correlate with beta-cell mass) and genetic risk profiles will ultimately lead to a more complete understanding of disease onset and progression.
July 30, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, Cytoscape, data analysis, data visualization, metabolomics, NOD mice, type 1 diabetes | Leave a comment
Recently I had the pleasure of speaking about one of my favorite topics, Network Mapping. This is a continuation of a general theme I’ve previously discussed and involves the merger of statistical and multivariate data analysis results with a network.
Over the past year I’ve been working on two major tools, DeviumWeb and MetaMapR, which aid the process of biological data (metabolomic) network mapping.

DeviumWeb is a Shiny-based GUI written in R which is useful for:
- data manipulation, transformation and visualization
- statistical analysis (hypothesis testing, FDR, power analysis, correlations, etc)
- clustering (hierarchical, TODO: k-means, SOM, distribution)
- principal components analysis (PCA)
- orthogonal partial least squares multivariate modeling (O-/PLS/-DA)

MetaMapR is also a Shiny-based GUI written in R which is useful for calculation and visualization of various networks including:
- biochemical
- structural similarity
- mass spectral similarity
- correlation
Both of these projects are under development, and my ultimate goal is to design a one-stop-shop ecosystem for network mapping.
In addition to network mapping, the video above and presentation below also discuss normalization schemes for longitudinal data and genomic, proteomic and metabolomic functional analysis at both the pathway and global level.
As always, happy network mapping!

June 27, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, correlation network, Cytoscape, data analysis, data visualization, DeviumWeb, ggplot2, metabolomics, MetaMapR, multivariate, network mapping, O-PLS, R, r-bloggers, shiny, statistical analysis | 6 Comments
I recently participated in the American Society for Mass Spectrometry (ASMS) conference and had a great time. I met some great people and have a few new ideas for future projects. Specifically, I want to give self-organizing maps (SOM) and the R package mcclust a go as clustering alternatives to hierarchical and k-means methods.
I had the pleasure of speaking at the conference in the Informatics-Metabolomics section, and was also a co-author on a project detailing a multi-metabolomics strategy (primary metabolites, lipids, and oxylipins) for the study of type 1 diabetes in an animal model. Keep an eye out for my full talk in an upcoming post.

June 26, 2014 | Categories: Uncategorized | Tags: American Society of Mass Spectrommetry, ASMS, biochemical network, chemical similarity network, conference, data analysis, data visualization, DeviumWeb, metabolomics, MetaMapR, network mapping, networks, self-organizing maps | 1 Comment
The question is: can we automate scientific discovery, and what might an interface to such a tool look like?

I’ve been experimenting with automating simple and complex data analysis and report generation tasks for biological data, mostly using R and LaTeX. You can see some of my progress and the challenges encountered in the presentation below. My ultimate goal is to move from my current fill-in-the-generic-template state to some kind of interactive and dynamic document generator.
While thinking of a fun way to present the idea of human-guided data analysis and report generation, I thought of creating a simple choose-your-own-adventure story. I decided to adapt the visualization below into an interactive adventure in R which culminates in the writing of your life story using the magic of knitr.

You can download the story generator, AdventureR, and try it out for yourself. Or take a quick look at some of the possible adventures. Be forewarned some of the story endings are not for the squeamish.

On a practical level, I am in the early prototyping phase of what a similar application might look like for metabolomic data. Currently I am struggling to get beyond the linear workflow to something more interactive, dynamic and adaptive. Nonetheless it is a beauty to behold when, at the click of a button, an analysis and report can be generated which would otherwise take many hours to days to implement manually. The challenge is to turn the fill-in-the-template style into an adaptive and robust interface which can quickly guide a human domain expert through the modes of information encoded in the data.

Currently my prototype tools require well-defined input templates and flow the data hierarchically down a tree of tasks (e.g. statistics, visualization, functional analysis, clustering), each coming together to generate a mapped network. Right now things are very linear and mostly fill-in-the-template, but this still usefully mimics a set of tasks a bioinformatician might perform. My goal is to adapt the LaTeX-based text generator into a GUI-driven markdown- or HTML-based reporting application, with the ultimate goal of increasing the speed and interactivity of the data analysis and interpretation process.

April 5, 2014 | Categories: Uncategorized | Tags: adventurer, automation, data analysis, data visualization, knitr, metabolomics, R, r-bloggers, report generation | Leave a comment
High dimensional biological data shares many qualities with other forms of data. Typically it is wide (samples << variables), complicated by experimental design and made up of complex relationships driven by both biological and analytical sources of variance. Luckily the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used to generate high dimensional and highly informative representations of complex biological (and really any type of) data. Check out the following examples of network mapping in action or view a more in-depth presentation of the techniques used below.
Partial correlation network highlighting changes in tumor compared to control tissue from the same patient.

Biochemical and structural similarity network of changes in tumor compared to control tissue from the same patient.

Hierarchical clusters (color) mapped to a biochemical and structural similarity network displaying difference before and after drug administration.

Partial correlation network displaying changes in metabolite relationships in response to drug treatment.
Partial correlation network displaying changes in disease and response to drug treatment.

Check out the full presentation below.

February 22, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, clustering, correlation network, Cytoscape, data analysis, data visualization, Devium, metabolomics, multivariate, network, network mapping, O-PLS-DA, r-bloggers, tutorial | Leave a comment
I recently had the pleasure of participating in the 2014 WCMC Statistics for Metabolomics Short Course. The course was hosted by the NIH West Coast Metabolomics Center and focused on statistical and multivariate strategies for metabolomic data analysis. A variety of topics were covered using 8 hands-on tutorials which focused on:
- data quality overview
- statistical and power analysis
- clustering
- principal components analysis (PCA)
- partial least squares (O-/PLS/-DA)
- metabolite enrichment analysis
- biochemical and structural similarity network construction
- network mapping
I am happy to have taught the course using all open source software, including R and Cytoscape. The data analysis and visualization were done using the Shiny-based apps DeviumWeb and MetaMapR. Check out some of the slides below or download all the class material and try it out for yourself.

2014 WCMC LC-MS Data Processing and Statistics for Metabolomics by Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Special thanks to the developers of Shiny and Radiant by Vincent Nijs.
February 17, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, Cytoscape, data analysis, data visualization, Devium, ggplot2, hierarchical clustering, mass spectral similarity, metabolomics, MetaMapR, network, O-PLS, O-PLS-DA, PCA, R, r-bloggers, shiny, TeachingDemos, tutorial | 13 Comments
I am happy to announce the release of MetaMapR (v1.2.0).
New features include:
- An independent module for biological database identifier translations using the Chemical Translation System (CTS)
- a retention time filter for mass spectral connections
- increase in calculation speed
An application of MetaMapR was recently featured in an article in the Nov. 4th 2013 issue of Chemical & Engineering News (C&EN), 91(44). This tool was used to generate a network of >1200 metabolites based on enzymatic transformations and structural similarities.

The full article can be found here as well as the original image.
December 25, 2013 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, chemical translations, correlation network, data visualization, mass spectral similarity, metabolomics, MetaMapR, network mapping | Leave a comment
I recently gave a presentation of some of my work in network mapping to my research lab. The following covers my progress in the development of my metabolomic network mapping tool MetaMapR, and its application to a variety of data sets including a comparison of normal and malignant lung tissue from the same patient.
November 21, 2013 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, correlation network, Cytoscape, data analysis, data visualization, Gaussian graphical Markov metabolic network, metabolomics, MetaMapR, multivariate, network, network mapping | Leave a comment
The international summer sessions in metabolomics 2013 came to a happy conclusion this past Friday, Sept 6th 2013. I had the pleasure of teaching the topics covering metabolomic data analysis. The class was split into lecture and lab sections. The lab section consisted of a hands-on data analysis of:
- fresh vs. lyophilized treatment comparison for tomatillo leaf primary metabolomics
- tomatillo vs. pumpkin leaf primary metabolites
The majority of the data analyses were implemented using the open source software imDEV and Devium-web.
Download the FULL LAB. Take a look at the goals folder for each lesson. You can follow along with the lesson plans by looking at each subsection's respective Excel file (.xlsx). When you are done with a section, unhide all the worksheets (right click on a tab at the bottom) to view the solutions.
The lectures, preceding the lab, covered the basics of metabolomic data analysis including:
- Data Quality Overview and Statistical Analysis
- Multivariate Data Analysis
September 8, 2013 | Categories: Uncategorized | Tags: ANCOVA, biochemical network, chemical similarity network, Cytoscape, metabolomics, network, O-PLS-DA, PCA, PLS, PLS-DA, summer sessions in metabolomics, tutorial, west coast metabolomics center | Leave a comment
Here are a video and slides for a presentation of mine about my favorite topic:
June 14, 2013 | Categories: Uncategorized | Tags: biochemical network, biochmical network, chemical similarity network, clustering, Cytoscape, data analysis, data visualization, metabolomics, multivariate, network, network mapping, networks, O-PLS, O-PLS-DA, PCA, PLS, PLS-DA | 1 Comment
I’ve posted two new tutorials focused on intermediate and advanced strategies for biological, and specifically metabolomic, data analysis (click titles for pdfs).


May 29, 2013 | Categories: Uncategorized | Tags: ANCOVA, chemical similarity network, classification, climate, correlation network, covariate adjustment, data analysis, data visualization, Gaussian graphical Markov metabolic network, imDEV, metabolomics, network, PCA, PLS, PLS-DA, R, research, science, TeachingDemos, tutorial | Leave a comment
It's not uncommon to be faced with multiple questions at the same time. For instance, imagine the following experimental design. You have one MAIN question: what is different between groups A and B? But within groups A and B are subgroups 1 and 2. This complicates things because now the answer to the MAIN question (what is different between A and B) may be slightly different for the subgroups A|1, A|2 and B|1, B|2.


In statistics we can account for these types of experimental designs by choosing different tests. For instance in the case outlined above we could use a two-way analysis of variance (2-way ANOVA) to identify differences between A|B which are independent of differences between 1|2 (and interaction between A|B and 1|2). In the case of multivariate modeling we can achieve a similar effect by using covariate adjustments. For example we can use the residuals from a simple linear model for differences between 1|2 as the 1|2-effect adjusted data to be used to test for differences between A|B. Here is a visual example of this approach using:
1) PCA to evaluate the data variance between A and B (GREEN and RED) and 1 and 2 (SMALL or LARGE)
2) 1|2-adjusted PLS-DA model for A|B
3) 1|2-adjusted O-PLS-DA model for A|B
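The two strategies described above (a two-way ANOVA, and covariate adjustment using linear model residuals) can be sketched in base R as follows. This is only a minimal illustration on simulated data: the objects `x`, `group`, `subgroup` and `adjusted`, and the simulated effect sizes, are all hypothetical and are not taken from the analysis shown in this post.

```r
# Simulated design: MAIN factor A|B with nested subgroups 1|2
# (all names and effect sizes here are hypothetical)
set.seed(1)
n <- 40
group    <- factor(rep(c("A", "B"), each = n / 2))   # MAIN question: A vs B
subgroup <- factor(rep(c("1", "2"), times = n / 2))  # covariate: 1 vs 2

# three simulated variables, each affected by both factors
x <- sapply(1:3, function(i) {
  1.0 * (group == "B") + 2.0 * (subgroup == "2") + rnorm(n, sd = 0.5)
})
colnames(x) <- paste0("metabolite", 1:3)

# Option 1: two-way ANOVA (with interaction) for a single variable
fit <- aov(x[, 1] ~ group * subgroup)
anova.table <- summary(fit)[[1]]

# Option 2: covariate adjustment, the residuals of a linear model for 1|2
# become the 1|2-adjusted data (lm accepts a matrix response)
adjusted <- residuals(lm(x ~ subgroup))

# after adjustment the subgroup means are zero for every variable
subgroup.means <- apply(adjusted, 2, function(v) tapply(v, subgroup, mean))
```

The `adjusted` matrix could then be passed to a multivariate model (e.g. PLS-DA) in place of the raw data to test for differences between A|B.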
Based on the PCA we see that the differences between A|B are also affected by 1|2. This is evident in the distribution of scores based on LARGE|SMALL among A: A|1 (GREEN|SMALL) is more different (further right) from all B than A|2 (GREEN|LARGE). The same can be said for B, and in particular the greatest difference among all groups is between those with the greatest separation along the X-axis (1st principal component), which are RED|LARGE and GREEN|SMALL.
To identify the greatest difference between RED|GREEN which is independent of differences due to SMALL|LARGE, we can use SMALL|LARGE-adjusted data to create a PLS-DA model to discriminate between RED|GREEN.
This projection of the differences between A|B is the same for SMALL|LARGE groups. Ideally we want the two groups' scores to be maximally separated along the X-axis or 1st LV. We see that this is not the case above; instead, explaining how the variables contribute to differences between GREEN|RED requires explaining scores variance along both the X and Y axes, i.e. in two dimensions.
Next we try the O-PLS-DA algorithm, which aims to rotate the projection of the data to maximize the separation between GREEN|RED on the X-axis and capture unrelated or orthogonal variance on the Y-axis.
The O-PLS-DA model loadings for the 1st LV provide information regarding differences in variable magnitudes between the two groups (GREEN|RED).
We can use network mapping to visualize these weights within a domain specific context. In the case of metabolomics data this is best achieved using biochemical/chemical similarity networks.
We can create these networks by assigning edges between vertices (representing metabolites) based on biochemical relationships (KEGG RPAIRs ) or chemical similarities (Tanimoto coefficient >0.7). We can then map the O-PLS-DA model loadings to this network’s visual properties (vertex: size, color, border, and inset graphic).

For example we can map vertex size to the metabolite’s importance in the explained discrimination between groups (loading on O-PLS-DA LV 1) and color to the direction of change (blue, decrease; red, increase). Metabolites displaying significant differences between RED and GREEN groups (two-way ANOVA, p < 0.05 adjusting for 1|2) are shown at maximum size, with a black border, and contain a box-plot visualization.
Here is a network mapping the O-PLS-DA model loadings into a biological context and displaying graphs of the means of important parameters among groups stratified by A|B and 1|2 (left to right: A|1, A|2, B|1, B|2).

Here is another network with the same edge and vertex properties as above, except the inset graphs show differences between groups A|B adjusted for the effect of 1|2.

February 9, 2013 | Categories: Uncategorized | Tags: ANOVA, chemical similarity network, covariate adjustment, Cytoscape, ExCytR, imDEV, metabolomics, O-PLS, O-PLS-DA, PCA, PLS-DA | Leave a comment
February 1, 2013 | Categories: Uncategorized | Tags: chemical similarity network, Cytoscape, data visualization, Devium, imDEV, metabolomics, network, OGTT, PCA, PLS, R, statistics | Leave a comment
The chemical similarity network or CSN is a great tool for organizing biological data based on known biochemistry or chemical structural similarity. Here is an example CSN for visualizing metabolomic changes (measured via GC/TOF) due to anaerobic stress in germinating seeds.

In this network edges are formed for chemical similarity scores > 75. Node color describes a significant (adjusted p-value < 0.05, q-value = 0.05, paired t-test) increase (red), decrease (blue) or no change (gray) in anaerobic relative to aerobic treatments. Node size is inversely proportional to the test's p-value.
This CSN was not hard to construct and minimally requires knowledge of analyte PubChem chemical identifiers (CIDs). CIDs can be used to calculate the chemical similarity matrix using online tools provided by PubChem. This symmetric matrix can be easily formatted to create an edge list containing the basic information: source, target and similarity score.

Here is a function for converting square symmetric matrices to edge lists using the R statistical programming environment.
mat.to.edge.list<-function(mat)
{
  #accessory function: index pairs for one triangle of an r x r matrix (no diagonal)
  all.pairs<-function(r,type="one")
  {
    switch(type,
      one = list(first = rep(1:r,rep(r,r))[lower.tri(diag(r))],
        second = rep(1:r, r)[lower.tri(diag(r))]),
      two = list(first = rep(1:r, r)[lower.tri(diag(r))],
        second = rep(1:r,rep(r,r))[lower.tri(diag(r))]))
  }
  ids<-all.pairs(ncol(mat))
  #collect the source name, target name and matrix value for each pair
  tmp<-as.data.frame(do.call("rbind",lapply(1:length(ids$first),function(i)
  {
    value<-mat[ids$first[i],ids$second[i]]
    name<-c(colnames(mat)[ids$first[i]],colnames(mat)[ids$second[i]])
    c(name,value)
  })))
  colnames(tmp)<-c("source","target","value")
  #c(name,value) coerces the values to text, so convert them back to numeric
  tmp$value<-as.numeric(as.character(tmp$value))
  return(tmp)
}
The function mat.to.edge.list will convert a square symmetric matrix to an edge list through the extraction of the upper triangle excluding the diagonal or self edges.
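For comparison, here is a more compact sketch of the same idea using `upper.tri`, with an optional similarity cutoff for keeping only the edges above a threshold (for example the score > 75 rule used for the network above). The function name `edge.list`, its `cutoff` argument and the toy matrix `sim` are hypothetical and only serve as an illustration.

```r
# Compact alternative: extract the upper triangle (no self edges) as an edge list
# (edge.list, cutoff and sim are hypothetical names used for illustration)
edge.list <- function(mat, cutoff = NULL) {
  idx <- which(upper.tri(mat), arr.ind = TRUE)  # row/column indices of the upper triangle
  edges <- data.frame(source = rownames(mat)[idx[, "row"]],
                      target = colnames(mat)[idx[, "col"]],
                      value  = mat[idx],
                      stringsAsFactors = FALSE)
  # optionally keep only edges with a score above the cutoff
  if (!is.null(cutoff)) edges <- edges[edges$value > cutoff, , drop = FALSE]
  edges
}

# toy symmetric similarity matrix (made-up scores on a 0-100 scale)
ids <- c("m1", "m2", "m3")
sim <- matrix(c(100,  80,  20,
                 80, 100,  90,
                 20,  90, 100),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

edges <- edge.list(sim, cutoff = 75)
```

With this toy matrix only the m1-m2 and m2-m3 pairs exceed the cutoff, so the m1-m3 pair is dropped from the edge list.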
This edge list can now be visualized as a CSN using some software (see brief instructions here). I prefer to use Cytoscape for this. The edge list merely contains instructions for which vertices or nodes representing metabolites should be connected.

An additional node annotation or attribute table can also be imported into Cytoscape and used to alter the node properties based on statistical results.
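A node attribute table of this kind might be assembled along the following lines. This is a rough sketch on simulated paired data: the matrices `aerobic` and `anaerobic`, the column names, the Benjamini-Hochberg adjustment and the use of -log10(p) as a node size are all assumptions made for illustration, not the exact settings used for the network above.

```r
# Simulated paired measurements for 4 hypothetical metabolites (m1..m4)
set.seed(2)
n.pairs <- 8
aerobic <- matrix(rnorm(n.pairs * 4, mean = 10), ncol = 4,
                  dimnames = list(NULL, paste0("m", 1:4)))
anaerobic <- aerobic + rnorm(n.pairs * 4, sd = 0.3)
anaerobic[, "m1"] <- anaerobic[, "m1"] + 2  # simulated increase
anaerobic[, "m2"] <- anaerobic[, "m2"] - 2  # simulated decrease

# paired t-test for each metabolite, followed by FDR adjustment
p.raw <- sapply(colnames(aerobic), function(v) {
  t.test(anaerobic[, v], aerobic[, v], paired = TRUE)$p.value
})
p.adj <- p.adjust(p.raw, method = "BH")
change <- colMeans(anaerobic - aerobic)

# node attributes: color encodes significance and direction, size grows as p shrinks
nodes <- data.frame(
  id    = colnames(aerobic),
  p.adj = p.adj,
  color = ifelse(p.adj >= 0.05, "gray", ifelse(change > 0, "red", "blue")),
  size  = -log10(p.raw),
  stringsAsFactors = FALSE
)

# the table could then be saved for import, e.g.
# write.csv(nodes, "node_attributes.csv", row.names = FALSE)
```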
December 31, 2012 | Categories: Uncategorized | Tags: chemical similarity network, Cytoscape, ExCytR, metabolomics, network, R | Leave a comment
Primary metabolites in human serum or urine.
Uh oh, there seem to be some outliers: serum samples looking like urine and vice versa. Fix these and evaluate using PCA and hierarchical clustering on rank correlations.
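As a rough illustration of this kind of check, the sketch below runs a PCA and a hierarchical clustering on Spearman (rank) correlations between samples using base R. The data are simulated, and the object names, the log transformation, the average linkage and the choice of two clusters are assumptions for the example rather than the settings used for the figures here.

```r
# Simulate two biofluids with distinct profiles (10 samples each, 20 variables)
set.seed(3)
profile.serum <- runif(20, 1, 100)
profile.urine <- runif(20, 1, 100)
serum <- t(replicate(10, profile.serum * rlnorm(20, sdlog = 0.1)))
urine <- t(replicate(10, profile.urine * rlnorm(20, sdlog = 0.1)))
x <- rbind(serum, urine)
rownames(x) <- c(paste0("serum", 1:10), paste0("urine", 1:10))
label <- rep(c("serum", "urine"), each = 10)

# PCA on log transformed, autoscaled data; the first two score vectors are used for plotting
pca <- prcomp(log(x), center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# hierarchical clustering of samples on rank correlations
rho <- cor(t(x), method = "spearman")                  # sample x sample correlations
tree <- hclust(as.dist(1 - rho), method = "average")   # 1 - rho used as the distance
clusters <- cutree(tree, k = 2)

# cross-tabulate labels and clusters; a mislabeled sample would show up off-pattern
cluster.table <- table(label, clusters)
```

A sample labeled as serum that falls into the urine cluster (or vice versa) would be a candidate outlier or labeling error to follow up on.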

Now things look more believable. Next let us test the effects of data pre-treatment on PLS-DA model scores for a 3-group comparison in serum. Ideally group scores would be maximally resolved in the dimension of the first latent variable (x) and within-group variance would be orthogonal, i.e. along the y-axis.

Compared to the raw data (TOP), where ~3 top variables (glucose, urea and mannitol) dominate the variance structure, the autoscaled model displays a more balanced contribution of variables to scores variance, due to variable-wise mean subtraction and division by the standard deviation. The larger separation between WHITE and RED class scores along the x-axis suggests improved classifier performance over the raw data model. An overview of samples with scores outside their respective group’s Hotelling’s T² ellipse (95%) might point to sample outliers to further investigate or potentially exclude from the current test.
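The effect of autoscaling can be sketched in a few lines of base R. PCA is used here instead of PLS-DA only to keep the example dependency-free, and the simulated abundances (including the `minor1` and `minor2` variables) are hypothetical.

```r
# Simulated data: two high-abundance and two low-abundance variables
# (values are made up; only the difference in scale matters)
set.seed(4)
x <- cbind(glucose = rnorm(30, 5000, 500),
           urea    = rnorm(30, 3000, 300),
           minor1  = rnorm(30, 10, 1),
           minor2  = rnorm(30, 5, 0.5))

# autoscaling: variable-wise mean subtraction and division by the standard deviation
autoscaled <- scale(x, center = TRUE, scale = TRUE)

# compare first component loadings for raw (centered only) and autoscaled data
pca.raw  <- prcomp(x, center = TRUE, scale. = FALSE)
pca.auto <- prcomp(autoscaled, center = FALSE, scale. = FALSE)

loadings <- cbind(raw = abs(pca.raw$rotation[, 1]),
                  autoscaled = abs(pca.auto$rotation[, 1]))
```

In the raw model the first component is made up almost entirely of the high-abundance variables, while after autoscaling every variable has unit variance and can contribute on an equal footing.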
December 16, 2012 | Categories: Uncategorized | Tags: autoscaling, clustering, imDEV, metabolomics, normalizations, outliers, PCA, PLS-DA | Leave a comment
The concept is to make a GUI to provide static and dynamic linking between data and its network representations.
Static access will involve making networks based on data and metadata stored in some table or spreadsheet.
Dynamic control will provide interactive access to network construction and annotation properties.
Together, these will provide rapid generation of information-rich networks, based on tests of internal data properties or on exogenous semantic knowledge. Here is an example of a network representation of a time course metabolomic experiment. This network is used to encode dependence between the top parameters of a PLS-DA model discriminating between pre- and post-experimental interventions. Larger nodes show variables meeting the 5% significance cutoff (p < 0.05) for a mixed effects model to identify intervention-related differences between unbalanced baseline and area under the curve for metabolite excursion measurements during an oral glucose tolerance test (OGTT). Node color signifies an increase (red) or decrease (blue) in post- relative to pre-intervention average values. Node shape and outline display metabolite classification and presence in a PLS-DA model respectively. Node graphs, created in ggplot2, show box plots for pre- (red) and post-intervention (green) class distribution medians, upper and lower quartiles, and outliers.

The interactions between model parameters which exist only in pre-intervention samples are shown in the network below.
Connections are made between metabolites which have a non-zero partial correlation, extracted based on a qpnetwork trimmed at a threshold where the numbers of nodes and edges are approximately equal. In this network all edges meet the 5% significance level based on tests of Pearson's correlations.
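To illustrate why partial correlations are useful for defining edges, here is a minimal base R sketch which calculates full-order partial correlations from the inverse covariance matrix. Note that this is a simpler stand-in and not the qp-graph procedure used for the network above; the simulated variables `m1`, `m2` and `m3` are hypothetical.

```r
# Simulated chain of dependence: m1 -> m2 -> m3 (hypothetical variables)
set.seed(5)
n <- 200
m1 <- rnorm(n)
m2 <- m1 + rnorm(n, sd = 0.5)
m3 <- m2 + rnorm(n, sd = 0.5)
x <- cbind(m1 = m1, m2 = m2, m3 = m3)

# Pearson correlations: m1 and m3 look strongly related
pearson <- cor(x)

# partial correlations from the precision (inverse covariance) matrix
precision <- solve(cov(x))
pcor <- -cov2cor(precision)
diag(pcor) <- 1

# t-test for each partial correlation, conditioning on the p - 2 other variables
k <- ncol(x) - 2
tstat <- pcor * sqrt((n - 2 - k) / (1 - pcor^2))
pval <- 2 * pt(-abs(tstat), df = n - 2 - k)

# edge list for the upper triangle, with FDR adjusted p-values
idx <- which(upper.tri(pcor), arr.ind = TRUE)
edges <- data.frame(source = colnames(x)[idx[, "row"]],
                    target = colnames(x)[idx[, "col"]],
                    pcor   = pcor[idx],
                    p.adj  = p.adjust(pval[idx], method = "BH"),
                    stringsAsFactors = FALSE)
```

Here m1 and m3 are highly correlated only through m2, so their partial correlation is close to zero and the direct m1-m3 edge would typically be dropped from the network.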

December 1, 2012 | Categories: Uncategorized | Tags: Cytoscape, ExCytR, metabolomics, network, qpgraph, R | Leave a comment