Posts tagged “data visualization”

Metabolomic network analysis can be used to interpret experimental results within a variety of contexts including biochemical relationships, structural and spectral similarity and empirical correlation. Machine learning is useful for modeling relationships in the context of pattern recognition, clustering, classification and regression based predictive modeling. The combination of developed metabolomic networks and machine learning based predictive models offers a unique method to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on data analysis, visualization, machine learning and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com

The following presentation also shows a sneak peek of a new data analysis and visualization software, DAVe: Data Analysis and Visualization engine. Check out some early features. DAVe is built in R and seeks to support a seamless environment for advanced data analysis and machine learning tasks as well as biological functional and network analysis.
As an aside, building the main site (in progress) was a fun opportunity to experiment with Jekyll, Ruby and embedding slick interactive canvas elements into websites. You can check out all the code here: https://github.com/dgrapov/CDS_jekyll_site.
slides: https://www.slideshare.net/dgrapov/machine-learning-powered-metabolomic-network-analysis
June 11, 2017 | Categories: Uncategorized | Tags: clustering, data analysis, data visualization, genomics, machine learning, network, pathways, proteomics, R, r-bloggers, science, shiny, software, statistics | Leave a comment
Metabolomics and the greater sphere of ‘Omic analyses are a burgeoning set of tools for investigation of environmental and organismal mechanisms and interactions. Carrying out data analyses within complex biological system contexts is rewarding but also difficult. The following presentation considers components involved in conducting multivariate data analysis, modeling and visualization within biological contexts.
slides: https://www.slideshare.net/dgrapov/complex-systems-biology-informed-data-analysis-and-machine-learning
June 11, 2017 | Categories: Uncategorized | Tags: clustering, data visualization, genomics, lectures, machine learning, metabolomics, network, pathways, proteomics, research, science, software, statistical analysis | Leave a comment
What is the highest dimensional visualization you can think of? Now imagine it being interactive. The following details a Frankenstein visualization packing a smorgasbord of multivariate goodness.

Enter first, self-organizing maps (SOM). I first fell into a love dream with SOMs after using the kohonen package. The wines data set example is a beautiful display of information.

Conveniently, making the visualization above is relatively easy. SOM is used to organize the data into related groups on a grid. Hierarchical cluster analysis (HCA) is used to classify the SOM codes into three groups.

HCA cluster information is mapped to the SOM grid using hexagon background colors. The radial bar plots show the variable (wine compounds’) patterns for samples (wines).
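The SOM and HCA steps described above can be sketched in a few lines. This is a minimal sketch, assuming the kohonen package is installed; the 4 x 4 grid size, the seed and the object names are my own choices for illustration and not necessarily what was used for the plot above.

```r
library(kohonen)  # assumed to be installed

data(wines)       # wine compound measurements shipped with kohonen
set.seed(1)       # SOM training starts from a random initialization

# train a 4 x 4 hexagonal SOM on the autoscaled data
som_model <- som(scale(wines), grid = somgrid(4, 4, "hexagonal"))

# codebook vectors: a list of matrices in kohonen >= 3, a plain matrix before
codes <- som_model$codes
if (is.list(codes)) codes <- codes[[1]]

# HCA on the SOM codes, cut into three groups of grid units
unit_cluster <- cutree(hclust(dist(codes)), k = 3)

# cluster membership of every wine, looked up through its winning grid unit
sample_cluster <- unit_cluster[som_model$unit.classif]

# number of wines mapped to each grid unit
samples_per_unit <- table(factor(som_model$unit.classif,
                                 levels = seq_len(nrow(codes))))
```

The unit_cluster vector is what would drive the hexagon background colors, while samples_per_unit is one of the simple quality metrics that can be mapped back onto the grid.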

The goal for this project was to reproduce the kohonen.plot using ggplot2 and make it interactive using shiny.

The main idea was to use SOM to calculate the grid coordinates, geom_hexagon for the grid packing and any ggplot for the hexagon-inset sub plots. Some basic inset plots could be bar or line plots.
Part of the beauty is the organization of any ggplot you can think of (optionally grouping the input data or SOM codes) based on the SOM unit classification.
A Pavlovian response might be: does it network?

Yes we can (network). Above is an example of different correlation patterns between wine components in related groups of wines. For example, the green grid points identify wines showing a correlation between phenols and flavonoids (probably reds?). Their distance from each other could be explained (?) by the small grid size (see below).
The next question might be, does it scale?

There is potential. The 4 x 4 grid shows radial bar plot patterns for 16 sub groups among the 3 larger sample groups. The next 6 x 6 plot shows wine compound profiles for 36 ~related subsets of wines.
A useful side effect is that we can use SOM quality metrics to give us an extra-dimensional view into tuning the visualization. For example we can visualize the number of samples per grid point or distances between grid points (dissimilarity in patterns).
This is useful to identify parts of the somClustPlot showing the number of mapped samples and greatest differences.
One problem I experienced was getting the hexagon packing just right. I ended up making controls to move the hexagons ~up/down and zoom in/out on the plot. It is not perfect but shows potential (?) for scaffolding highly multivariate visualizations. Some of my other concerns include the stochastic nature of SOM and its need for random initialization of the embedding. Make sure to call set.seed() before fitting to make the result reproducible, and you might want to try a few seeds. Maybe someone out there knows how to make this aspect of SOM more robust?
May 19, 2016 | Categories: Uncategorized | Tags: data visualization, ggplot2, multivariate, network, r-bloggers, SOM | 3 Comments
R users: networkly: network visualization in R using Plotly
In addition to their more common uses, networks can be used as powerful multivariate data visualizations and exploration tools. Networks not only provide mathematical representations of data but are also one of the few data visualization methods capable of easily displaying multivariate variable relationships. The process of network mapping involves using the network manifold to display a variety of other information e.g. statistical, machine learning or functional analysis results (see more mapped network examples).

The combination of Plotly and Shiny is awesome for creating your very own network mapping tools. Networkly is an R package which can be used to create 2-D and 3-D interactive networks which are rendered with plotly and can be easily integrated into shiny apps or markdown documents. All you need to get started is an edge list and node attributes which can then be used to generate interactive 2-D and 3-D networks with customizable edge (color, width, hover, etc) and node (color, size, hover, label, etc) properties.
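To make the two inputs concrete, here is a minimal base R sketch of an edge list and a node attribute table built from a correlation matrix. The column names (source, target, weight, color, width, name, degree), the mtcars example data and the 0.8 cutoff are assumptions for illustration only; they are not networkly's required schema, so check the package documentation for the names it expects.

```r
# correlation matrix for a built-in example data set
r <- cor(mtcars)

# keep each variable pair once (upper triangle) when |r| exceeds a cutoff
keep <- which(upper.tri(r) & abs(r) > 0.8, arr.ind = TRUE)

# edge list: one row per connection
edge_list <- data.frame(
  source = rownames(r)[keep[, "row"]],
  target = colnames(r)[keep[, "col"]],
  weight = r[keep],
  stringsAsFactors = FALSE
)

# edge properties to map: color from the sign, width from the magnitude
edge_list$color <- ifelse(edge_list$weight > 0, "orange", "blue")
edge_list$width <- abs(edge_list$weight)

# node attributes: one row per variable, including isolated nodes
node_names <- colnames(r)
node_attributes <- data.frame(
  name = node_names,
  degree = as.integer(table(factor(c(edge_list$source, edge_list$target),
                                   levels = node_names))),
  stringsAsFactors = FALSE
)
```

Any statistical or machine learning result can be added as a further column of node_attributes and then mapped to node size, color or hover text.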
2-Dimensional Network (interactive version)
3-Dimensional Network (interactive version)

View all code used to generate the networks above.
February 28, 2016 | Categories: Uncategorized | Tags: data analysis, data visualization, network, network mapping, networkly, plotly, R, r-bloggers, shiny | Leave a comment
I recently had the pleasure of giving a presentation on one of my favorite topics, network mapping, and its application to metabolomic and genomic data integration. You can check out the full presentation below.
November 2, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, data analysis, data visualization, DeviumWeb, genomics, metabolomics, MetaMapR, network mapping, topological data analysis | Leave a comment

Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Similar to last year, I’ve posted all the content (lectures, labs and software) for anyone to follow along with at their own pace. I also plan to release videos for all the lectures and labs including use cases for the freely available data analysis software listed below.
You can check out the introduction lecture to the covered material below.
New additions to the course include a lecture and lab on data normalization as well as updated and improved software.
Software
Stay tuned for videos of all of the material!

2014 Metabolomics Data Analysis and Visualization Tutorials Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
October 11, 2014 | Categories: Uncategorized | Tags: biochemical network, data analysis, data visualization, demo, DeviumWeb, lectures, metabolomics, MetaMapR, r-bloggers, tutorial, west coast metabolomics center, workshop | Leave a comment
Recently I had the pleasure of teaching data analysis at the 2014 UC Davis Proteomics Workshop. This included a hands-on lab for making gene ontology enrichment networks. You can check out my lecture and tutorial below or download all the material.
Introduction
Tutorial

2014 UC Davis Proteomics Workshop Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
August 9, 2014 | Categories: Uncategorized | Tags: correlation network, Cytoscape, data analysis, data visualization, enrichment, gene ontology, multivariate, network, network enrichment, PCA, PLS, proteomics, r-bloggers, tutorial | Leave a comment
Recently I was lucky enough to publish some of my research findings in the journal Metabolomics. You can check out the full paper, 10.1007/s11306-014-0706-2, or take a look at the abstract and figures below.
ABSTRACT
Non-obese diabetic (NOD) mice are a widely-used model of type 1 diabetes (T1D). However, not all animals develop overt diabetes. This study examined the circulating metabolomic profiles of NOD mice progressing or not progressing to T1D. Total beta-cell mass was quantified in the intact pancreas using transgenic NOD mice expressing green fluorescent protein under the control of mouse insulin I promoter. While both progressor and non-progressor animals displayed lymphocyte infiltration and endoplasmic reticulum stress in the pancreas tissue, overt T1D did not develop until animals lost ~70 % of the total beta-cell mass. Gas chromatography time of flight mass spectrometry was used to measure >470 circulating metabolites in male and female progressor and non-progressor animals (n = 76) across a wide range of ages (neonates to >40-week). Statistical and multivariate analyses were used to identify age and sex independent metabolic markers which best differentiated progressor and non-progressor animals’ metabolic profiles. Key T1D-associated perturbations were related with: (1) increased plasma glucose and reduced 1,5-anhydroglucitol markers of glycemic control; (2) increased allantoin, gluconic acid and nitric acid-derived saccharic acid markers of oxidative stress; (3) reduced lysine, an insulin secretagogue; (4) increased branched-chain amino acids, isoleucine and valine; (5) reduced unsaturated fatty acids including arachidonic acid; and (6) perturbations in urea cycle intermediates suggesting increased arginine-dependent NO synthesis. Together these findings highlight the strength of the unique approach of comparing progressor and non-progressor NOD mice to identify metabolic perturbations involved in T1D progression.

Fig. 1 Immune cell infiltration and beta-cell destruction in prediabetic NOD mice. A Visualization of spatial islet distribution in the context of the vascular network in the intact pancreas. A prediabetic NOD mouse at 27-week. B The body region of the NOD mouse shown in A. Note that substantial beta-cell destruction is observed in the NOD pancreas (i.e. a loss of GFP-expressing beta-cells). C Intraislet capillary network in the body region of a wild-type mouse at 21-week. D Immunohistochemical staining. Insulin (green), glucagon (red), somatostatin (white) and nuclei (blue). E Hypertrophic islet with massive infiltration of T-lymphocytes. (a) Hematoxylin-Eosin (HE) staining of the islet showing peripheral- and intra-islet infiltrating lymphocytes and remaining endocrine islet cells. (b) A serial section stained for CD4-positive lymphocytes by ABC-staining (brown). c A serial section stained for CD8-positive lymphocytes. F Ultrastructural analysis of hypertrophic islets in non-diabetic and diabetic littermates. (a) Non-diabetic male NOD mouse (41-week old, 4-h fasting BG: 136 mg/dL) showing a hyperactive beta-cell with lymphocyte infiltration and vesicles without dense core granules. (b) Beta-cells in diabetic female NOD mouse (40-week old, 4-h fasting BG: 559 mg/dL) appears to be intact despite the presence of ongoing insulitis. G Progressive degradation of endoplasmic reticulum (ER). (a) Well-developed ER (ER) in a beta-cell undergoing insulitis. (b) ER degradation. Ribosomes are detached (shed) from the ER membrane and are aggregated (ER). Nuclear damage is seen with the formation of foam-like structures (N). Immature granules with less dense cores (G) as well as cytoplasmic liquefaction (CL) are observed. (c) ER membrane breakdown. ER membrane breakdown resulted in aggregation of shed ribosomes (ER). An adjacent PP-cell (PP) appears to be intact (identified by characteristic moderately dense cores of pancreatic polypeptide-containing secretory granules). 
(d) Beta-cell degradation. ER swelling (ER), ribosome shedding, amorphous cytoplasmic material (R) and cytoplasmic liquefaction (L) are observed in the same beta-cell

Fig. 2 Progression of autoimmune diabetes in NOD mice. A (a) Virtual slice capture of a whole mouse pancreas from mouse insulin promoter I (MIP)-GFP mice on NOD background. (b) Measured beta-cell/islet distribution. (c) Corresponding 3D scatter plot of islet parameters depicts distribution of islets with various sizes and shapes. Each dot represents a single islet. B (a) Representative data showing islet growth in wild-type mice at 20- and 28-week of age. (b) Examples of beta-cell loss at 20-week (non-diabetic) and 28-week (diabetic) in NOD mice. C Heterogeneous beta-cell loss in NOD mice. Frequency is plotted against islet size. D Three distinct groups in the development of T1D in NOD mice. 3D scatter plot showing the relationship among blood glucose levels (BG), total beta-cell area and age. Three groups of mice are color-coded as diabetic mice (red), young mice with normoglycemia (<25 week; green) and old mice with normoglycemia (25–40 week; blue)

Fig. 3 Biochemical network displaying metabolic differences between diabetic and non-diabetic NOD mice. Metabolites are connected based on biochemical relationships (blue, KEGG RPAIRS) or structural similarity (violet, Tanimoto coefficient ≥0.7). Metabolite size and color represent the importance (O-PLS-DA model loadings, LV 1) and relative change (gray p adj > 0.05; green increase; red decrease) in diabetic compared non-diabetic NOD mice. Shapes display metabolites’ molecular classes or biochemical sub-domains and top descriptors of T1D-associated metabolic perturbations (Table 1) are highlighted with thick black borders

Fig. 4 Partial correlation network displaying associations between all type 1 diabetes-dependent metabolomic perturbations. All significantly altered metabolites (p adj ≤ 0.05, Supplemental Table S3) are connected based on partial correlations (p adj ≤ 0.05). Edge width displays the absolute magnitude and color the direction (orange positive; blue negative) of the partial-coefficient of correlation. Metabolite size and color represent the importance (O-PLS-DA model loadings, LV 1) and relative change (gray p adj > 0.05; green increase; red decrease) in diabetic compared non-diabetic NOD mice. Shapes display metabolites’ molecular classes or biochemical sub-domains (see Fig. 3 legend), and top descriptors of T1D-associated metabolic perturbations (Table 1) are highlighted with thick black borders
In conclusion, we identified marked differences in the rates of progression of NOD mice to T1D. Metabolomic analysis was used to identify age and sex independent metabolic markers, which may explain this heterogeneity. Future studies combining metabolic end points (as they correlate with beta-cell mass) and genetic risk profiles will ultimately lead to a more complete understanding of disease onset and progression.
July 30, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, Cytoscape, data analysis, data visualization, metabolomics, NOD mice, type 1 diabetes | Leave a comment
Recently I had the pleasure of speaking about one of my favorite topics, Network Mapping. This is a continuation of a general theme I’ve previously discussed and involves the merger of statistical and multivariate data analysis results with a network.
Over the past year I’ve been working on two major tools, DeviumWeb and MetaMapR, which aid the process of biological data (metabolomic) network mapping.

DeviumWeb is a shiny based GUI written in R which is useful for:
- data manipulation, transformation and visualization
- statistical analysis (hypothesis testing, FDR, power analysis, correlations, etc)
- clustering (hierarchical, TODO: k-means, SOM, distribution)
- principal components analysis (PCA)
- orthogonal partial least squares multivariate modeling (O-/PLS/-DA)

MetaMapR is also a shiny based GUI written in R which is useful for calculation and visualization of various networks including:
- biochemical
- structural similarity
- mass spectral similarity
- correlation
Both of these projects are under development, and my ultimate goal is to design a one-stop-shop ecosystem for network mapping.
In addition to network mapping, the video above and presentation below also discuss normalization schemes for longitudinal data and genomic, proteomic and metabolomic functional analysis both on a pathway and global level.
As always, happy network mapping!

June 27, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, correlation network, Cytoscape, data analysis, data visualization, DeviumWeb, ggplot2, metabolomics, MetaMapR, multivariate, network mapping, O-PLS, R, r-bloggers, shiny, statistical analysis | 6 Comments
I recently participated in the American Society for Mass Spectrometry (ASMS) conference and had a great time. I met some great people and have a few new ideas for future projects, specifically trying self-organizing maps (SOM) and the R package mcclust as clustering alternatives to hierarchical and k-means methods.
I had the pleasure of speaking at the conference in the Informatics-Metabolomics section, and was also a co-author on a project detailing a multi-metabolomics strategy (primary metabolites, lipids, and oxylipins) for the study of type 1 diabetes in an animal model. Keep an eye out for my full talk in an upcoming post.

June 26, 2014 | Categories: Uncategorized | Tags: American Society of Mass Spectrommetry, ASMS, biochemical network, chemical similarity network, conference, data analysis, data visualization, DeviumWeb, metabolomics, MetaMapR, network mapping, networks, self-organizing maps | 1 Comment
Recently I was tasked with evaluating and, most importantly, removing analytical variance from a longitudinal metabolomic analysis carried out over a few years and including >2,5000 measurements for >5,000 patients. Even using state-of-the-art analytical instruments and techniques, long-term biological studies are plagued with unwanted trends which are unrelated to the original experimental design and stem from analytical sources of variance (noise added by the process of measurement). Below is an example of a metabolomic measurement with and without analytical variance.

The noise pattern can be estimated based on replicated measurements of quality control samples embedded at a ratio of 1:10 within the larger experimental design. The process of data normalization is used to remove analytical noise from biological signal on a variable specific basis. At the bottom of this post, you can find an in-depth presentation of how data quality can be estimated and a comparison of many common data normalization approaches. From my analysis I concluded that a relatively simple LOESS normalization is a very powerful method for removal of analytical variance. While LOESS (or LOWESS), locally weighted scatterplot smoothing, is a relatively simple approach to implement, great care has to be taken when optimizing each variable-specific model.
In particular, the span parameter or alpha controls the degree of smoothing and is a major determinant of whether the model (calculated from repeated measures) is underfit, just right or overfit with regards to correcting analytical noise in samples. Below is a visualization of the effect of the span parameter on the model fit.

One method to estimate the appropriate span parameter is to use cross-validation with quality control samples. Having identified an appropriate span, a LOESS model can be generated from repeated measures data (black points) and is used to remove the analytical noise from all samples (red points).
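The span selection and normalization steps can be sketched with base R's loess(). This is a sketch on simulated data and not the code used for the analysis: the drift shape, the candidate spans, the leave-one-out scheme and the ratio-based correction (which assumes multiplicative drift) are all assumptions made for illustration.

```r
set.seed(42)
n <- 291
run_order <- seq_len(n)
is_qc <- run_order %% 10 == 1            # QCs embedded at 1:10 (first and last run are QCs)

# simulated analyte: slow multiplicative drift plus measurement noise
drift <- 1 + 0.3 * sin(run_order / 50)
biology <- ifelse(is_qc, 100, rnorm(n, mean = 100, sd = 15))  # QCs are identical material
measured <- biology * drift * rnorm(n, mean = 1, sd = 0.03)

qc <- data.frame(x = run_order[is_qc], y = measured[is_qc])

# leave-one-out cross-validation error of a LOESS fit to the QCs for one span
loo_error <- function(span, qc) {
  idx <- 2:(nrow(qc) - 1)  # hold out interior QCs only, so predictions never extrapolate
  res <- vapply(idx, function(i) {
    fit <- loess(y ~ x, data = qc[-i, ], span = span, degree = 2)
    qc$y[i] - predict(fit, newdata = qc[i, , drop = FALSE])
  }, numeric(1))
  sqrt(mean(res^2))
}

spans <- c(0.3, 0.5, 0.75, 1)
cv_error <- vapply(spans, loo_error, numeric(1), qc = qc)
best_span <- spans[which.min(cv_error)]

# fit the QC trend with the selected span and remove it from every measurement
fit <- loess(y ~ x, data = qc, span = best_span, degree = 2)
trend <- predict(fit, newdata = data.frame(x = run_order))
normalized <- measured / trend * median(qc$y)
```

In a real data set this would be repeated for every variable, since the appropriate span is variable specific.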

Having done this, we can now evaluate the effect of removing analytical noise from quality control samples (QCs, training data, black points above) and samples (test data, red points) by calculating the relative standard deviation of the measured variable (standard deviation / mean * 100). In the case of the single analyte, ornithine, we can see (above) that the LOESS normalization reduces the overall analytical noise to a large degree. However, we cannot expect the performance for the training data (noise only) to converge with that of the test set, which contains both noise and true biological signal.
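The relative standard deviation is a one-liner in R. The function name rsd and the example values are made up for illustration.

```r
# relative standard deviation (%): standard deviation / mean * 100
rsd <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100

# five hypothetical replicate QC measurements of one analyte
qc_values <- c(98, 102, 100, 101, 99)
rsd(qc_values)  # about 1.58 (%)
```

Calculating this for the QCs before and after normalization gives a quick per-variable measure of how much analytical noise was removed.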
In addition to evaluating the normalization specific removal of analytical noise on a univariate level we can also use principal components analysis (PCA) to evaluate this for all variables simultaneously. Below is an example of the PCA scores for non-normalized and LOESS normalized data.

We can clearly see that the two largest modes of variance in the raw data explain differences in when the samples were analyzed, which is termed batch effects. Batch effects can mask true biological variability, and one goal of normalization is to remove them, which we can see is accomplished in the LOESS normalized data (above right).
However, be forewarned: proper model validation is critical to avoiding over-fitting and producing complete nonsense.
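A quick way to put a number on this is to ask how much of the first principal component is explained by batch. The sketch below uses simulated data, and per-batch mean centering stands in for the LOESS normalization purely to keep the example short; the batch_r2 helper and all object names are hypothetical.

```r
set.seed(7)
n_per_batch <- 20
n_var <- 15
batch <- factor(rep(c("A", "B", "C"), each = n_per_batch))

# simulated data: random biological signal plus a variable-specific shift per batch
biology <- matrix(rnorm(length(batch) * n_var), ncol = n_var)
shift <- matrix(rnorm(3 * n_var, sd = 3), nrow = 3)
raw <- biology + shift[as.integer(batch), ]

# fraction of the variance in the PC1 scores that is explained by batch
batch_r2 <- function(x, batch) {
  scores <- prcomp(x, center = TRUE, scale. = TRUE)$x
  summary(lm(scores[, 1] ~ batch))$r.squared
}

# crude stand-in for normalization: remove each batch's mean from every variable
centered <- raw
for (b in levels(batch)) {
  rows <- batch == b
  centered[rows, ] <- scale(raw[rows, ], center = TRUE, scale = FALSE)
}

r2_raw <- batch_r2(raw, batch)        # close to 1: PC1 mostly reflects batch
r2_norm <- batch_r2(centered, batch)  # close to 0: batch no longer drives PC1
```

Plotting the first two columns of the prcomp() scores colored by batch gives the same kind of before and after picture as the scores plots above.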

In case you are interested, the full analysis and presentation can be found below, as well as the majority of the R code used for the analysis and visualizations.

June 4, 2014 | Categories: Uncategorized | Tags: batch effects, data analysis, data quality, data visualization, Devium, ggplot2, normalizations, PCA, quality controls, r-bloggers, tutorial | Leave a comment
The question is: can we automate scientific discovery, and what might an interface to such a tool look like?

I’ve been experimenting with automating simple and complex data analysis and report generation tasks for biological data, mostly using R and LaTeX. You can see some of my progress and the challenges encountered in the presentation below. My ultimate goal is to push my current fill-in-the-generic-template approach forward to some kind of interactive and dynamic document generator.
While thinking of a fun way to present the idea of human-guided data analysis and report generation, I came up with the idea of creating a simple choose-your-own-adventure story. I decided to adapt the visualization below into an interactive adventure in R which culminates in the writing of your life story using the magic of knitr.

You can download the story generator, AdventureR, and try it out for yourself. Or take a quick look at some of the possible adventures. Be forewarned some of the story endings are not for the squeamish.

On a practical level, I am in the early prototyping phase of what a similar application might look like for metabolomic data. Currently I am struggling to get beyond the linear workflow to something more interactive, dynamic and adaptive. Nonetheless it is a beauty to behold when, at the click of a button, an analysis and report can be generated which would otherwise take many hours to days to implement manually. The challenge is to turn the fill-in-the-template style into an adaptive and robust interface which can quickly guide a human domain expert through the modes of information encoded in the data.

Currently my prototype tools require well-defined input templates and flow the data hierarchically down a tree of tasks (e.g. statistics, visualization, functional analysis, clustering), each coming together to generate a mapped network. Right now things are very linear and mostly fill-in-the-template, but this still usefully mimics a set of tasks a bioinformatician might perform. My goal is to adapt the LaTeX-based text generator into a GUI-driven markdown or HTML-based reporting application, with the ultimate goal of increasing the speed and interactivity of the data analysis and interpretation process.

April 5, 2014 | Categories: Uncategorized | Tags: adventurer, automation, data analysis, data visualization, knitr, metabolomics, R, r-bloggers, report generation | Leave a comment
High dimensional biological data shares many qualities with other forms of data. Typically it is wide (samples << variables), complicated by experimental design and made up of complex relationships driven by both biological and analytical sources of variance. Luckily the powerful combination of R, Cytoscape (< v3) and the R package RCytoscape can be used to generate high dimensional and highly informative representations of complex biological (and really any type of) data. Check out the following examples of network mapping in action or view a more in-depth presentation of the techniques used below.
Partial correlation network highlighting changes in tumor compared to control tissue from the same patient.

Biochemical and structural similarity network of changes in tumor compared to control tissue from the same patient.

Hierarchical clusters (color) mapped to a biochemical and structural similarity network displaying differences before and after drug administration.

Partial correlation network displaying changes in metabolite relationships in response to drug treatment.
Partial correlation network displaying changes in disease and response to drug treatment.
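For readers wondering where the edges in a partial correlation network come from, here is a minimal base R sketch which derives partial correlations from the inverse of the correlation matrix. It uses the built-in mtcars data as a stand-in and assumes more samples than variables; for wide metabolomic data the matrix inversion generally needs a regularized or shrinkage estimator instead, which is not shown here.

```r
# example data with more samples (32) than variables (11)
x <- scale(mtcars)

# precision matrix: the inverse of the correlation matrix
precision <- solve(cor(x))

# partial correlation between each pair of variables given all the others
pcor <- -precision / sqrt(outer(diag(precision), diag(precision)))
diag(pcor) <- 1
dimnames(pcor) <- list(colnames(x), colnames(x))

# edges of the network: variable pairs with a strong partial correlation
# (the 0.3 cutoff is arbitrary; p-values with multiple testing correction
# would be the more usual criterion)
edges <- which(upper.tri(pcor) & abs(pcor) > 0.3, arr.ind = TRUE)
```

Edge width and color can then be mapped to the magnitude and sign of pcor for each retained pair.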

Check out the full presentation below.

February 22, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, clustering, correlation network, Cytoscape, data analysis, data visualization, Devium, metabolomics, multivariate, network, network mapping, O-PLS-DA, r-bloggers, tutorial | Leave a comment
I recently had the pleasure of participating in the 2014 WCMC Statistics for Metabolomics Short Course. The course was hosted by the NIH West Coast Metabolomics Center and focused on statistical and multivariate strategies for metabolomic data analysis. A variety of topics were covered using 8 hands-on tutorials which focused on:
- data quality overview
- statistical and power analysis
- clustering
- principal components analysis (PCA)
- partial least squares (O-/PLS/-DA)
- metabolite enrichment analysis
- biochemical and structural similarity network construction
- network mapping
I am happy to have taught the course using all open source software, including R and Cytoscape. The data analysis and visualization were done using Shiny-based apps: DeviumWeb and MetaMapR. Check out some of the slides below or download all the class material and try it out for yourself.

2014 WCMC LC-MS Data Processing and Statistics for Metabolomics by Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Special thanks to the developers of Shiny and Radiant by Vincent Nijs.
February 17, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, Cytoscape, data analysis, data visualization, Devium, ggplot2, hierarchical clustering, mass spectral similarity, metabolomics, MetaMapR, network, O-PLS, O-PLS-DA, PCA, R, r-bloggers, shiny, TeachingDemos, tutorial | 13 Comments
Network mapping is a high-dimensional data visualization technique which can be applied to virtually any type of data. I recently gave a tutorial on the basics of network mapping where each participant generated a mapped network for their name.
Download the full tutorial at TeachingDemos, and then follow along with the tutorial at your own pace.
Happy network mapping!
January 31, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, data visualization, MetaMapR, network mapping, networks, TeachingDemos, tutorial | Leave a comment
I am happy to announce the release of MetaMapR (v1.2.0).
New features include:
- An independent module for biological database identifier translations using the Chemical Translation System (CTS)
- a retention time filter for mass spectral connections
- increase in calculation speed
An application of MetaMapR was recently featured in an article in the Nov. 4th 2013 issue of Chemical & Engineering News (C&EN), 91(44). This tool was used to generate a network of > 1200 metabolites based on enzymatic transformations and structural similarities.

The full article can be found here as well as the original image.
December 25, 2013 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, chemical translations, correlation network, data visualization, mass spectral similarity, metabolomics, MetaMapR, network mapping | Leave a comment
I recently gave a presentation of some of my work in network mapping to my research lab. The following covers my progress in the development of my metabolomic network mapping tool MetaMapR, and its application to a variety of data sets including a comparison of normal and malignant lung tissue from the same patient.
November 21, 2013 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, correlation network, Cytoscape, data analysis, data visualization, Gaussian graphical Markov metabolic network, metabolomics, MetaMapR, multivariate, network, network mapping | Leave a comment

After being busy the last two weeks teaching and attending academic conferences, I finally found some time to do what I love, program data visualizations using R. After being interested in Shiny for a while, I finally decided to pull the trigger and build my first Shiny app!
I wanted to make a proof of concept app which contained the following dynamics which are the basics of any UI design:
1) dynamic UI options
2) dynamically updated plot based on UI inputs
Here is what I came up with.

Check out the app for yourself or the R code HERE.
library(shiny)
# download the app from its GitHub gist and launch it locally
runGist('5792778')
The app consists of a user interface (UI) for selecting the data, variable to plot, grouping factor for colors and four plotting options: boxplot (above), histogram, density plot and bar graph. As an added bonus the user can select to show or hide jittered points in the boxplot visualization.
Generally #2 above was well described and easy to implement, but it took a lot of trial and error to figure out how to implement #1. Basically, to generate dynamic UI objects, the UI objects need to be called using the function shiny::uiOutput() in the ui.R file and their arguments set in the server.R file using the function shiny::renderUI(). After getting this to work everything else fell into place.
Having some experience with making UIs in VBA (Visual Basic) and gWidgets, I find Shiny a joy to work with once you understand some of its inner workings. One aspect which made the learning experience frustrating was the lack of informative errors coming from Shiny functions. Even using all the R debugging tools, having Shiny constantly tell me that something was not correctly called from a reactive environment, or that the error was in runApp(), did not really help. My advice to anyone learning Shiny is to take a look at the tutorials, and particularly the section on Dynamic UI. Then pick a small example to reverse engineer. Don’t start off too complicated or else you will have a hard time understanding which sections of code are not working as expected.
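The uiOutput() and renderUI() pairing can be shown in a stripped-down example. This is not the code from the gist: it is a single-file sketch (instead of separate ui.R and server.R files) with made-up input and output ids, using two built-in data sets and a plain histogram.

```r
library(shiny)

ui <- fluidPage(
  selectInput("dataset", "Data", choices = c("iris", "mtcars")),
  uiOutput("variable_ui"),  # placeholder which is filled in by the server
  plotOutput("plot")
)

server <- function(input, output, session) {
  # the currently selected data set
  data <- reactive(get(input$dataset, "package:datasets"))

  # 1) dynamic UI: the variable choices depend on the selected data set
  output$variable_ui <- renderUI({
    numeric_vars <- names(Filter(is.numeric, data()))
    selectInput("variable", "Variable to plot", choices = numeric_vars)
  })

  # 2) dynamically updated plot based on the UI inputs
  output$plot <- renderPlot({
    # wait until the dynamic input exists and matches the current data set
    req(input$variable %in% names(data()))
    hist(data()[[input$variable]], main = input$variable, xlab = input$variable)
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app in an interactive session
```

The req() call covers the short moment after switching data sets when the previously selected variable is no longer valid, which is one source of the unhelpful errors mentioned here.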
Finally, here are some screenshots; keep an eye out for more advanced Shiny apps in the near future.

June 16, 2013 | Categories: Uncategorized | Tags: bar graph, boxplot, data visualization, density plot, ggplot2, histogram, R, r-bloggers, shiny | 7 Comments
Here are a video and slides from a presentation of mine about my favorite topic:
June 14, 2013 | Categories: Uncategorized | Tags: biochemical network, biochmical network, chemical similarity network, clustering, Cytoscape, data analysis, data visualization, metabolomics, multivariate, network, network mapping, networks, O-PLS, O-PLS-DA, PCA, PLS, PLS-DA | 1 Comment
I’ve posted two new tutorials focused on intermediate and advanced strategies for biological, and specifically metabolomic, data analysis (click titles for pdfs).


May 29, 2013 | Categories: Uncategorized | Tags: ANCOVA, chemical similarity network, classification, climate, correlation network, covariate adjustment, data analysis, data visualization, Gaussian graphical Markov metabolic network, imDEV, metabolomics, network, PCA, PLS, PLS-DA, R, research, science, TeachingDemos, tutorial | Leave a comment
The idea is that we have collected information about 30 samples at 4 intervals for 200 variables. This makes 30 * 4 * 200 = 24,000 data points!
That is a lot to keep track of if we want to start the data analysis by looking at sample-wise (30) differences in variables (200) which are also dependent on time (4).
One idea is to use orthogonal signal correction partial least squares (O-PLS) to ask the question:
1) what is the most conserved linear ordering of my data, based on
2) a description of my data: 3 groups of samples at 4 points in time plus the starting point, t = 0 (so a total of 5 points in time)?
Here is an example O-PLS scores plot for the samples (30 * 5 = 150), with polygons drawn around the boundary of each unique group and time-point combination (3 * 5 = 15).

We can try to summarize the position of each group in this multivariate space (15 * 200) by plotting each group's median score and standard error for the first two O-PLS latent variables (LVs).
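As a rough sketch of that summary step in base R: the scores below are simulated stand-ins rather than output from a fitted O-PLS model, and the group labels (A, B, C), column names and std_error() helper are assumptions for illustration.

```r
set.seed(1)

# Hypothetical scores: 30 samples x 5 time points, 3 groups of 10 samples each
scores <- expand.grid(sample = 1:30, time = c(0, 30, 60, 90, 120))
scores$group <- factor(c("A", "B", "C")[(scores$sample - 1) %/% 10 + 1])
scores$LV1 <- rnorm(nrow(scores), mean = scores$time / 60)
scores$LV2 <- rnorm(nrow(scores), mean = as.integer(scores$group))

# Standard error of the mean, used here as the spread around each median
std_error <- function(x) sd(x) / sqrt(length(x))

# One row per group and time point (3 * 5 = 15 rows)
centers <- aggregate(cbind(LV1, LV2) ~ group + time, data = scores, FUN = median)
errors  <- aggregate(cbind(LV1, LV2) ~ group + time, data = scores, FUN = std_error)
summary_tbl <- merge(centers, errors, by = c("group", "time"),
                     suffixes = c("_median", "_se"))

# Median position with error bars on both latent variables
with(summary_tbl, {
  plot(LV1_median, LV2_median, col = as.integer(group), pch = 19,
       xlab = "LV1", ylab = "LV2")
  arrows(LV1_median - LV1_se, LV2_median, LV1_median + LV1_se, LV2_median,
         angle = 90, code = 3, length = 0.03, col = as.integer(group))
  arrows(LV1_median, LV2_median - LV2_se, LV1_median, LV2_median + LV2_se,
         angle = 90, code = 3, length = 0.03, col = as.integer(group))
})
```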

Above is an enticing representation of the time-course differences between 3 groups of samples for 5 time measurement points (t= 0, 30, 60, 90 and 120 minutes). Now that we have established how our samples look based on 200 measurements or variables we can examine the variable loadings for this model.

Above, the loadings, or relative contribution of each variable to the description of the samples, are plotted for O-PLS LV1 and LV2. Based on the position of the variables on the x-axis (LV1) we can say something about their relative changes in time, because the O-PLS sample scores are also distributed along the x-axis with respect to time. The LV2 loading (y-axis) can be used to describe differences between the groups: note that the sample group classification pattern on the y-axis (LV2) is independent of the change in time (x-axis, LV1).

Above we can visualize how the sample and variable descriptions are related. For instance, variables at the far left of the loadings plot (FA) start out relatively increased and then decrease as the sample positions move to the right. Analogously, as time increases there is an increase in the majority of variables (note the large cloud of loadings on LV1, x-axis above).
Another interesting thing to try is to visualize the changes in group scores relative to t = 0 (subtract the t = 0 abundance of all 200 variables from the t = 30, 60, 90 and 120 minute time points on a sample-wise basis).
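The sample-wise baseline subtraction can be sketched in base R as follows. The data are simulated, and the object names (design, abundance, adjusted) are assumptions for illustration rather than the objects used for the figures in this post.

```r
set.seed(2)

n_samples <- 30
n_vars <- 200
times <- c(0, 30, 60, 90, 120)

# One row per sample and time point (150 rows), one column per variable
design <- expand.grid(sample = seq_len(n_samples), time = times)
abundance <- matrix(rnorm(nrow(design) * n_vars), nrow = nrow(design),
                    dimnames = list(NULL, paste0("var", seq_len(n_vars))))

# Look up each row's own t = 0 measurement and subtract it
is_baseline <- design$time == 0
baseline <- abundance[is_baseline, , drop = FALSE]
baseline_idx <- match(design$sample, design$sample[is_baseline])
adjusted <- abundance - baseline[baseline_idx, , drop = FALSE]

# The t = 0 rows are now all zero, so keep only the later time points
adjusted <- adjusted[!is_baseline, , drop = FALSE]
adjusted_design <- design[!is_baseline, ]

dim(adjusted)  # 120 rows (30 samples * 4 time points) by 200 variables
```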

Above are the baseline (t = 0) normalized changes in time (above left, point color) for the three groups of samples (above left, point shape). As before, we can study the relationship between samples and variables on a multivariate basis by comparing the sample scores (position on LV1 and LV2) to the variable loadings.
This process (O-PLS) can be helpful for ranking the original 200 variables in two dimensions (2 lists):
1) with respect to change with time (x-axis)
2) difference between groups (y-axis).
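One way to build the two ranked lists is to sort the variables by the absolute size of their loading on each latent variable. The sketch below uses a simulated loadings matrix, since fitting an O-PLS model is outside its scope; rank_by_loading() is a hypothetical helper, and ranking by absolute loading is an assumption about how "ranking" would be done.

```r
set.seed(3)

# Hypothetical loadings: 200 variables x 2 latent variables
loadings <- matrix(rnorm(400), ncol = 2,
                   dimnames = list(paste0("var", 1:200), c("LV1", "LV2")))

# Order variables by absolute loading on one latent variable, largest first
rank_by_loading <- function(loadings, lv) {
  ord <- order(abs(loadings[, lv]), decreasing = TRUE)
  data.frame(variable = rownames(loadings)[ord],
             loading = loadings[ord, lv],
             row.names = NULL)
}

time_ranking  <- rank_by_loading(loadings, "LV1")  # change with time (x-axis)
group_ranking <- rank_by_loading(loadings, "LV2")  # difference between groups (y-axis)

head(time_ranking)
```

Keeping the signed loading in the table preserves the direction of change (increase or decrease) alongside the rank.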
It is interesting to note that, without baseline adjustment, the group young NGT has the lowest starting FA (its group scores at t = 0 are to the right of the other two groups). The relative difference between each group's t = 0 and t = 120 positions can be used to visualize the change in FA over time (a decrease; note the negative loading on LV1).
Finally, we can try to connect our multivariate observations with an easily interpretable visualization of a single variable (baseline-adjusted FA): a box plot showing the medians (horizontal line at the center of each box) and the 25th to 75th percentile range (bottom and top boundaries of each box) for the 3 groups over 4 time points.
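A box plot of this kind can be drawn with base R's boxplot(). The values below are simulated, and the data frame fa with its columns and group labels (A, B, C) is hypothetical. Note that boxplot() draws Tukey hinges, which are close to, but not always identical to, the 25th and 75th percentiles returned by quantile().

```r
set.seed(4)

# Hypothetical baseline-adjusted values: 3 groups x 4 time points x 10 samples
fa <- expand.grid(replicate = 1:10,
                  group = factor(c("A", "B", "C")),
                  time = factor(c(30, 60, 90, 120)))
fa$value <- rnorm(nrow(fa), mean = -as.numeric(as.character(fa$time)) / 60)

# One box per group and time point (3 * 4 = 12 boxes)
bp <- boxplot(value ~ group + time, data = fa, las = 2,
              xlab = "", ylab = "FA (baseline adjusted)")

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
medians <- bp$stats[3, ]
```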

The box plot visualization above captures a trend in the relative positions of the groups similar to the one we previously described using all 200 variables. This makes sense given the extreme loading observed for FA, and therefore the implied contribution (influence) of this variable to the observed distribution of the sample scores.
May 11, 2013 | Categories: Uncategorized | Tags: box plot, data analysis, data visualization, latent variables, loadings plot, O-PLS, O-PLS-DA, partial least squares, R, scores plot, time-course | Leave a comment
February 1, 2013 | Categories: Uncategorized | Tags: chemical similarity network, Cytoscape, data visualization, Devium, imDEV, metabolomics, network, OGTT, PCA, PLS, R, statistics | Leave a comment