Machine Learning Powered Biological Network Analysis

Video

dave_data

Metabolomic network analysis can be used to interpret experimental results within a variety of contexts including: biochemical relationships, structural and spectral similarity and empirical correlation. Machine learning is useful for modeling relationships in the context of pattern recognition, clustering, classification and regression based predictive modeling. The combination of developed metabolomic networks and machine learning based predictive models offer a unique method to visualize empirical relationships while testing key experimental hypotheses. The following presentation focuses on data analysis, visualization, machine learning and network mapping approaches used to create richly mapped metabolomic networks. Learn more at www.createdatasol.com

dave

The following presentation also shows a sneak peak of a new data analysis visualization software, DAVe: Data Analysis and Visualization engine. Check out some early features. DAVe is built in R and seeks to support a seamless environment for advanced data analysis and machine learning tasks and biological functional and network analysis.

As an aside, building the main site (in progress) was a fun opportunity to experiment with Jekyll, Ruby and embedding slick interactive canvas elements into websites. You can checkout all the code here https://github.com/dgrapov/CDS_jekyll_site.

slides: https://www.slideshare.net/dgrapov/machine-learning-powered-metabolomic-network-analysis

June 11, 2017 | Categories: Uncategorized | Tags: clustering, data analysis, data visualization, genomics, machine learning, network, pathways, proteomics, R, r-bloggers, science, shiny, software, statistics | Leave a comment

Try’in to 3D network: Quest (shiny + plotly)

I have an unnatural obsession with 4-dimensional networks. It might have started with a dream, but VR might make it a reality one day. For now I will settle for 3D networks in Plotly.

Presentation: R users group (more)

More: networkly

April 9, 2016 | Categories: Uncategorized | Tags: network, networkly, plotly, R, r-bloggers, shiny | Leave a comment

Network Visualization with Plotly and Shiny

R users: networkly: network visualization in R using Plotly

In addition to their more common uses, networks can be used as powerful multivariate data visualizations and exploration tools. Networks not only provide mathematical representations of data but are also one of the few data visualization methods capable of easily displaying multivariate variable relationships. The process of network mapping involves using the network manifold to display a variety of other information e.g. statistical, machine learning or functional analysis results (see more mapped network examples).

netmaping

The combination of Plotly and Shiny is awesome for creating your very own network mapping tools. Networkly is an R package which can be used to create 2-D and 3-D interactive networks which are rendered with plotly and can be easily integrated into shiny apps or markdown documents. All you need to get started is an edge list and node attributes which can then be used to generate interactive 2-D and 3-D networks with customizable edge (color, width, hover, etc) and node (color, size, hover, label, etc) properties.

2-Dimensional Network (interactive version) 2dnetwork

3-Dimensional Network (interactive version)

3dnetwork

View all code used to generate the networks above.

February 28, 2016 | Categories: Uncategorized | Tags: data analysis, data visualization, network, network mapping, networkly, plotly, R, r-bloggers, shiny | Leave a comment

Multivariate Data Analysis and Visualization Through Network Mapping

Recently I had the pleasure of speaking about one of my favorite topics, Network Mapping. This is a continuation of a general theme I’ve previously discussed and involves the merger of statistical and multivariate data analysis results with a network.

Over the past year I’ve been working on two major tools, DeviumWeb and MetaMapR, which aid the process of biological data (metabolomic) network mapping.

DeviumWeb– is a shiny based GUI written in R which is useful for:

data manipulation, transformation and visualization

statistical analysis (hypothesis testing, FDR, power analysis, correlations, etc)

clustering (heiarchical, TODO: k-means, SOM, distribution)

principal components analysis (PCA)

orthogonal partial least squares multivariate modeling (O-/PLS/-DA)

MetaMapR– is also a shiny based GUI written in R which is useful for calculation and visualization of various networks including:

biochemical

structural similarity

mass spectral similarity

correlation

Both of theses projects are under development, and my ultimate goal is to design a one-stop-shop ecosystem for network mapping.

In addition to network mapping,the video above and presentation below also discuss normalization schemes for longitudinal data and genomic, proteomic and metabolomic functional analysis both on a pathway and global level.

As always happy network mapping!

June 27, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, correlation network, Cytoscape, data analysis, data visualization, DeviumWeb, ggplot2, metabolomics, MetaMapR, multivariate, network mapping, O-PLS, R, r-bloggers, shiny, statistical analysis | 6 Comments

Tutorials- Statistical and Multivariate Analysis for Metabolomics

I recently had the pleasure in participating in the 2014 WCMC Statistics for Metabolomics Short Course. The course was hosted by the NIH West Coast Metabolomics Center and focused on statistical and multivariate strategies for metabolomic data analysis. A variety of topics were covered using 8 hands on tutorials which focused on:

data quality overview

statistical and power analysis

clustering

principal components analysis (PCA)

partial least squares (O-/PLS/-DA)

metabolite enrichment analysis

biochemical and structural similarity network construction

network mapping

I am happy to have taught the course using all open source software, including: R, and Cytoscape. The data analysis and visualization were done using Shiny-based apps: DeviumWeb and MetaMapR. Check out some of the slides below or download all the class material and try it out for yourself.

2014 WCMC LC-MS Data Processing and Statistics for Metabolomics by Dmitry Grapov is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Special thanks to the developers of Shiny and Radiant by Vincent Nijs.

February 17, 2014 | Categories: Uncategorized | Tags: biochemical network, chemical similarity network, Cytoscape, data analysis, data visualization, Devium, ggplot2, hierarchical clustering, mass spectral similarity, metabolomics, MetaMapR, network, O-PLS, O-PLS-DA, PCA, R, r-bloggers, shiny, TeachingDemos, tutorial | 13 Comments

Interactive Heatmaps (and Dendrograms) – A Shiny App

Heatmaps are a great way to visualize data matrices. Heatmap color and organization can be used to encode information about the data and metadata to help learn about the data at hand. An example of this could be looking at the raw data or hierarchically clustering samples and variables based on their similarity or differences. There are a variety packages and functions in R for creating heatmaps, including heatmap.2. I find pheatmap particularly useful for the relative ease in annotating the top of the heat map using an arbitrary number of items (the legend needs to be controlled for best effect, not implemented).

Heatmaps are also fun to use to interact with data!

Here is an example of a Heatmap and Dendrogram Visualizer built using the Shiny framework (and link to the code).

To run locally use the following code.

install.packages("shiny")
library(shiny)
runGitHub("Devium", username = "dgrapov",ref = "master", subdir = "Shiny/Heatmap", port = 8100)

It was interesting to debug this app using the variety of data sets available in the R datasets package (limiting options to data.frames).

My goals were to make an interface to:

transform data and visualize using Z-scales, spearman, pearson and biweight correlations
rotate the data (transpose dimensions) to view row or column space separately

visualize data/relationships presented as heatmaps or dendrograms

use hierarchical clustering to organize data

add a top panel of annotation to display variables independent of the internal heatmap scales

use slider to visually select number(s) of sample or variable clusters (dendrogram cut height)

There are a few other options like changing heatmap color scales, adding borders or names that you can experiment with. I’ve preloaded many famous data sets found in the R data sets package a few of my favorites are iris and mtcars. There are other datsets some of which were useful for incorporating into the build to facilitate debugging and testing. The aspect of dimension switching was probably the most difficult to keep straight (never mind legends, these may be hardest of all). What are left are informative (I hope) errors, usually coming from stats and data dimension mismatches. Try taking a look at the data structure on the “Data” tab or switching UI options for: Data, Dimension or Transformation until issues resolve. A final note before mentioning a few points about working with Shiny, missing data is set to zero and factors are omitted when making the internal heatmap but allowed in top row annotations.

Building with R and Shiny

This was my third try at building web/R/applications using Shiny.

Here are some other examples:

Basic plotting with ggplot2

Principal Components Analysis ( I suggest loading a simple .csv with headers)

It has definitely gotten easier building UIs and deploying them to the web using the excellent Rstudio and Shiny tools. Unfortunately this leaves me more time to be confused by “server side” issues.

My over all thoughts (so far) :

I have a lot to learn and the possibilities are immense
when things work as expected it is a stupendous joy! (thank you to Shiny, R and everyone who helped!)
when tracking down unexpected behavior I found it helpful to print app state at different levels to the browser using some simple mechanism like for instance:

#partial example

server.R
####
#create reactive objects to to "listen" for changes in states or R objects of interests
ui.opts$data<-reactive({
 tmp.data<-get(input$data)
 ui.opts$data<-tmp.data # either or
 tmp.data # may not be necessary
 })

#prepare to print objects/info to browser
output$any.name <- renderPrint({
tmp<-list()
tmp$data<-ui.opts$data()
tmp$match.dim<-match.dim
tmp$class.factor<-class.factor
tmp$dimnames<-dimnames(tmp.data)
str(tmp)
})

ui.r
####
#show/print info
mainPanel(
 verbatimTextOutput("any.name")

)

Over all two thumbs up.

July 7, 2013 | Categories: Uncategorized | Tags: arbitrary number, column space, data relationships, dendrogram, Devium, heatmap, hierarchical clustering, pheatmap, R, r-bloggers, shiny | 3 Comments

Principal Components Analysis Shiny App

I’ve recently started experimenting with making Shiny apps, and today I wanted to make a basic app for calculating and visualizing principal components analysis (PCA). Here is the basic interface I came up with. Test drive the app for yourself or check out the the R code HERE.

library(shiny)
runGist("5846650")

Above is an example of the user interface which consists of data upload (from.csv for now), and options for conducting PCA using the pcaMethods package. The various outputs include visualization of the eigenvalues and cross-validated eigenvalues (q2), which are helpful for selecting the optimal number of model components.The PCA scores plot can be used to evaluate extreme (leverage) or moderate (DmodX) outliers. A Hotelling’s T-squared confidence intervals as an ellipse would also be a good addition for this.

The variable loadings can be used to evaluate the effects of data scaling and other pre-treatments.

The next step is to interface the calculation of PCA to a dynamic plot which can be used to map meta data to plotting characteristics.

June 23, 2013 | Categories: Uncategorized | Tags: ggplot2, PCA, R, r-bloggers, shiny | 2 Comments

Dynamic Data Visualizations in the Browser Using Shiny

After being busy the last two weeks teaching and attending academic conferences, I finally found some time to do what I love, program data visualizations using R. After being interested in Shiny for a while, I finally decided to pull the trigger and build my first Shiny app!

I wanted to make a proof of concept app which contained the following dynamics which are the basics of any UI design:

1) dynamic UI options

2) dynamically updated plot based on UI inputs

Here is what I came up with.

Check out the app for yourself or the R code HERE.

library(shiny)
runGist('5792778')

The app consists of a user interface (UI) for selecting the data, variable to plot , grouping factor for colors and four plotting options: boxplot (above), histogram, density plot and bar graph. As an added bonus the user can select to show or hide jittered points in the boxplot visualization.

Generally #2 above was well described and easy to implement, but it took a lot of trial and error to figure out how to implement #1. Basically to generate dynamic UI objects, the UI objects need to be called using the function shiny:::uiOutput() in the ui.R file and their arguments set in the server.R file using the function shiny:::renderUI(). After getting this to work everything else fell in place.

Having some experience with making UI’s in VBA (visual basic) and gWidgets; Shiny is a joy to work with once you understand some of its inner workings. One aspect I felt which made the learning experience frustrating was the lack of informative errors coming from Shiny functions. Even using all the R debugging tools having Shiny constantly tell me something was not correctly called from a reactive environment or the error was in the runApp() did not really help. My advice to anyone learning Shiny is to take a look at the tutorials, and particularly the section on Dynamic UI. Then pick a small example to reverse engineer. Don’t start off too complicated else you will have a hard time understanding which sections of code are not working as expected.

Finally here are some screen shots, and keep an eye out for more advanced shiny apps in the near future.