autoscaling | Creative Data Solutions

Posts tagged “autoscaling”

Comparison of Serum vs Urine metabolites

Primary metabolites in human serum or urine.

Uh oh, there seem to be some outliers: serum samples that look like urine, and vice versa. Fix these and re-evaluate using PCA and hierarchical clustering on rank correlations.
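As a rough illustration of the clustering half of that check, here is a minimal sketch in plain Python (not the imDEV workflow used for the figures). It computes Spearman rank correlations between samples, clusters on 1 - rho with average linkage, and flags samples whose label disagrees with the majority of their cluster. The sample names, the five metabolite values per sample, and the helper names `average_ranks`, `spearman` and `cluster_samples` are all made up for the example.

```python
from statistics import mean

def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(a, b):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

def cluster_samples(samples, n_clusters=2):
    """Naive average-linkage agglomerative clustering on 1 - Spearman rho."""
    names = list(samples)
    dist = {
        (p, q): 1.0 - spearman(samples[p], samples[q])
        for p in names for q in names if p != q
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[p, q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical profiles (5 metabolites per sample); "serum_3" is
# deliberately given a urine-like pattern to mimic a swapped label.
samples = {
    "serum_1": [9.0, 5.0, 1.0, 2.0, 0.5],
    "serum_2": [8.5, 4.0, 1.5, 2.5, 0.4],
    "serum_3": [1.0, 9.0, 6.0, 0.5, 7.0],
    "urine_1": [1.2, 9.5, 5.0, 0.4, 8.0],
    "urine_2": [0.8, 8.0, 6.5, 0.6, 7.5],
}

clusters = cluster_samples(samples, n_clusters=2)

# Flag samples whose label prefix differs from their cluster's majority label.
flagged = []
for members in clusters:
    labels = [m.split("_")[0] for m in members]
    majority = max(set(labels), key=labels.count)
    flagged += [m for m, lab in zip(members, labels) if lab != majority]

print(clusters)
print(flagged)
```

Rank correlations only use the ordering of metabolite levels within each sample, so a few very abundant compounds cannot drive the distances on their own. With real data a library routine for hierarchical clustering would be the more practical choice; the loop above is only meant to make the steps visible.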

Now things look more believable. Next, let us test the effects of data pre-treatment on PLS-DA model scores for a 3-group comparison in serum. Ideally, group scores would be maximally resolved along the first latent variable (x-axis), and within-group variance would be orthogonal to it, along the y-axis.

Compared to the raw data (TOP), where about 3 top variables (glucose, urea and mannitol) dominate the variance structure, the autoscaled model displays a more balanced contribution of variables to the scores variance, because each variable is mean-centered and divided by its standard deviation. The larger separation between WHITE and RED class scores along the x-axis suggests improved classifier performance over the raw data model. Samples with scores outside their respective group’s Hotelling’s T² ellipse (95%) might be outliers to investigate further or potentially exclude from the current test.
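To make the autoscaling step concrete, here is a small sketch in plain Python. It is an illustration only: the three columns and their values are invented, and the helper names `autoscale` and `variance_share` are hypothetical. Each variable is mean-centered and divided by its sample standard deviation, and the share of total variance per variable is compared before and after.

```python
from statistics import mean, stdev

def autoscale(rows):
    """Autoscale a samples-by-variables table: (x - column mean) / column SD."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]  # sample SD (n - 1 denominator)
    if any(s == 0 for s in spreads):
        raise ValueError("a constant variable cannot be autoscaled")
    return [
        [(x - c) / s for x, c, s in zip(row, centers, spreads)]
        for row in rows
    ]

def variance_share(rows):
    """Fraction of the total variance contributed by each variable."""
    variances = [stdev(col) ** 2 for col in zip(*rows)]
    total = sum(variances)
    return [v / total for v in variances]

# Hypothetical intensities: one abundant variable and two minor ones.
raw = [
    [5200.0, 310.0, 1.2],
    [6100.0, 280.0, 0.9],
    [4800.0, 350.0, 1.5],
    [7000.0, 300.0, 1.1],
]

scaled = autoscale(raw)
raw_share = variance_share(raw)
scaled_share = variance_share(scaled)

print([round(s, 4) for s in raw_share])     # first variable dominates
print([round(s, 4) for s in scaled_share])  # equal shares after autoscaling
```

After autoscaling every variable has unit variance, so no single abundant compound can dominate the model simply because of its scale. The trade-off is that noisy, low-abundance variables get the same weight as well-measured ones.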

December 16, 2012 | Categories: Uncategorized | Tags: autoscaling, clustering, imDEV, metabolomics, normalizations, outliers, PCA, PLS-DA
