An Empirical Cumulative Distribution Function (ECDF) plot is a graphical technique used to visualize the distribution of a dataset. In MATLAB, we can create ECDF plots using the cdfplot() function from the Statistics and Machine Learning Toolbox.

In this comprehensive guide, we will learn:

  • What is an Empirical CDF Plot?
  • Why Use an Empirical CDF Plot?
  • Statistical Properties Revealed by ECDF Plot
  • How to Plot an ECDF in MATLAB
    • Basic cdfplot() Syntax
    • Customizing the ECDF Plot
    • Advanced Customization Techniques
    • Comparing Empirical and Theoretical CDFs
  • ECDF vs Other Plots: Strengths and Limitations
  • Case Study: ECDF for Frequency Response Analysis
  • Best Practices for ECDF Plotting and Interpretation
  • Applications of ECDF Plots
  • Pros and Cons of Using ECDF Plots

So let's get started!

What is an Empirical CDF Plot?

An Empirical Cumulative Distribution Function (ECDF) plot displays the proportion of observations in a dataset less than or equal to each data point.

Essentially, it plots the sorted data values on the x-axis against their cumulative proportion (the fraction of observations at or below each value) on the y-axis.

For example, if 60% of the data lies below a certain value, the ECDF plot will depict a y-value of 0.6 at that x-location.

Because every observation is eventually counted, the ECDF rises from 0 below the smallest value to exactly 1 at max(x), so it always terminates at (max(x), 1).
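To make the definition concrete, here is a minimal sketch that builds an ECDF by hand, without cdfplot(). The data values are made up purely for illustration.

```matlab
% Hand-rolled ECDF for a small, hypothetical sample (illustrative values only)
x  = [3 1 4 1 5 9 2 6];
xs = sort(x);                 % order statistics, lowest to highest
n  = numel(xs);
F  = (1:n) / n;               % proportion of observations <= each sorted value

stairs(xs, F);                % step plot with a jump of 1/n per observation
xlabel('x'); ylabel('F(x)');
title('Manually computed ECDF');
```

The last point of the curve is (max(x), 1), and tied values (the two 1s here) simply stack into a taller jump.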

Why Use an Empirical CDF Plot?

Some key applications of ECDF plots include:

  • Visualize distribution: ECDFs allow us to visually inspect the distribution of data and features like modality, skewness, tails, gaps, etc.
  • Compare distributions: We can plot ECDFs for multiple datasets on the same axes to compare their distributions.
  • Goodness of fit testing: We can evaluate how well a dataset fits a theoretical distribution by comparing ECDF vs CDF plots.
  • Determine percentiles: ECDFs allow easy estimation of percentiles and probability levels from the plot itself.
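  • Identify patterns: Changes in shape and slope reveal key distribution aspects. Steep segments indicate clusters of values, while flat segments indicate gaps.
  • Robust to outliers: Because the ECDF is non-parametric and built from order statistics, a single outlier changes the height of the curve by only 1/n, although it still stretches the x-axis range.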

Statistical Properties Revealed by ECDF Plot

Some key statistics and distribution properties that can be observed from the shape and features of an ECDF plot include:

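  • Modality: Number of distribution peaks; each mode shows up as a distinct steep segment
  • Symmetry: Whether left and right tails match, i.e. the curve looks the same when rotated 180° about the median point
  • Outliers: Isolated points far beyond the main body of the data, separated from it by a long flat stretch
  • Gaps: Regions with no/few observations, seen as flat segments
  • Skewness: Longer tail to left or right side, seen as a slow approach to 0 or 1 on that side
  • Percentiles: Data values at probability levels, read off by tracing from the y-axis across to the curve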

For example, here are some observations we can make from this sample ECDF plot:

  • Unimodal distribution (one peak)
  • Symmetric tails on both sides
  • No outliers or gaps
  • Slight negative skew evident
  • Median around 0.7, 90th percentile approx. 1.0
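Percentiles like these can also be read numerically rather than by eye. The sketch below assumes the Statistics and Machine Learning Toolbox function ecdf() and uses a randomly generated sample, so the numbers it produces are not the ones quoted in the list above.

```matlab
rng(0);                              % fixed seed so the run is repeatable
x = randn(200,1);                    % hypothetical sample

[f, xv] = ecdf(x);                   % f(i) = proportion of data <= xv(i)

% Smallest data value at which the ECDF reaches the requested level.
% The small tolerance guards against floating-point rounding in f.
tol = 1e-12;
med = xv(find(f >= 0.5 - tol, 1));   % empirical median
p90 = xv(find(f >= 0.9 - tol, 1));   % empirical 90th percentile

fprintf('median = %.3f, 90th percentile = %.3f\n', med, p90);
```

Functions such as quantile() or prctile() interpolate between order statistics, so their results can differ slightly from values read directly off the ECDF steps.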

Next, let's discuss how to generate ECDF plots in MATLAB.

How to Plot an ECDF in MATLAB

The cdfplot() function, part of the Statistics and Machine Learning Toolbox, visualizes ECDFs.

Basic cdfplot() Syntax

The basic syntax for cdfplot() is:

cdfplot(x)

where x is a vector of data values.

cdfplot() plots the ECDF as a step function with jumps at each observed data point. Let's see an example:

x = randn(100,1); % 100 random observations
cdfplot(x);

We can also obtain the ECDF plot line object handle using:

h = cdfplot(x)

This allows us to subsequently customize the plot by changing h properties.
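cdfplot() can also return a second output holding summary statistics of the sample. A short sketch, using a random sample as a stand-in for real data:

```matlab
rng(1);                        % fixed seed for repeatability
x = randn(100,1);              % hypothetical sample

[h, stats] = cdfplot(x);       % stats has fields min, max, mean, median, std
fprintf('median = %.3f, std = %.3f\n', stats.median, stats.std);
```

This is handy when we want the plot and a few headline numbers from a single call.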

Customizing the ECDF Plot

cdfplot() does not accept line-property name-value pairs, so we style the plot by modifying the returned line object, either with set() or with dot notation:

h = cdfplot(x);
set(h,'LineStyle','-.','Color','r','LineWidth',2);  % set several properties at once

h.Marker = '+';
h.MarkerSize = 8;

The key aspects we can customize include:

  • Line style, width and color
  • Marker type, size and color
  • Axis limits, grid, title etc.
  • Add multiple ECDF lines on same axes

Refer to the cdfplot documentation for more details.
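As an illustration of the points above, the following sketch overlays two ECDFs on the same axes and adjusts the axes, title and legend. Both samples are randomly generated placeholders, and the labels are arbitrary.

```matlab
rng(2);                          % fixed seed for repeatability
a = randn(100,1);                % hypothetical sample A
b = 0.5 + 1.5*randn(100,1);      % hypothetical sample B (shifted and wider)

ha = cdfplot(a);
hold on                          % keep the first ECDF while adding the second
hb = cdfplot(b);
hold off

set(ha, 'Color', 'b', 'LineWidth', 1.5);
set(hb, 'Color', 'r', 'LineStyle', '--', 'LineWidth', 1.5);

xlim([-4 6]); grid on
title('ECDF comparison');
xlabel('Value'); ylabel('Cumulative proportion');
legend('Sample A', 'Sample B', 'Location', 'southeast');
```

A curve lying to the right of another indicates generally larger values, and a shallower curve indicates more spread.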

Advanced Customization Techniques

Some useful tips for enhanced ECDF plot customization include:

  • Multi-panel plots: Visualize multiple ECDFs in separate panels for easier comparison using subplot() or tiledlayout().
  • Zoom in on tails: Use tight axis limits to inspect distribution extremities. This can reveal insights obscured in the full view.
  • Highlight regions of interest: Annotate specific plot sections using shapes, callouts and texts to draw attention.
  • Emphasize individual lines: Vary color, style, and thickness across ECDFs for improved readability and focus.
  • Interactive plotting: Create interactive plots with data tips and adjustable axes, or build a small GUI with App Designer (appdesigner).

These tips help create more customized, publication-ready ECDF plots.
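The first two tips can be sketched as follows: one panel shows the full ECDF and a second panel zooms in on the upper tail. The sample is random, and the 95% cut-off is an arbitrary choice for illustration.

```matlab
rng(3);                              % fixed seed for repeatability
x = randn(500,1);                    % hypothetical sample

subplot(1,2,1);
cdfplot(x);
title('Full ECDF');

subplot(1,2,2);
cdfplot(x);
xlim([prctile(x,95) max(x)]);        % show only the top 5% of values
ylim([0.95 1]);                      % matching range on the probability axis
title('Upper tail');
```

In the zoomed panel the individual steps become visible, which makes it easier to judge how few observations actually define the tail.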

Comparing Empirical and Theoretical CDFs

An excellent application of ECDF plots is comparing them against theoretical Cumulative Distribution Functions (CDFs). This allows us to visually assess goodness of fit.

The steps are:

  1. Plot the ECDF of sample data x using cdfplot(x)
  2. Define an axis vector 'xx' from min(x) to max(x)
  3. Compute the CDF values y = cdf('name',xx,A,B), where 'name' is the distribution name and A, B are its parameters
  4. Plot the theoretical CDF line using plot(xx,y)
  5. Compare ECDF vs CDF!

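For example, let's compare a sample against the standard normal distribution:

x = randn(100,1);               % sample data

cdfplot(x);                     % empirical CDF
hold on

xx = min(x):0.1:max(x);         % evaluation grid over the data range
y = cdf('normal',xx,0,1);       % theoretical normal CDF, mean 0 and std 1

plot(xx,y,'m','LineWidth',2)
hold off

legend('Empirical CDF','Theoretical CDF')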

The close alignment of empirical and theoretical curves is consistent with the data being drawn from a normal distribution with mean 0 and standard deviation 1.

We can even assess the goodness-of-fit quantitatively using statistical tests like the Kolmogorov-Smirnov test.
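A minimal sketch of such a test, assuming the toolbox function kstest(), which by default compares the sample against the standard normal distribution at the 5% significance level:

```matlab
rng(4);                                    % fixed seed for repeatability
x = randn(100,1);                          % hypothetical sample

% Null hypothesis: x comes from a standard normal distribution
[rejectNull, pValue, ksStat] = kstest(x);

% ksStat is the largest vertical gap between the ECDF and the reference CDF
fprintf('reject = %d, p = %.3f, max gap = %.3f\n', rejectNull, pValue, ksStat);
```

If the mean and standard deviation are estimated from the same data rather than known in advance, a test designed for that case, such as lillietest(), is usually the safer choice.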

ECDF vs Other Plots: Strengths and Limitations

How do ECDFs compare to other common distribution plots like histograms, kernel density estimates and Q-Q plots?

Histograms group data into bins and depict bin counts. This representation depends heavily on the binning scheme: bins that are too wide can hide modes and clusters, while bins that are too narrow can create spurious ones.

In contrast, ECDFs are step functions that show every observation and require no tuning parameters such as bin width or bandwidth. However, density features such as modes are harder to read from an ECDF, because they appear as changes in slope rather than as peaks.

Kernel density estimates (KDE) produce a smooth density plot akin to a smoothed histogram. But KDEs have their own bandwidth selection issues and can underestimate tails.

Q-Q plots facilitate easy goodness-of-fit assessment by comparing sample quantiles against theoretical distribution quantiles. However, full distribution visualization is poor with Q-Q plots.

Thus, ECDF plots strike a good balance between level of detail, distribution shape depiction and freedom from tuning parameters. Their non-parametric nature makes them versatile for understanding and comparing univariate data distributions.
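To see these trade-offs side by side, the sketch below draws all four plot types for the same randomly generated sample. It assumes the toolbox functions ksdensity() and qqplot() in addition to cdfplot().

```matlab
rng(5);                       % fixed seed for repeatability
x = randn(200,1);             % hypothetical sample

subplot(2,2,1); histogram(x); title('Histogram');
subplot(2,2,2); ksdensity(x); title('Kernel density estimate');
subplot(2,2,3); qqplot(x);    title('Q-Q plot vs. normal');
subplot(2,2,4); cdfplot(x);   title('ECDF');
```

Changing the number of histogram bins or the ksdensity() bandwidth alters the first two panels, whereas the ECDF panel has nothing to tune.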

Case Study: ECDF for Frequency Response Analysis

Let's walk through an application of using ECDF analysis for frequency response characterization.

Problem Definition

We need to analyze the frequency response of a feedback control system across 20 test runs. The frequency response metric of interest is peak overshoot percentage.

Our objectives are to:

  1. Visualize peak overshoot distribution
  2. Quantify variability across test runs
  3. Check if response matches simulator model

Data Preprocessing

The data consists of peak overshoot values (in percent) for 20 system test runs. We store them in a vector for cdfplot():

x = [12, 15, 10, 16, 11, 14, 15, 13]; % first 8 of the 20 runs; the remaining values are not listed here

ECDF Plotting and Analysis

We can visualize the peak overshoot ECDF using:

cdfplot(x)

Analyzing this:

  • Distribution appears unimodal and symmetric without outliers
  • Median peak is around 13%, majority values between 10-16%
  • Tighter grouping than simulated profile (red line)

This shows that the test runs vary noticeably less than the simulations predict. Recommended actions are to tune the simulator models or revise the environmental test conditions so that the two align.
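The variability objective can be backed with numbers from the second output of cdfplot(). This sketch uses only the eight overshoot values listed earlier, since the remaining runs and the simulated profile are not given, so its results describe that subset rather than the full study.

```matlab
% Only the eight overshoot values listed in the text; the other runs are not given
x = [12, 15, 10, 16, 11, 14, 15, 13];

[h, stats] = cdfplot(x);
xlabel('Peak overshoot (%)'); ylabel('Proportion of runs');

spread = stats.max - stats.min;          % range of the listed runs
fprintf('median = %.1f%%, std = %.2f%%, range = %.0f%%\n', ...
        stats.median, stats.std, spread);
```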

Through this simple case study, we saw how ECDF analysis provides valuable statistical insights!

Best Practices for ECDF Plotting and Interpretation

Some key guidelines for effective use of ECDFs include:

  • Check ECDF overlaps and deviation patterns instead of just mean values.
  • Pay attention to slopes and curvature in addition to absolute plot locations.
  • Be wary of distortions at distribution tails due to low samples.
  • Don't overinterpret minor ECDF fluctuations as true effects.
  • Use both ECDFs and other statistical graphs for comprehensive insights.
  • Employ quantitative metrics like K-S test along with visual assessment.

With experience, we can become proficient in distilling key insights from ECDF graphs.

Applications of ECDF Plots

Some examples where ECDF plots are extensively used include:

  • Statistical data analysis: Comparing distribution of metrics across samples, treatments, time points.
  • Finance: Visualize returns distribution, analyze portfolio risk, simulate stock price movements.
  • Engineering: Diagnose machine deterioration via changes in vibration data distribution over time.
  • Environmental science: Study climate patterns through shifting distribution profiles of precipitation, pollution etc.
  • Biotechnology: Compare toxicity levels through drug dose-response curve distributions.

Their simple yet flexible nature makes ECDF plots a versatile tool for diverse domains.

Pros and Cons of Using ECDF Plots

Let's summarize the key advantages and limitations:

Pros:

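  • Show the full distribution without binning or smoothing
  • Handle any numeric or ordered data, discrete or continuous
  • Easily combined with theoretical CDFs
  • Allow multiple dataset comparison
  • Robust against outliers, since each one moves the curve by only 1/n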

Cons:

  • Obscure features at distribution tails
  • Hard to identify exact distribution forms
  • Complex shapes for multimodal data

In short, ECDF plots provide an intuitive yet powerful statistical tool for initial data analysis and comparison in MATLAB. With practice, we can gain proficiency in leveraging ECDFs to uncover vital insights into distribution features.

I hope you found this comprehensive MATLAB cdfplot() guide useful! Happy plotting!
