Statistics & Probability for Data Science: The Essentials
Simplified notes from my AI Engineering course, broken down for developers and engineers.
I have finally started a course on Udemy! Yes, you heard that right. I bought this course way back in 2020, when ML was still a buzzword. I didn’t use anything from the ML/AI world directly in my field of engineering. Cellular communications does involve a lot of smart signal processing and sequence generation, but I didn’t need them to do my daily job. As you might have guessed, I still have not completed the course. Now, with the world pivoting to AI engineering and demanding productivity gains, I believe it’s time to get serious and learn some.

About the course
I liked the course from Frank Kane since it:
Seems comprehensive: Data Science - Machine Learning - Deep Learning.
Application oriented instead of detailed mathematics (easy to get started)
Seems updated: GenAI and AI tools section is added.
Here’s the link to the course: https://www.udemy.com/course/data-science-and-machine-learning-with-python-hands-on/
The notes below will be most useful if you pair them with this course, or any similar one, and some hands-on practice in notebooks. You can spin up a Google Colab notebook too. I have shared my notebook here:
Google Colab Notebook With Examples
Statistics and Probability
Mean:
Also known as the average. It’s the sum of all values divided by the number of samples.
Median:
The midpoint of the sorted sample values. It is the 50th percentile of the data set.
If there is an even number of samples, the median is the average of the two values around the center.
Mode:
The most common (most frequent) value in the dataset.
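Here is a minimal sketch of the three measures using Python’s standard `statistics` module. The page-load numbers are made up for illustration and are not from the course.

```python
import statistics

# Hypothetical sample: page load times in milliseconds (made-up values).
samples = [120, 135, 150, 150, 160, 180, 900]

mean_value = statistics.mean(samples)      # sum of values / number of samples
median_value = statistics.median(samples)  # midpoint of the sorted values
mode_value = statistics.mode(samples)      # most frequent value

print(mean_value, median_value, mode_value)

# With an even number of samples, the median is the average
# of the two middle values: (3 + 5) / 2.
even_median = statistics.median([1, 3, 5, 7])
print(even_median)
```

Notice how the single large value (900) pulls the mean well above the median, while the median and mode stay at 150. That is one reason to look at the median when the data has outliers.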
Variance:
\(\sigma^2\): the average of the squared differences from the mean.
Why squared? Differences from the mean can be negative, and without squaring, the positive and negative differences would cancel each other out in the average.
Standard Deviation:
\(\sqrt{\sigma^2} = \sigma\): the square root of the variance.
Tells us how spread out the dataset is and helps us identify outliers.
Note on variance & standard deviation:
If you’re working with a sample (a representative subset of a larger dataset) instead of the complete dataset, the denominator for the average of the squared differences is N-1 instead of N.
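The N versus N-1 difference is easy to see in code. This is a small sketch with made-up values; the variable names are my own, and the cross-check uses the standard `statistics` module.

```python
import statistics

# Made-up values for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: treat `data` as the complete dataset, divide by N.
population_variance = sum(squared_diffs) / len(data)
# Sample variance: treat `data` as a sample, divide by N - 1.
sample_variance = sum(squared_diffs) / (len(data) - 1)

# Standard deviation is the square root of the variance.
population_std = population_variance ** 0.5
sample_std = sample_variance ** 0.5

# The standard library offers both flavours:
# pvariance/pstdev divide by N, variance/stdev divide by N - 1.
print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```

The sample variance comes out slightly larger than the population variance, and the gap shrinks as the number of samples grows.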
Probability Density Function:
Describes the relative likelihood of a continuous value. The probability of a data point falling within a given range is the area under the curve over that range.
Applies to continuous data.
Probability Mass Functions:
Gives the probability of each discrete value occurring in the dataset.
Applies to discrete data.
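To make the continuous versus discrete distinction concrete, here is a sketch using only the standard library. The choice of a standard normal distribution and a fair coin is my own assumption for illustration, and `binomial_pmf` is a hypothetical helper, not a library function.

```python
import math
from statistics import NormalDist

# Continuous case: a normal distribution with mean 0 and standard
# deviation 1 (an assumed example distribution).
dist = NormalDist(mu=0.0, sigma=1.0)

# The PDF value at a point is a density, not a probability.
density_at_zero = dist.pdf(0.0)

# The probability of landing in a range is the area under the PDF,
# computed here as a difference of cumulative probabilities.
prob_within_one_sigma = dist.cdf(1.0) - dist.cdf(-1.0)


# Discrete case: number of heads in n coin flips (binomial PMF).
def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


prob_five_heads = binomial_pmf(5, 10, 0.5)

# PMF values over all possible outcomes add up to 1.
total_probability = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(density_at_zero, prob_within_one_sigma, prob_five_heads)
```

With a PMF you can ask for the probability of one exact value. With a PDF the probability of any single exact value is zero, so you always ask about a range.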
Percentiles:
The xth percentile is the point in the dataset below which x percent of the values fall.
E.g. the 80th percentile is the point where 80 percent of the values in the dataset are less than that value.
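A quick sketch of the idea, with made-up income values. `percent_below` is a hypothetical helper that applies the definition above directly, and `statistics.quantiles` is used to go the other way. Note that libraries interpolate between data points in slightly different ways, so the exact cut point can vary with the method chosen.

```python
import statistics

# Made-up incomes (in thousands) for illustration.
incomes = [22, 25, 28, 31, 35, 40, 48, 60, 85, 150]


def percent_below(values, point):
    """Percentage of values strictly less than `point`."""
    return 100 * sum(v < point for v in values) / len(values)


# quantiles(..., n=10) returns the 9 cut points between deciles;
# index 7 is the 8th cut point, i.e. the 80th percentile.
deciles = statistics.quantiles(incomes, n=10, method="inclusive")
p80 = deciles[7]

print(p80, percent_below(incomes, p80))
```

Here the 80th percentile lands between 60 and 85, and 8 of the 10 values sit below it.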
Moments:
Quantitative measures of the shape of a probability density function.
A mathematical way to talk about the shape of the function.
First moment: mean.
Second moment: variance (measured around the mean).
Third moment: skew, how lopsided the distribution is.
Fourth moment: kurtosis, how sharp the peak is and how thick the tails of the curve are.
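The four moments can be computed by hand in a few lines. This is a sketch with made-up values, using the population (divide by N) formulas and the common convention of standardizing skew and kurtosis by powers of the standard deviation. `central_moment` is a hypothetical helper of my own.

```python
# Made-up values for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# First moment: the mean.
mean = sum(data) / n


def central_moment(k):
    """Average of the k-th power of the deviations from the mean."""
    return sum((x - mean) ** k for x in data) / n


# Second moment (around the mean): the variance.
variance = central_moment(2)
std = variance ** 0.5

# Third moment, standardized: skew. Positive means a longer right tail.
skewness = central_moment(3) / std**3

# Fourth moment, standardized: kurtosis. A normal distribution scores 3,
# so "excess kurtosis" subtracts 3 to compare against it.
kurtosis = central_moment(4) / std**4
excess_kurtosis = kurtosis - 3

print(mean, variance, skewness, kurtosis, excess_kurtosis)
```

For this data the skew is positive, which matches the longer tail on the right side (the 7 and the 9).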
Covariance:
Intuitively: Measures how two variables vary together. A positive covariance means they move in the same direction, a negative covariance means they move in opposite directions, and a covariance near zero means there is no linear relationship between them.
Mathematically: It is the dot product of the two vectors of deviations from the mean (one vector per variable), divided by the sample size (or by N-1 when working with a sample, as with variance).
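The dot-product description translates almost line by line into Python. The study-hours and exam-score numbers below are made up for illustration.

```python
# Made-up paired observations for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 61, 70, 74]
n = len(hours_studied)

mean_x = sum(hours_studied) / n
mean_y = sum(exam_scores) / n

# Deviation-from-the-mean vectors, one per variable.
dev_x = [x - mean_x for x in hours_studied]
dev_y = [y - mean_y for y in exam_scores]

# Dot product of the two deviation vectors.
dot_product = sum(dx * dy for dx, dy in zip(dev_x, dev_y))

population_covariance = dot_product / n      # complete dataset
sample_covariance = dot_product / (n - 1)    # sample of a larger dataset

print(population_covariance, sample_covariance)
```

The result is positive, so the two variables tend to move together. The size of the number is hard to interpret on its own because it depends on the units of both variables.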
Correlation:
Intuitively: It’s hard to judge the scale of covariance just by looking at the value of COV(X,Y), because it depends on the units of X and Y. If we divide it by the standard deviation of X and the standard deviation of Y, the scale gets normalized to the range -1 to 1.
0: no correlation,
1: perfect positive correlation,
-1: perfect negative correlation.
Mathematically: Covariance divided by the product of the standard deviation of X and the standard deviation of Y.
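Here is a sketch of that normalization. `pearson_r` is a hypothetical helper name of my own, the data is made up, and population (divide by N) formulas are used throughout so the factors cancel cleanly.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Made-up paired observations for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

r = pearson_r(hours, scores)

# Changing units (hours -> minutes) changes the covariance
# but leaves the correlation untouched.
r_rescaled = pearson_r([h * 60 for h in hours], scores)

# Flipping the sign of one variable flips the sign of the correlation.
r_flipped = pearson_r(hours, [-s for s in scores])

print(r, r_rescaled, r_flipped)
```

The value stays close to 1 regardless of the units, which is exactly why correlation is easier to read than raw covariance.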
Conditional Probability:
If A and B are two dependent events, what’s the probability of event B occurring given that event A has occurred?
Mathematically: P(B|A) = P(A,B) / P(A)
P(B|A) = probability of B given A
P(A,B) = probability of both events happening
P(A) = probability of A
This becomes useful when looking for a relationship between two events in a given dataset. For independent events, P(A,B) = P(A) * P(B), which means P(B|A) = P(B). So we can use both equations to check whether the events are dependent.
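As a sketch of that dependence check, suppose we have simple counts from a dataset. The visitor numbers and the events (A = saw a promotion, B = made a purchase) are made up for illustration.

```python
import math

# Made-up counts for illustration.
total_visitors = 1000
saw_promo = 400              # event A
bought = 200                 # event B
saw_promo_and_bought = 120   # both A and B

p_a = saw_promo / total_visitors
p_b = bought / total_visitors
p_a_and_b = saw_promo_and_bought / total_visitors

# P(B|A) = P(A,B) / P(A)
p_b_given_a = p_a_and_b / p_a

# For independent events P(A,B) would equal P(A) * P(B).
independent = math.isclose(p_a_and_b, p_a * p_b)

print(p_b_given_a, p_b, independent)
```

Here P(B|A) is higher than P(B), and P(A,B) differs from P(A) * P(B), so in this made-up data the two events are not independent. With real data you would also want to check whether the gap is larger than what random noise would produce.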
Bayes’ Theorem:
Intuitively: It shows that your final belief P(A|B) is a combination of your initial belief P(A) and the strength of the new evidence B. The probability of A given B depends on the base probabilities of A and B, not just on the probability of B given A. Here are better resources to wrap your head around this concept:
YouTube video visualizing the theorem:
Mathematically: P(A|B) = P(A) * P(B|A) / P(B)
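A worked example helps here. The numbers below are made up: A is "has the condition" and B is "test is positive". Computing P(B) uses the law of total probability, which is an extra step not spelled out in the formula above.

```python
# Made-up probabilities for illustration.
p_a = 0.01              # P(A): initial belief, 1% have the condition
p_b_given_a = 0.95      # P(B|A): test is positive when the condition is present
p_b_given_not_a = 0.05  # P(B|not A): false positive rate

# P(B) via the law of total probability:
# P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b

print(p_b, p_a_given_b)
```

Even with a test that looks accurate, the final belief P(A|B) stays well below 50% in this example, because the initial belief P(A) is so small. That is the base-rate effect Bayes’ theorem captures.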
Conclusion
I wrote this guide to be the reference I want in the future when I’m working on these concepts. Bookmark this page for your next assignment, but don’t stop at the theory. The best way to learn is by doing, so be sure to clone the Colab notebook and experiment with the Python implementations yourself.
If you found value in this post, please consider subscribing—it’s the best way to make sure you never miss an update.
You can also support the newsletter directly by checking out my referral links on Linktree. 💎🙌

