Estimating Mean using t Distribution

This post discusses how to estimate a population mean using the t distribution. The process of constructing a confidence interval with a t distribution is almost identical to the process used with the standard normal distribution.

First, we must know that the variable $x$ is normally distributed with an unknown standard deviation $\sigma$, and that we will draw a small sample ($n<30$). We then choose $c$, the desired level of confidence, and calculate the statistics $\overline{x}$ (sample mean) and $s$ (sample standard deviation) from our sample.

Margin of Error

The sample mean $\overline{x}$ will again be the best point estimate and the center of our interval. One can then calculate the margin of error for our estimate using the formula:

$$E=t_c \frac{s}{\sqrt{n}}$$

where $t_c$ is the critical t-value corresponding to the level of confidence $c$. The values of $t_c$ for common values of $c$ are given in the t-table. Read the table using $n-1$ degrees of freedom.

Note that $t_c > z_c$ for the same value of $c$ because the t-distribution has heavier tails than the standard normal distribution, so the t-distribution gives a larger margin of error.
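As a minimal sketch of the margin of error formula above, the following Python snippet compares the t-based and z-based margins for the same sample. The values $s = 2.0$ and $n = 10$ are hypothetical and chosen only for illustration, and $t_c = 2.262$ is the t-table entry for $c = 0.95$ with 9 degrees of freedom. The function name `margin_of_error` is likewise an assumption.

```python
import math
from statistics import NormalDist

def margin_of_error(critical_value, s, n):
    """Return E = critical_value * s / sqrt(n)."""
    return critical_value * s / math.sqrt(n)

# Hypothetical sample summary, for illustration only
s, n = 2.0, 10

# Critical values for c = 0.95
t_c = 2.262                          # t-table entry for df = n - 1 = 9
z_c = NormalDist().inv_cdf(0.975)    # about 1.96 for the standard normal

E_t = margin_of_error(t_c, s, n)
E_z = margin_of_error(z_c, s, n)

print(f"t-based margin of error: {E_t:.3f}")
print(f"z-based margin of error: {E_z:.3f}")
```

Because $t_c > z_c$, the t-based margin is the larger of the two for any given $s$ and $n$.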

Example: Estimating mean using t distribution

Suppose we have a sample of 5 women’s heights (in inches): 67, 63, 64, 65, 63. If women’s heights are known to be normally distributed, but the population standard deviation ($\sigma = 2.75$) is not known to us, then we can use the sample standard deviation $s$ as our estimate of $\sigma$ and construct a t-distribution interval.

The sample mean is $\overline{x} = 64.4$ inches and the sample standard deviation is $s=1.67$ inches. For a 95% confidence interval, the critical t-score for $n-1=4$ degrees of freedom is $t_c=2.776$. So

\begin{align*}
E &= t_c \frac{s}{\sqrt{n}} \\
&=(2.776) \left(\frac{1.67}{\sqrt{5}}\right) \\
&\approx 2.07
\end{align*}

So, our 95% confidence interval is

$$[64.4 - 2.07, 64.4 + 2.07] = [62.33, 66.47]$$
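The calculation above can be reproduced with a short Python sketch that uses only the standard library. The critical value $t_c = 2.776$ is taken from the t-table as in the example, and the variable names are chosen for illustration.

```python
import math
import statistics

heights = [67, 63, 64, 65, 63]   # sample of women's heights (inches)

n = len(heights)
x_bar = statistics.mean(heights)     # sample mean
s = statistics.stdev(heights)        # sample standard deviation (n - 1 denominator)

t_c = 2.776                          # t-table value for c = 0.95, df = n - 1 = 4

E = t_c * s / math.sqrt(n)           # margin of error
lower, upper = x_bar - E, x_bar + E

print(f"mean = {x_bar:.1f}, s = {s:.2f}, E = {E:.2f}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

Because the script keeps the unrounded value of $s$ (about 1.6733), it gives a margin of error of about 2.08 rather than 2.07. The small difference from the hand calculation comes only from rounding $s$ to 1.67 before multiplying.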

Exercise: Estimate Mean using t distribution

SAT Math scores are normally distributed. A sample of scores for 20 students has a sample mean of $\overline{x} = 522.8$ with a sample standard deviation of $s=154.5$.

  • Calculate the 90% confidence interval for the mean SAT Math Score.
  • Suppose the same sample mean and sample standard deviation had been obtained from a sample of size 16. What would the 90% confidence interval be?
  • Suppose the same sample mean and sample standard deviation were obtained from a sample of size 50. What would the 90% confidence interval be?
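One way to check your answers is a small helper like the one below. It is a sketch, not a definitive implementation: it assumes SciPy is installed, and the function name `t_confidence_interval` is made up for this post. It looks up $t_c$ with `scipy.stats.t.ppf` and applies $E = t_c \frac{s}{\sqrt{n}}$ for each sample size in the exercise.

```python
import math
from scipy import stats

def t_confidence_interval(x_bar, s, n, confidence):
    """Return (lower, upper) for a t-based confidence interval of the mean."""
    df = n - 1
    # Two-sided interval: put (1 - confidence) / 2 in each tail
    t_c = stats.t.ppf(1 - (1 - confidence) / 2, df)
    E = t_c * s / math.sqrt(n)
    return x_bar - E, x_bar + E

# Summary statistics from the exercise
x_bar, s = 522.8, 154.5

for n in (20, 16, 50):
    lower, upper = t_confidence_interval(x_bar, s, n, 0.90)
    print(f"n = {n}: [{lower:.1f}, {upper:.1f}]")
```

Comparing the three intervals shows how the width shrinks as $n$ grows, both because $\sqrt{n}$ increases and because $t_c$ decreases with more degrees of freedom.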

Assumptions using the t distribution

For this estimation to be valid, your data should meet one of the following conditions:

  1. The data is approximately normally distributed. This is the ideal scenario, especially for small sample sizes ($n < 30$).
  2. The sample size is large ($n \ge 30$). Thanks to the Central Limit Theorem, the sampling distribution of the mean will be approximately normal, even if the original population data is not. This makes the t-distribution robust for larger samples.

t-Distribution vs. Z-Distribution (Normal)

This is a common point of confusion. Here’s a simple decision guide:

| Feature | t-Distribution | Z-Distribution (Normal) |
|---|---|---|
| Population $\sigma$ | Unknown | Known |
| Test Statistic | $t=\dfrac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}}$ | $z=\dfrac{\overline{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$ |
| Variability | More variable (thicker tails) | Less variable (thinner tails) |
| Shape Depends On | Degrees of Freedom (df) | It is always the standard normal curve |
| When to Use | Most real-world situations | Rarely, since $\sigma$ is seldom known |

In practice, you will almost always use the t-distribution for estimating a population mean.

Finding the Critical t-Value

You can find critical t-values in several ways:

  1. t-Table (Statistical Table): The traditional method. You find the value at the intersection of your df row and your $\frac{\alpha}{2}$ column.
  2. Statistical Software: Programs like R, Python (with SciPy), SPSS, etc., can calculate it precisely.
  3. Calculators: Many advanced calculators (like the TI-84) have inverse t-functions.

For example, in Python, scipy.stats.t.ppf(0.975, df=24) returns approximately 2.064, the critical value for a 95% confidence interval with 24 degrees of freedom. (We use 0.975 because we need the cumulative probability up to the critical value, which is $1-\frac{\alpha}{2}$ with $\alpha = 0.05$.)
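The same lookup can be wrapped in a small function that converts a confidence level into the cumulative probability that `ppf` expects. This is a sketch assuming SciPy is installed, and the helper name `critical_t` is hypothetical.

```python
from scipy import stats

def critical_t(confidence, df):
    """Two-sided critical t-value for the given confidence level and df."""
    alpha = 1 - confidence
    # ppf is the inverse CDF, so pass the cumulative probability 1 - alpha/2
    return stats.t.ppf(1 - alpha / 2, df)

print(f"{critical_t(0.95, 24):.3f}")   # df = 24, as in the example above
print(f"{critical_t(0.95, 4):.3f}")    # df = 4, as in the heights example

# Critical values for several common confidence levels at df = 24
for c in (0.90, 0.95, 0.99):
    print(f"c = {c:.2f}: t_c = {critical_t(c, 24):.3f}")
```

The first two calls should agree with the t-table values 2.064 and 2.776 used earlier in this post.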

By following this process, you can reliably estimate a population mean even when you only have sample data, properly accounting for the uncertainty that comes with estimating the population standard deviation.
