Central Limit Theorem

Central Limit Theorem
  • Assume a population with mean μ, standard deviation σ, and any probability distribution.  Consider an n-member random sample from the population.  The Central Limit Theorem says that, as n increases, the probability distribution of the sample mean approaches a normal distribution with mean μ and standard deviation σ/√n.
    • This normal distribution approximates the sampling distribution of the mean.
    • The standard deviation σ/√n is the standard error for the mean.
  • Example
    • Consider a population with
      • μ = 69
      • σ = 11.75
    • The probability distribution of the mean of a 100-member sample is approximated by a normal distribution with mean = 69 and standard deviation = 11.75 / √100 = 1.175
    • Probabilities can be calculated from this distribution, e.g.:
      • The probability is 0.95 that the mean of a 100-member random sample is between 66.7 and 71.3 (69 ± 1.96 × 1.175).
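The arithmetic in this example can be checked numerically. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and do not come from the original page.

```python
from math import sqrt
from statistics import NormalDist

mu = 69.0        # population mean from the example
sigma = 11.75    # population standard deviation from the example
n = 100          # sample size

# Standard error of the mean: sigma / sqrt(n)
standard_error = sigma / sqrt(n)

# Normal approximation to the sampling distribution of the mean
sampling_dist = NormalDist(mu, standard_error)

# Central 95% interval: the 2.5th and 97.5th percentiles
lower = sampling_dist.inv_cdf(0.025)
upper = sampling_dist.inv_cdf(0.975)

# Probability that the sample mean falls between 66.7 and 71.3
prob = sampling_dist.cdf(71.3) - sampling_dist.cdf(66.7)

print(f"standard error = {standard_error:.3f}")
print(f"95% interval   = ({lower:.1f}, {upper:.1f})")
print(f"P(66.7 < mean < 71.3) = {prob:.3f}")
```

Changing `n` shows how the interval tightens as the sample grows, since the standard error shrinks in proportion to 1/√n.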
Monte Carlo Simulation
  • In this Monte Carlo simulation:
    • The population is arbitrarily assumed to have a Gamma Distribution with mean 4.0 and standard deviation 2.82843.
    • Random samples are taken from the population with sizes 100, 1000, 10000, and 100000.
    • The calculated standard error of the mean (= population SD / √sample size) is compared with the standard deviation of the means of 500 random samples; the two should be close.
      • For sample size 1000, for example:
        • standard error = 2.82843 / √1000 = 0.0894427
        • standard deviation of the means of 500 random samples = 0.0902963
    • As the sample size increases, the standard error decreases and the graphs of the sampling distribution become narrower.
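A simulation along these lines can be sketched in Python with the standard library. This is not the program behind the page. It assumes the population is Gamma with shape 2 and scale 2, which matches the stated mean 4.0 and standard deviation 2.82843, and it uses only the two smaller sample sizes so that it runs quickly in pure Python. The function names and the seed are hypothetical.

```python
import random
from math import fsum, sqrt
from statistics import stdev

SHAPE, SCALE = 2.0, 2.0          # assumed: Gamma(2, 2) has mean 4.0
POP_SD = sqrt(SHAPE) * SCALE     # sqrt(8) = 2.82843
NUM_SAMPLES = 500                # number of random samples per sample size


def sd_of_sample_means(sample_size, rng, num_samples=NUM_SAMPLES):
    """Standard deviation of the means of num_samples random samples."""
    means = [
        fsum(rng.gammavariate(SHAPE, SCALE) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]
    return stdev(means)


def simulate(sample_sizes=(100, 1000), seed=1):
    """Map each sample size to (predicted standard error, simulated SD of means)."""
    rng = random.Random(seed)
    return {
        n: (POP_SD / sqrt(n), sd_of_sample_means(n, rng))
        for n in sample_sizes
    }


results = simulate()
for n, (standard_error, simulated_sd) in results.items():
    print(f"n = {n}: standard error = {standard_error:.6f}, "
          f"SD of {NUM_SAMPLES} sample means = {simulated_sd:.6f}")
```

With only 500 samples, the simulated standard deviation should agree with the standard error to within a few percent rather than exactly.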

The Theorem

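Let X1, …, Xn be independent random variables having a common distribution with expectation μ and finite standard deviation σ > 0. Then for any real numbers a < b, as n → ∞:

\[ P\left( a < \frac{X_1 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le b \right) \;\to\; \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\, dx \]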

Image Credit britannica.com/science/probability-theory/The-central-limit-theorem

  • Take an n-member random sample from a population with an arbitrary probability distribution, e.g. the chi-square distribution with mean 69 (that is, 69 degrees of freedom, so σ = √138 ≈ 11.75).
  • A 20-member random sample, for example, might look like this:
    • 31.4131, 62.8372, 63.247, 73.7571, 44.7476, 74.4283, 102.562, 67.8194, 72.8808, 61.512, 81.3126, 80.7713, 45.6086, 75.0849, 65.4396, 50.1927, 53.2385, 72.3477, 55.7154, 56.2447
  • Define the random variable q[n] = (x̄ − μ) / (σ / √n), where x̄ is the mean of the n-member sample.
  • n is the sample size and q[n] is the standardized sample mean.
  • The CLT says that, as n increases, the distribution of q[n] approaches the standard normal distribution (where μ = 0 and σ = 1).
  • To check that the distribution of q[n] approaches the standard normal distribution, we use the fact that, for a continuous distribution, the probability that a randomly selected value lies between a and b is the area under its density curve between a and b.
  • The two examples below compare a simulated P(a < q[n] ≤ b) with the area under the standard normal curve between a and b.
  • Example 1: q[n] between 0 and 1.
    • According to CLT, as n increases
      • P(0 < q[n] ≤ 1) approaches the area under the standard normal curve between 0 and 1.
  • As n increases, P(0 < q[n] ≤ 1) approaches 0.341345
    • In a Monte Carlo simulation of 1000 iterations, the proportion of times q[100000] was between 0 and 1 was 0.346
      • Monte Carlo Simulation Program
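        • (* assumed definition of q, based on the chi-square example above: μ = 69, σ = Sqrt[138] ≈ 11.75 *)
        • q[m_] := (Mean[RandomVariate[ChiSquareDistribution[69], m]] - 69) / (Sqrt[138.] / Sqrt[m]);
        • tally = 0;
        • n = 1000; (* number of Monte Carlo iterations, not the sample size *)
        • SeedRandom[RandomInteger[{1, 1000}]];
        • Do[tmp = q[100000];
          • If[tmp > 0 && tmp ≤ 1, tally = tally + 1, tally], {n}];
        • Print["Proportion of times q[100000] is between 0 and 1 = ", N[tally / n]];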
  • The area under the curve between 0 and 1 = 0.341345
    • Integrate[(1/Sqrt[2 Pi]) E^(-x^2 / 2), {x, 0, 1}] = 0.341345
  • Example 2: q[n] greater than 1.
    • As n increases, P(q[n] > 1) approaches 0.158655
      • In a Monte Carlo simulation of 1000 iterations, the proportion of times q[100000] was greater than 1 was 0.1544
    • The area under the curve between 1 and ∞ = 0.158655
      • Integrate[(1/Sqrt[2 Pi]) E^(-x^2 / 2), {x, 1, ∞}] = 0.158655
  • In general, for any two numbers a and b, as n increases, the probability that q[n] is between a and b approaches the area under the standard normal distribution between a and b. This is the definition of q[n] converging in distribution to the standard normal distribution.
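Both examples can be reproduced in a few lines of Python. The sketch below uses the standard library only and rests on stated assumptions: the population is chi-square with 69 degrees of freedom (drawn as Gamma with shape 34.5 and scale 2), and the sample size (200) and iteration count (2000) are much smaller than the q[100000] runs above so that it finishes quickly. The function names and the seed are hypothetical.

```python
import random
from math import fsum, sqrt
from statistics import NormalDist

DF = 69                    # assumed chi-square degrees of freedom; mean = DF
MU = float(DF)
SIGMA = sqrt(2 * DF)       # chi-square SD = sqrt(2 * DF), about 11.75


def q(n, rng):
    """Standardized mean of an n-member chi-square(DF) random sample."""
    # A chi-square(DF) variate is a Gamma variate with shape DF/2 and scale 2.
    sample_mean = fsum(rng.gammavariate(DF / 2, 2.0) for _ in range(n)) / n
    return (sample_mean - MU) / (SIGMA / sqrt(n))


def proportion_between(a, b, n, iterations, rng):
    """Fraction of simulated q(n) values satisfying a < q(n) <= b."""
    hits = sum(1 for _ in range(iterations) if a < q(n, rng) <= b)
    return hits / iterations


def normal_area(a, b):
    """Area under the standard normal curve between a and b."""
    z = NormalDist()
    return z.cdf(b) - z.cdf(a)


rng = random.Random(2024)
n, iterations = 200, 2000
sim_01 = proportion_between(0.0, 1.0, n, iterations, rng)
sim_tail = proportion_between(1.0, float("inf"), n, iterations, rng)

print(f"P(0 < q[{n}] <= 1): simulated {sim_01:.3f}, normal area {normal_area(0.0, 1.0):.6f}")
print(f"P(q[{n}] > 1):      simulated {sim_tail:.3f}, normal area {normal_area(1.0, float('inf')):.6f}")
```

With 2000 iterations the simulated proportions carry a sampling error of roughly ±0.01, so agreement to two decimal places is about as much as can be expected.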