# Central Limit Theorem

##### Central Limit Theorem
• Assume a population with mean μ, finite standard deviation σ, and any probability distribution. Consider an n-member random sample from the population. The Central Limit Theorem says that, as n increases, the probability distribution of the sample mean approaches a normal distribution with mean μ and standard deviation σ/√n.
• For large n, this normal distribution approximates the sampling distribution of the mean.
• The standard deviation σ/√n is the standard error for the mean.
• Example
• Consider a population with
• μ = 69
• σ = 11.75
• The probability distribution of the mean of a 100-member sample is approximated by a normal distribution with mean = 69 and standard deviation = 11.75 / √100 = 1.175
• Probabilities can be calculated from this distribution, e.g.:
• the probability is 0.95 that the mean of a 100-member random sample is between 66.7 and 71.3 (that is, 69 ± 1.96 × 1.175).
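The arithmetic in this example can be checked with a short Python sketch. It uses only the standard library's `statistics.NormalDist`; the variable names are illustrative and are not taken from the original.

```python
from math import sqrt
from statistics import NormalDist

mu = 69.0        # population mean (from the example)
sigma = 11.75    # population standard deviation (from the example)
n = 100          # sample size

# Standard error of the mean: sigma / sqrt(n)
standard_error = sigma / sqrt(n)

# Normal approximation to the sampling distribution of the mean
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# Central 95% interval: cut 2.5% of the probability from each tail
lower = sampling_dist.inv_cdf(0.025)
upper = sampling_dist.inv_cdf(0.975)

print(f"standard error = {standard_error:.3f}")
print(f"95% interval   = ({lower:.1f}, {upper:.1f})")
```

Changing `n` shows how the interval narrows as the sample size grows, since the standard error shrinks in proportion to 1/√n.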
##### Monte Carlo Simulation
• In this Monte Carlo simulation:
• The population is arbitrarily assumed to have a Gamma Distribution with mean 4.0 and standard deviation 2.82843.
• Random samples are taken from the population with sizes 100, 1000, 10000, and 100000.
• The calculated standard error for the mean (= population SD / √sample size) is compared with the standard deviation of the means of 500 random samples.
• For sample size 1000, for example:
• standard error = 0.0894427
• standard deviation of 500 random samples = 0.0902963
• As the sample size increases, the standard error decreases and the graphs of the sampling distribution become narrower.
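A comparison of this kind can be sketched in Python with the standard library alone. This is an illustrative analogue, not the program behind the figures above. It assumes a Gamma distribution with shape 2 and scale 2, which gives mean 4.0 and standard deviation √8 ≈ 2.82843. It runs only the two smaller sample sizes to keep the run time short, and the seed is an arbitrary choice.

```python
import random
from math import sqrt
from statistics import stdev

# Assumed Gamma parameters: mean = SHAPE * SCALE = 4, SD = sqrt(SHAPE) * SCALE
SHAPE, SCALE = 2.0, 2.0

def sd_of_sample_means(sample_size, num_samples=500, rng=None):
    """Standard deviation of the means of num_samples random samples."""
    rng = rng or random.Random()
    means = [
        sum(rng.gammavariate(SHAPE, SCALE) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]
    return stdev(means)

rng = random.Random(12345)  # fixed seed (arbitrary) so that runs are repeatable
population_sd = sqrt(SHAPE) * SCALE

results = {}
for size in (100, 1000):  # the larger sizes in the text are omitted for speed
    se = population_sd / sqrt(size)                # calculated standard error
    empirical = sd_of_sample_means(size, rng=rng)  # SD of 500 sample means
    results[size] = (se, empirical)
    print(f"n = {size}: standard error = {se:.6f}, "
          f"SD of 500 sample means = {empirical:.6f}")
```

The two numbers printed for each sample size should be close but not identical, because the empirical value is itself an estimate based on 500 sample means.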
##### The Theorem

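Let X1,…, Xn be independent random variables having a common distribution with expectation μ and standard deviation σ, and let X̄n = (X1 + … + Xn)/n be their mean. Then for any real numbers a < b, as n → ∞:

$$P\left(a < \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le b\right) \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx$$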

• Take an n-member random sample from a population with an arbitrary probability distribution, e.g. the ChiSquare Distribution with mean 69.
• A 20-member random sample, for example, might look like this:
• 31.4131, 62.8372, 63.247, 73.7571, 44.7476, 74.4283, 102.562, 67.8194, 72.8808, 61.512, 81.3126, 80.7713, 45.6086, 75.0849, 65.4396, 50.1927, 53.2385, 72.3477, 55.7154, 56.2447
• Define the random variable q[n] = (x̄ − μ) / (σ / √n), where x̄ is the mean of the n-member sample.
• n is the sample size and q[n] is the standardized sample mean.
• The CLT says that as n increases, the distribution of q[n] approaches a normal distribution, specifically the standard normal distribution (where μ = 0 and σ = 1).
• To see what it means for q[n] to approach a normal distribution, recall that under a continuous distribution the probability of a randomly selected number falling between a and b is the area under the distribution's curve between a and b.
• Two examples follow in which P(a < q[n] ≤ b) approaches the area under the standard normal distribution between a and b.
• Example 1: q[n] between 0 and 1
• According to CLT, as n increases
• P(0 < q[n] ≤ 1) approaches the area under the standard normal curve between 0 and 1.
• As n increases, P(0 < q[n] ≤ 1) approaches 0.341345
• In a Monte Carlo simulation of 1000 iterations, the proportion of times q[100000] was between 0 and 1 was 0.346
• Monte Carlo Simulation Program
• tally = 0;
• n = 1000;
• SeedRandom[RandomInteger[{1, 1000}]];
• (* q[n] is assumed to return the standardized mean of a fresh n-member random sample *)
• Do[tmp = q[100000];
• If[tmp > 0 && tmp ≤ 1, tally = tally + 1, tally], {n}];
• Print["Proportion of times q[100000] is between 0 and 1 = ", N[tally / n]];
• The area under the curve between 0 and 1 = 0.341345
• Integrate[(1/Sqrt[2 Pi]) E^(-x^2 / 2), {x, 0, 1}] = 0.341345
• Example 2: q[n] greater than 1
• As n increases, P(q[n] > 1) approaches 0.158655
• In a Monte Carlo simulation of 1000 iterations, the proportion of times q[100000] was greater than 1 was 0.1544
• The area under the curve between 1 and ∞ = 0.158655
• Integrate[(1/Sqrt[2 Pi]) E^(-x^2 / 2), {x, 1, ∞}] = 0.158655
• In general, for any two numbers a and b, as n increases, the probability that q[n] is between a and b approaches the area under the standard normal distribution between a and b. Thus, by definition, q[n] approaches the standard normal distribution.
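The two examples can be reproduced approximately with a Python sketch that uses only the standard library. It is an analogue of the Mathematica program above, not a translation of it. The function `q` here is a hypothetical reconstruction that assumes a chi-square population with 69 degrees of freedom (mean 69, variance 138). The sample size `n = 200` and the 2000 iterations are chosen for speed and differ from the values in the text, and the seed is arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

DF = 69                        # chi-square degrees of freedom (assumed)
MU, SIGMA = DF, sqrt(2 * DF)   # chi-square(k) has mean k and variance 2k

def q(n, rng):
    """Standardized mean of one n-member chi-square(DF) random sample."""
    # A chi-square(k) variate is a Gamma variate with shape k/2 and scale 2.
    sample_mean = sum(rng.gammavariate(DF / 2, 2.0) for _ in range(n)) / n
    return (sample_mean - MU) / (SIGMA / sqrt(n))

rng = random.Random(2024)      # fixed seed (arbitrary) so that runs are repeatable
iterations, n = 2000, 200      # smaller than in the text, to keep the run short
values = [q(n, rng) for _ in range(iterations)]

# Simulated proportions
prop_0_1 = sum(0 < v <= 1 for v in values) / iterations
prop_gt_1 = sum(v > 1 for v in values) / iterations

# Areas under the standard normal curve
std_normal = NormalDist()
area_0_1 = std_normal.cdf(1) - std_normal.cdf(0)
area_gt_1 = 1 - std_normal.cdf(1)

print(f"P(0 < q <= 1): simulated {prop_0_1:.4f}, normal area {area_0_1:.6f}")
print(f"P(q > 1):      simulated {prop_gt_1:.4f}, normal area {area_gt_1:.6f}")
```

The simulated proportions fluctuate from run to run by roughly ±0.01 at this number of iterations, so agreement with the normal areas is expected only to about two decimal places.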