# Estimation

### Estimation

• Estimation uses the concept of probability to infer the value of a population parameter from a random sample, e.g. a president’s approval rating based on a poll.

### Basic Form of Argument

• A decent-sized random sample from a population exhibits a statistic S, e.g. mean, proportion, standard deviation.
• Parameter P, S’s population counterpart, is estimated from S.
• Therefore, in all probability, the population exhibits P within a certain margin of error, other things being equal.

### Example

• Inference:
• In a hypothetical random poll of 1,000 Americans, 600 approved of the president’s job performance.
• Therefore, in all probability, about 60 percent of the general population likewise approve.
• Rationale:
• The rationale for the inference is that a 1,000-member random poll represents the country at large fairly accurately. If 60% of the general population approve of the president’s job performance, the probability, for each person polled, is 0.6 that they’re an “approver.”  A thousand 0.6 probabilities and a little math yield a 90% probability that a random poll of 1,000 Americans has between 575 and 625 approvers.
• The upshot is that random polls of decent size are very likely to be representative of the population at large.
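As a check on the rationale above, the following Python sketch sums the binomial probabilities exactly, with no normal approximation. It assumes each of the 1,000 people polled is independently an approver with probability 0.6; the function name `binomial_prob_between` is hypothetical.

```python
import math


def binomial_prob_between(n: int, p: float, lo: int, hi: int) -> float:
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p).

    Each term is computed in log space with math.lgamma so that the
    binomial coefficient C(1000, k) never has to be formed directly.
    """
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(lo, hi + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total


# 1,000 people polled, each an "approver" with probability 0.6
prob = binomial_prob_between(1000, 0.6, 575, 625)
print(f"P(575 <= approvers <= 625) = {prob:.4f}")
```

The result comes out close to 0.90, in line with the figure quoted in the rationale.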

### Problem of Justification

• What justifies the inference from sample to population, from a poll of thousands to a population of millions?  This would seem to be the fallacy of hasty generalization, like inferring that all swans are white because that’s the only color of swan you’ve seen.

#### From Population to Sample

• Assume that 60% of Americans approve of the president’s job performance.
• The population can be represented by a box of millions of tickets, 60 percent inscribed with a 1, representing those who approve of the president’s job performance. The others are marked with a 0.
• Throughout their 2007 textbook *Statistics*, Freedman, Pisani, and Purves represent chance processes by box models, from which tickets are randomly drawn.
• The mean and standard deviation of the box are:
• Mean = 0.6
• Whether the box has 10 tickets or 10 million tickets, the mean is the same:
• (1 + 1 + 1 + 1 + 1 + 1 + 0 + 0 + 0 + 0) / 10 = 0.6
• Standard deviation = 0.489898
• The standard deviation is, roughly, the “average” distance from the mean (strictly, it is the root-mean-square distance)
• Each one is 0.4 from the mean
• Each zero is 0.6 from the mean
• So the overall “average” is slightly less than 0.5, there being more ones than zeros.
• The calculation:
• $\sqrt{(0.4^2 \times 6 + 0.6^2 \times 4) / 10} = \sqrt{0.24} \approx 0.489898$
• Randomly draw 1,000 tickets from the box.
• Using common sense and probability theory, we can figure out a priori some things about the sample.
• Since the sample is random, each ticket in the box has an equal chance of being drawn. So for each ticket in the sample, the probability it’s a one = 0.6
• We know intuitively that the number of ones in the sample is likely to be in the vicinity of 600. At least it’s far more likely the number is closer to 600 than it is to 1,000 or 0.
• The probabilities of the possible values of the sample mean are given by a probability distribution called the sampling distribution.
• The mean of the sampling distribution is 0.6
• The standard deviation can be calculated from the Bernoulli or Binomial Distributions.  It’s referred to as the standard error of the mean.
• StandardDeviation[BernoulliDistribution[0.6]] / Sqrt[1000] = 0.0154919
• StandardDeviation[BinomialDistribution[1000, 0.6]] / 1000 = 0.0154919
• Finally, by the central limit theorem, the sampling distribution is approximately normal, with mean 0.6 and standard deviation 0.0154919.
• Probabilities with margins of error can be calculated from the sampling distribution. The probability that the sample mean is:
• between 0.575 and 0.625 ≈ 0.90
• between 0.57 and 0.63 ≈ 0.95
• between 0.56 and 0.64 ≈ 0.99
• The calculations appear below under Calculations of Confidence Levels and Intervals.
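The chain of reasoning in this section (box mean and standard deviation, standard error, normal approximation, interval probabilities) can be reproduced with Python's standard library. This is a sketch, not the textbook's method: `statistics.NormalDist` stands in for the normal approximation to the sampling distribution.

```python
from math import sqrt
from statistics import NormalDist, fmean, pstdev

# A box of 0/1 tickets, 60 percent marked 1 (approve).
# Its mean and standard deviation do not depend on how many tickets it holds.
for size in (10, 1_000):
    box = [1] * (size * 6 // 10) + [0] * (size * 4 // 10)
    print(f"{size:>5} tickets: mean = {fmean(box):.1f}, SD = {pstdev(box):.6f}")

n = 1000                                   # tickets drawn per sample
box_mean = 0.6
box_sd = pstdev([1] * 6 + [0] * 4)         # sqrt(0.24)
standard_error = box_sd / sqrt(n)          # SD of the sampling distribution

# Normal approximation to the sampling distribution of the sample mean
sampling = NormalDist(mu=box_mean, sigma=standard_error)
for lo, hi in [(0.575, 0.625), (0.57, 0.63), (0.56, 0.64)]:
    prob = sampling.cdf(hi) - sampling.cdf(lo)
    print(f"P({lo} <= sample mean <= {hi}) = {prob:.4f}")
```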

#### From Sample to Population

• We’ve established that if the mean of a population of ones and zeros is 0.6, the sampling distribution is approximated by the normal distribution with mean = 0.6 and standard deviation = 0.0154919. It follows that the probability that the sample mean is between 0.57 and 0.63 = 0.95.
• Thus, using the notation of conditional probability:
• P(sample mean is between 0.57 and 0.63 | population mean is 0.6) = 0.95
• That is, the probability the sample mean is between 0.57 and 0.63 given that the population mean is 0.6 = 0.95
• But this is backwards.
• The problem is to justify the inference from sample to population.  We’ve succeeded only in justifying the reverse, the inference from population to sample.
• We want to prove something like:
• P(population mean is between 0.57 and 0.63 | sample mean is 0.6) = 0.95
• That is, the probability the population mean is between 0.57 and 0.63 given that the sample mean is 0.6 = 0.95.
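Bayes Theorem shows how the two conditional probabilities are related. Writing $\mu$ for the population mean and $\bar{x}$ for the sample mean (notation introduced here for brevity), the probability we want depends on a prior probability for the population mean, which the population-to-sample argument never supplies:

$$
P(0.57 \le \mu \le 0.63 \mid \bar{x} = 0.6) = \frac{P(\bar{x} = 0.6 \mid 0.57 \le \mu \le 0.63)\; P(0.57 \le \mu \le 0.63)}{P(\bar{x} = 0.6)}
$$

Without some value for $P(0.57 \le \mu \le 0.63)$, the right-hand side cannot be computed.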

#### Two Schools of Thought

##### Frequentism
• For frequentists, statistical probability must be testable by repeated experiments. Take the statement:
• P(the mean of a 1000-member random sample is between 0.57 and 0.63 | population mean is 0.6) = 0.95
• The statement can be tested by taking repeated samples.  It’s confirmed if the means of 95 percent of samples taken are between 0.57 and 0.63.
• Consider the reverse statement:
• P(population mean is between 0.57 and 0.63 | mean of a 1000-member random sample is 0.6) = 0.95
• The statement can’t be tested by “repeated populations.” The only way to test it would be to take 1000-member random samples from many populations and see whether, among the populations whose sample mean is 0.6, 95 percent have population means between 0.57 and 0.63. But that’s impossible. There’s only one population.
• For frequentists, therefore, population parameters do not have probability distributions and talk of their probabilities is scientifically illegitimate.
• You hear frequentists talk about confidence intervals and levels, saying things like:
• 60 percent of Americans approve of the president’s job performance, with a 95% confidence interval of ±3 percentage points.
• But this only means:
• There’s a 95% probability that the mean of a random sample is 60±3 percent given that the population mean is 60 percent.
• As Pew Research puts it:
• “A margin of error of plus or minus 3 percentage points at the 95% confidence level means that if we fielded the same survey 100 times, we would expect the result to be within 3 percentage points of the true population value 95 of those times.”
• How then do frequentists justify the inference from sample to population?
• For frequentists, hypotheses about populations are supported the same way scientific theories are: by their confirmed predictions.
• For example, Einstein’s theory of gravity, General Relativity, is supported by its many predictions of phenomena such as gravitational waves, time dilation, and the bending of light by the sun.
• More apropos is Quantum Mechanics, whose predictions are probabilities. A famous prediction, for instance, is that an electron passing through a Stern-Gerlach magnet is deflected up or down with a 50/50 probability. The prediction is confirmed by shooting billions of electrons through the magnet, about half of which are observed to deflect up.
• In the same way, the hypothesis that 60% of Americans approve of the president’s job performance is supported by its prediction that there’s a 95% probability that the mean of a random sample is 60±3 percent given that the population mean is 60 percent. Any random sample whose mean falls within 60±3 percent bears out the prediction.
##### Bayesianism
• Frequentists are right in saying that scientific theories are supported by their predictions. But how?  What’s the logic of support?
• Bayesianism is the view that the probability of scientific theories and other hypotheses is based on Bayes Theorem.
• In Bayesian Statistics, a population parameter is treated as a random variable, whose probability distribution is computed by Bayes Theorem.
• Thus, unlike frequentists, Bayesians countenance statements such as:
• The percentage of Americans who approve of the president’s job performance has probability distribution X.
• That percentage = the mean of X with a certain margin of error and probability.
• The standard deviation of the percentage = the standard deviation of X.
• In short:
• Bayesians use Bayes Theorem to calculate probabilities that frequentists reject as illegitimate because they can’t be tested by repeated experiments.

### Calculations of Confidence Levels and Intervals

• For 1000-member random samples, whose sample mean has an approximately normal sampling distribution with mean 0.6 and standard deviation 0.0154919:
• At the 90% confidence level, the sample mean lies within 0.6 ± 0.025
• Margin of Error = Standard Error x 90% Factor
• Standard error of the mean = 0.015492
• 90% Factor = 1.645
• Margin of error = 0.015492 x 1.645 ≈ 0.025
• Confirmation
• Probability[x >= 0.6 - 0.025 && x <= 0.6 + 0.025, Distributed[x, NormalDistribution[0.6, 0.015492]]] = 0.893415
• At the 95% confidence level, the sample mean lies within 0.6 ± 0.03
• Margin of Error = Standard Error x 95% Factor
• Standard error of the mean = 0.015492
• 95% Factor = 1.96
• Margin of error = 0.015492 x 1.96 ≈ 0.03
• Confirmation
• Probability[x >= 0.6 - 0.03 && x <= 0.6 + 0.03, Distributed[x, NormalDistribution[0.6, 0.015492]]] = 0.947191
• At the 99% confidence level, the sample mean lies within 0.6 ± 0.04
• Margin of Error = Standard Error x 99% Factor
• Standard error of the mean = 0.015492
• 99% Factor = 2.576
• Margin of error = 0.015492 x 2.576 ≈ 0.04
• Confirmation
• Probability[x >= 0.6 - 0.04 && x <= 0.6 + 0.04, Distributed[x, NormalDistribution[0.6, 0.015492]]] = 0.990176
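The factors 1.645, 1.96, and 2.576 are standard normal quantiles. A minimal Python sketch that derives them with `statistics.NormalDist`, computes each margin of error from the standard error 0.015492, and repeats the confirmation step:

```python
from statistics import NormalDist

standard_error = 0.015492          # standard error of the mean (n = 1000, p = 0.6)
std_normal = NormalDist()          # mean 0, standard deviation 1
sampling = NormalDist(mu=0.6, sigma=standard_error)

factors, margins, checks = {}, {}, {}
for level in (0.90, 0.95, 0.99):
    # Two-sided factor: (1 - level) / 2 of the probability lies in each tail
    factor = std_normal.inv_cdf(1 - (1 - level) / 2)
    margin = standard_error * factor
    # Probability that the sample mean falls within 0.6 ± margin
    check = sampling.cdf(0.6 + margin) - sampling.cdf(0.6 - margin)
    factors[level], margins[level], checks[level] = factor, margin, check
    print(f"{level:.0%}: factor = {factor:.3f}, margin = {margin:.4f}, check = {check:.4f}")
```

With unrounded margins the check returns the nominal level; the confirmations above come out slightly different (0.893415, 0.947191, 0.990176) because the margins were rounded to 0.025, 0.03, and 0.04.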

### Monte Carlo Simulation of 95% Confidence Level

• A Monte Carlo simulation confirms that the 95% confidence interval = 0.6 ± 0.03 for 1000-member samples.
• In the simulation one hundred 1000-member random samples are taken from the population, with the result that 96 sample means (in red) are in the 0.57-0.63 range.
• 0.559, 0.569, 0.569, 0.576, 0.576, 0.576, 0.578, 0.578, 0.578, 0.579, 0.582, 0.583, 0.583, 0.583, 0.584, 0.585, 0.586, 0.586, 0.586, 0.586, 0.586, 0.587, 0.587, 0.587, 0.589, 0.59, 0.591, 0.591, 0.592, 0.592, 0.592, 0.592, 0.593, 0.593, 0.593, 0.594, 0.595, 0.595, 0.595, 0.595, 0.596, 0.596, 0.596, 0.596, 0.597, 0.597, 0.598, 0.598, 0.599, 0.599, 0.599, 0.6, 0.6, 0.6, 0.6, 0.6, 0.601, 0.602, 0.602, 0.604, 0.604, 0.604, 0.604, 0.604, 0.605, 0.605, 0.605, 0.606, 0.607, 0.607, 0.608, 0.608, 0.609, 0.609, 0.61, 0.61, 0.61, 0.61, 0.61, 0.61, 0.611, 0.611, 0.611, 0.612, 0.612, 0.614, 0.614, 0.615, 0.615, 0.616, 0.616, 0.616, 0.618, 0.624, 0.625, 0.625, 0.626, 0.626, 0.627, 0.642
• In the histogram of the 100 simulated sample means, there is:
• 1 sample mean in the 0.55-0.56 range
• 2 in the 0.56-0.57 range
• 96 in the 0.57-0.63 range
• 1 in the 0.64-0.65 range
• In the graph below:
• The horizontal lines represent intervals of the sample means, plus and minus the margin of error.
• The top line, for example, represents the sample mean 0.559, extending from 0.529 to 0.589
• The vertical line is the population mean, 0.6
• The intervals around 96 of the sample means “cover” the population mean.
• Four don’t:
• Line 1: 0.559±0.03
• Line 2: 0.569±0.03
• Line 3: 0.569±0.03
• Line 100: 0.642±0.03
• Arguments
• objective vs subjective
• inference of population parameter outside and inside statistics
• probability statements in statistics must be testable by repeated experiments, as they are in physics (Quantum Mechanics, for example).
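The source does not say how its Monte Carlo simulation was run. Below is a hypothetical Python reconstruction of the same experiment: one hundred 1000-member samples, each checked for whether the interval sample mean ± 0.03 covers 0.6. It assumes each draw is a 1 with probability 0.6 (sampling with replacement, which approximates a box of millions of tickets). The function name and seed are illustrative; because the random draws differ, its sample means will not match the list above.

```python
import random


def simulate_coverage(num_samples=100, sample_size=1000, p=0.6, margin=0.03, seed=None):
    """Draw repeated random samples from a 0/1 box and count how many intervals
    sample mean ± margin contain the population mean p.

    Assumption: each draw is a 1 with probability p, i.e. sampling with
    replacement, which approximates drawing from a box of millions of tickets.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        ones = sum(1 for _ in range(sample_size) if rng.random() < p)
        means.append(ones / sample_size)
    covering = sum(1 for m in means if m - margin <= p <= m + margin)
    return sorted(means), covering


means, covering = simulate_coverage(seed=1)
print(f"{covering} of {len(means)} intervals (sample mean ± 0.03) cover 0.6")
```

Raising `num_samples` (for example to 2,000) makes the covering fraction settle near 0.95, which is the frequentist reading of the 95% confidence level.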

### Bayesian Estimation

Based on Peter Hoff, *A First Course in Bayesian Statistical Methods* (2009), pages 31-34

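• The Non-Comparative Form of Bayes Theorem gives the posterior probability of a hypothesis H as its prior probability times the likelihood of the evidence, divided by the prior probability of the evidence.
• Bayesian Estimation uses a variant of this form, replacing H with random variable θ:

$$P(\theta \mid y) = \frac{P(\theta)\,P(y \mid \theta)}{P(y)}$$

• where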
• θ is a random variable for the population proportion, for instance the proportion approving the president’s job performance
• P(θ) is its prior probability
• y is the data, a random sample of 600 ones and 400 zeros.
• P(y) is its prior probability
• y|θ is a random variable for the data y given θ, and P(y|θ) its likelihood
• θ|y is a random variable for θ given the data y, and P(θ|y) its posterior probability
• The result of the calculation below:
• The expectation of θ|y is 0.5998 and its standard deviation 0.01547.
• At the 95% credible level, the population proportion θ is 0.5998 ± 0.03

Calculation using Bayes Theorem
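A minimal numerical sketch of this calculation, assuming a uniform prior P(θ) = 1 on (0, 1). The prior is an assumption here, chosen because it reproduces the expectation 0.5998 and standard deviation 0.01547 stated above. The sketch evaluates P(θ)P(y|θ) on a grid of θ values and divides by their sum, which plays the role of P(y):

```python
import math

# Grid approximation of Bayes Theorem for theta (a sketch).
# Assumption: uniform prior P(theta) = 1 on (0, 1); y is a specific
# sequence of 600 ones and 400 zeros.
ones, zeros = 600, 400
grid_size = 10_000
thetas = [(i + 0.5) / grid_size for i in range(grid_size)]  # midpoints in (0, 1)

# log P(y | theta) at each grid point
log_lik = [ones * math.log(t) + zeros * math.log(1 - t) for t in thetas]

# The prior is constant, so it cancels; subtract the max log-likelihood before
# exponentiating to avoid underflow (theta^600 * (1 - theta)^400 is tiny).
top = max(log_lik)
unnormalized = [math.exp(v - top) for v in log_lik]
total = sum(unnormalized)                       # plays the role of P(y), up to a constant
posterior = [u / total for u in unnormalized]   # P(theta | y) on the grid

mean = sum(t * w for t, w in zip(thetas, posterior))
sd = math.sqrt(sum((t - mean) ** 2 * w for t, w in zip(thetas, posterior)))
print(f"E[theta | y] = {mean:.4f}, SD[theta | y] = {sd:.5f}")
```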

Shortcut Calculation using Beta Distribution
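With a uniform prior, which is the Beta(1, 1) distribution, the posterior has a closed form: a Beta(a, b) prior updated with 600 ones and 400 zeros gives a Beta(a + 600, b + 400) posterior. A sketch of that shortcut, under the same uniform-prior assumption:

```python
import math

# Conjugate shortcut (assumption: uniform Beta(1, 1) prior on theta).
# A Beta(a, b) prior combined with 600 ones and 400 zeros gives a
# Beta(a + 600, b + 400) posterior.
a_prior, b_prior = 1, 1
a_post, b_post = a_prior + 600, b_prior + 400    # Beta(601, 401)

mean = a_post / (a_post + b_post)
var = a_post * b_post / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
sd = math.sqrt(var)

# 95% credible interval via a normal approximation to the posterior
margin = 1.96 * sd
print(f"Beta({a_post}, {b_post}): mean = {mean:.4f}, SD = {sd:.5f}, "
      f"95% interval ≈ {mean:.4f} ± {margin:.3f}")
```

The 1.96 factor treats the posterior as approximately normal, matching the ± 0.03 stated above; an exact credible interval would use Beta quantiles instead.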

Derivation of Prior Probability P(y)
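A sketch of how P(y) can be derived, assuming the uniform prior P(θ) = 1 on [0, 1] and treating y as one particular ordered sequence of 600 ones and 400 zeros. P(y) is the likelihood averaged over the prior (the law of total probability), and the integral is a Beta function:

$$
P(y) = \int_0^1 P(\theta)\,P(y \mid \theta)\,d\theta = \int_0^1 \theta^{600}(1-\theta)^{400}\,d\theta = B(601, 401) = \frac{600!\;400!}{1001!}
$$

If y is instead taken to be just the count of 600 ones, the likelihood gains the factor $\binom{1000}{600}$, and P(y) becomes $\binom{1000}{600}\frac{600!\,400!}{1001!} = \frac{1}{1001}$.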