Statistics is the use of mathematics to shed light on collected data and to make inferences from the data to the general population.
Contents
Descriptive Statistics
- Descriptive Statistics is the use of mathematics and visualization to throw light on collected data.
- View Descriptive Statistics
Inferential Statistics
- Inferential Statistics is the use of mathematics, probability, and randomness to make inferences from collected data to the general population.
- A statistical inference is an inference from a representative sample of a population to the population as a whole. Examples include the inferences that:
- 39% of Americans think homosexuality is morally wrong
- GLP-1 agonists are effective in reducing weight
- Smoking, high blood pressure, and obesity are risk factors for heart disease.
- How do you obtain a representative sample? The American philosopher Charles Sanders Peirce gave the answer in 1883. Referring to inductive reasoning, he wrote:
- “The rule requires that the sample should be drawn at random and independently from the whole lot sampled. That is to say, the sample must be taken according to a precept or method which, being applied over and over again indefinitely, would in the long run result in the drawing of any one set of instances as often as any other set of the same number.” (“A Theory of Probable Inference”)
- So if 75% of Americans are religious, the probability that a randomly selected American is religious is 0.75. In a random poll of 1,000 Americans, the number who are religious is then approximately normally distributed with mean 750 and standard deviation of about 13.7. It follows that there is a 95% probability that between 723 and 777 of those polled are religious.
- A random sample is thus a representative sample. And from a representative sample you can draw inferences about the population in general.
- That’s the essence of statistical inference.
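The 723 to 777 range quoted above can be reproduced with a short calculation. The following is a minimal sketch, not the author's own method: it uses the normal approximation to the binomial distribution and then checks the result against the exact binomial sum. The function names are illustrative.

```python
import math

def normal_approx_interval(n, p, z=1.96):
    """Central interval for the number of successes in n independent draws,
    using the normal approximation to the binomial distribution."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean - z * sd, mean + z * sd

def exact_binomial_prob(n, p, lo, hi):
    """Exact probability that the number of successes lies in [lo, hi]."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

low, high = normal_approx_interval(1000, 0.75)
print(round(low), round(high))                     # 723 777
print(exact_binomial_prob(1000, 0.75, 723, 777))   # roughly 0.95
```

The value z = 1.96 is the standard normal quantile that leaves 2.5% in each tail, which is where the 95% figure comes from.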
Kinds of Statistical Inference
- Estimation
- Estimation is the process of inferring the value of a population parameter from the sample statistic of a random sample, e.g. a president’s approval rating based on a poll.
- View Estimation.
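As an illustration of estimation, the sketch below infers a population proportion from a poll and attaches a 95% confidence interval to it. The poll numbers (520 approvals out of 1,000 respondents) are made up for the example, and the interval uses the common normal-approximation (Wald) formula, which is only one of several possible choices.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Point estimate and normal-approximation confidence interval
    for a population proportion."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical poll: 520 of 1,000 respondents approve.
estimate, lower, upper = proportion_confidence_interval(520, 1000)
print(f"{estimate:.3f} ({lower:.3f}, {upper:.3f})")  # 0.520 (0.489, 0.551)
```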
- Hypothesis Testing
- A hypothesis test is a test used to assess the epistemic status of a statistical hypothesis by comparing the results of the test to results produced by chance.
- A statistical hypothesis is a hypothesis about a population that can be tested by examining a representative sample of the population.
- A representative sample is typically generated by a variant of simple random sampling such as stratified random sampling.
- Population parameters that are typically tested include the mean, difference in means, proportion, difference in proportions, and correlation.
- The epistemic status of a hypothesis is how much it’s supported by the evidence.
- A hypothesis might thus be beyond a reasonable doubt, plausible, doubtful, or fit to be rejected.
- An epistemic status can also be numeric, e.g. a Bayesian probability.
- View Epistemic Probability
- Common hypothesis tests are the t-test and the z-test.
- View Hypothesis Testing.
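To make the idea of comparing a result with chance concrete, here is a minimal sketch of a two-sided z-test for a single proportion. The data (540 of 1,000 respondents) and the null hypothesis (a true proportion of 0.5) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of the null hypothesis that the population
    proportion equals p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 540 of 1,000 respondents; null hypothesis p0 = 0.5.
z, p_value = one_proportion_z_test(540, 1000, 0.5)
print(f"z = {z:.2f}, p = {p_value:.3f}")  # z = 2.53, p = 0.011
```

The p-value is the probability, if the null hypothesis were true, of a result at least as far from 0.5 as the one observed.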
- Regression Analysis
- Regression Analysis is a method for finding an equation and a set of (independent) variables that best predict a dependent variable, e.g. an equation predicting how fast a galaxy is moving away from Earth based on its distance from Earth.
- View Regression Analysis.
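The simplest case of regression, one independent variable and a straight line, can be sketched with ordinary least squares. The galaxy distances and velocities below are invented numbers used only to show the mechanics, and `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: distances (megaparsecs) and recession velocities (km/s).
distances = [10, 20, 30, 40, 50]
velocities = [720, 1380, 2150, 2790, 3510]

slope, intercept = fit_line(distances, velocities)
print(f"velocity = {slope:.1f} * distance + {intercept:.1f}")
# velocity = 69.9 * distance + 13.0
```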
- Forecasting
- A forecast is derived from the data using tools such as:
- Time Series Analysis
- Regression Analysis
- Polling (for an upcoming election)
- Predictive causal models.
- View Forecasting.
- Decision Theory
- Decision Theory is the use of mathematical probability for deciding what to do based on the projected consequences of the options.
- View Decision Theory.
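One common way to decide among options from their projected consequences is to pick the option with the highest expected utility. The sketch below assumes that rule; the options, probabilities, and utilities are invented for illustration.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over an option's possible outcomes."""
    return sum(prob * utility for prob, utility in outcomes)

# Hypothetical decision: 30% chance of rain; utilities are arbitrary units.
options = {
    "carry umbrella": [(0.3, -1), (0.7, -1)],   # mild nuisance either way
    "leave umbrella": [(0.3, -10), (0.7, 0)],   # soaked if it rains
}

scores = {name: expected_utility(outcomes) for name, outcomes in options.items()}
best = max(scores, key=scores.get)
print(best)  # carry umbrella
```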
Mathematical Probability underlying Statistical Inference
- Probability Theory
- Random Variables and Probability Distributions
- Law of Large Numbers
- Central Limit Theorem
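The Law of Large Numbers and the Central Limit Theorem can be illustrated by simulation. The sketch below, which assumes a fair six-sided die and a fixed random seed, averages 100 rolls at a time and repeats this 2,000 times. The sample means should cluster near the die's expected value of 3.5, with a spread close to the theoretical value sqrt((35/12)/100), about 0.171.

```python
import random
import statistics

def sample_means(sample_size, trials, seed=0):
    """Means of repeated samples of fair die rolls."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.randint(1, 6) for _ in range(sample_size)])
        for _ in range(trials)
    ]

means = sample_means(sample_size=100, trials=2000)
print(statistics.fmean(means))   # near 3.5
print(statistics.stdev(means))   # near 0.171
```

Plotting a histogram of `means` would show the roughly bell-shaped curve that the Central Limit Theorem predicts, even though a single die roll is uniformly distributed.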
Two Philosophies of Statistical Inference
- There are two basic concepts of probability:
- Objective probability is the kind of probability supported or refuted by relative frequencies. For example:
- The probability of rolling a seven with two dice is 1/6.
- Epistemic probability is the kind of probability relative to evidence. For example:
- The Tunguska Event of 1908 was most likely caused by a meteor.
- Bayesians use epistemic probabilities (as well as objective probabilities). They regard the following statement, for example, as a meaningful statistical claim:
- There’s a 95 percent probability that between 76 and 84 percent of the world population is religious.
- Frequentists use only objective probabilities. They argue that:
- Probabilities in science must be testable.
- Only objective probabilities are testable.
- Therefore only objective probabilities may be used in Statistics.
- They regard the following statement as a meaningful statistical claim (since it can be tested by multiple polls):
- There’s a 95 percent probability that, in a random poll of 1,000 people, between 76 and 84 percent of those polled are religious.
- But (since there’s only one world population) not the statement:
- There’s a 95 percent probability that between 76 and 84 percent of the world population is religious.
- View Bayesian and Classical Statistics.
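The dice example of an objective probability can be checked in both senses mentioned above: by counting equally likely outcomes and by observing a relative frequency. This is a minimal sketch assuming fair dice and a fixed random seed.

```python
import random
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two dice.
rolls = list(product(range(1, 7), repeat=2))
sevens = [roll for roll in rolls if sum(roll) == 7]
exact = Fraction(len(sevens), len(rolls))
print(exact)  # 1/6

def relative_frequency(trials, seed=0):
    """Fraction of simulated two-dice rolls that sum to seven."""
    rng = random.Random(seed)
    hits = sum(rng.randint(1, 6) + rng.randint(1, 6) == 7 for _ in range(trials))
    return hits / trials

print(relative_frequency(100_000))  # close to 0.1667
```

The simulated frequency is the kind of repeatable evidence that can support or refute the claimed probability of 1/6.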
Being Fooled by Statistics
- Statistics is tricky and it’s easy to be fooled.
- View Fooled by Statistics.