
#### Statistics

*Statistics* is the branch of mathematics dealing with presenting, summarizing, and making inferences from data. *Descriptive Statistics* is concerned with presenting and summarizing data. *Inferential Statistics* deals with making inferences from data.

#### The Process:

**Data, Statistic, Inference**

**The Process**
- Data is gathered and recorded.
- A statistic is computed from the data.
- A conclusion is inferred from the statistic.

**Data**
- *Primary (or raw) data* are recorded facts about entities.
- Such data can be represented by rows (representing the entities) and columns (representing the facts), e.g. names of students and their grades on exams.

**Statistic**
- A *statistic* is a mathematical calculation on the data, for example: *maximum, mean, median, minimum, mode, percentage, percent change, proportion, quantile, range, rate, standard deviation, sum, variance*

- If the data is accurate and the math correct, the statistic is a hard fact.
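
As a minimal sketch of the step from data to statistic, the following uses Python's standard `statistics` module on a small hypothetical table of students and exam grades. The names and grades are invented purely for illustration.

```python
import statistics

# Hypothetical primary data: one row per student, one column for the exam grade.
grades = {"Ana": 88, "Ben": 72, "Cara": 95, "Dev": 72, "Eli": 81}
values = list(grades.values())

summary = {
    "maximum": max(values),
    "minimum": min(values),
    "range": max(values) - min(values),
    "sum": sum(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "standard deviation": statistics.stdev(values),  # sample standard deviation
    "variance": statistics.variance(values),         # sample variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each entry is a deterministic calculation on the recorded grades, which is the sense in which a statistic is a hard fact when the data and the arithmetic are right.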

**Inference**
- A *conclusion* is inferred from a statistic.
- Kinds of conclusions:
  - *Generalizations*, inferred from random samples
  - *Forecasts*, derived from regressions, time series, and models
  - *Causal Statements*, inferred from correlations, hypothesis tests, and regressions

**Descriptive Statistics**

*Descriptive Statistics* presents and summarizes primary data, using
- Statistics calculated from the primary data
- Spreadsheets displaying either primary data or calculated statistics
- Graphs representing primary data and calculated statistics visually

###### Statistics

*Statistics* are used to summarize data.
- Typical statistics
  - *Maximum, mean, median, minimum, mode, percent, percent change, proportion, quantile, range, rate, standard deviation, sum, variance*
- Fancy statistics
  - Adjusted
    - *Inflation-adjusted*
    - *Seasonally-adjusted*
    - *Age-adjusted*
  - Indices
    - *Consumer Price Index*
    - *GDP Deflator Index*
    - *Gini Index*
    - *S&P 500 Index*
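
One common way to compute an inflation-adjusted figure is to rescale a nominal amount by the ratio of a price index in a base year to the index in the year the amount was recorded. The sketch below assumes that approach. The function name and the index values are hypothetical, not actual Consumer Price Index data.

```python
def adjust_for_inflation(nominal, index_then, index_base):
    """Express a nominal amount in base-year dollars using a price index."""
    return nominal * index_base / index_then

# Hypothetical index values, chosen only for illustration.
index_2010 = 200.0
index_2020 = 250.0

# $40,000 earned in 2010, expressed in 2020 dollars.
real_2020 = adjust_for_inflation(40_000, index_2010, index_2020)

# Percent change in the index itself between the two years.
percent_change = (index_2020 - index_2010) / index_2010 * 100

print(real_2020, percent_change)
```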

###### Spreadsheets

- The rows and columns of a spreadsheet can be either primary data or calculated statistics

###### Graphs


*Graphs* represent data and statistics visually.
- In this Bar Chart:
  - Bars represent nations.
  - A bar’s height represents the nation’s GDP, per the scale of the vertical axis.
- In this Scatter Plot:
  - Dots represent nations.
  - A dot’s location indicates the nation’s GDP (per the scale of the horizontal axis) and its life expectancy (per the scale of the vertical axis).
- Histograms represent frequency distributions.
  - In this Histogram:
    - The bins represent ranges of life expectancies, per the scale of the horizontal axis.
    - A bin’s height represents the number of nations in its range, per the scale of the vertical axis.
  - For example, the rightmost bin represents life expectancies between 85 and 90 years. Only Japan and Singapore qualify.

- The data can also be represented in table form:
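
A sketch of how such a frequency table could be computed: each value is assigned to a 5-year bin and the bins are counted. The life expectancies below are hypothetical placeholders, not the data behind the histogram, and `bin_start` is an illustrative helper.

```python
from collections import Counter

# Hypothetical life expectancies in years (placeholders, not real figures).
life_expectancies = [62.4, 68.9, 71.2, 74.5, 76.1, 77.8, 79.3, 81.0, 83.6, 85.2]

BIN_WIDTH = 5

def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the bin that contains value."""
    return int(value // width) * width

frequency = Counter(bin_start(v) for v in life_expectancies)

# Print the frequency distribution as a two-column table.
print("Range     Nations")
for start in sorted(frequency):
    print(f"{start}-{start + BIN_WIDTH}     {frequency[start]}")
```

Each count in the table corresponds to the height of one bin in a histogram drawn from the same data.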

**Inferential Statistics**

*Inferential Statistics* deals with making inferences from data.

##### Kinds of Statistical Inference

- Hypothesis Testing
  - *Hypothesis Testing* uses the concept of probability to assess the extent to which the data supports a hypothesis, e.g. the effectiveness of a drug given the results of a clinical trial.
  - The basic form of argument is:
    - The data is either explained by hypothesis H or is due to chance.
    - It’s unlikely the data is due to chance.
    - Therefore, it’s likely H is true, other things being equal.

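
The hypothesis-testing argument can be sketched with an exact binomial test. The scenario is hypothetical: 20 patients receive a drug and 16 improve, and the chance hypothesis is assumed to be that each patient improves with probability 0.5. The p-value is the probability, under chance alone, of a result at least as extreme as the one observed.

```python
from math import comb

def binomial_p_value(successes, trials, chance=0.5):
    """One-sided p-value: P(X >= successes) if only chance is at work."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical trial result: 16 of 20 patients improved.
p_value = binomial_p_value(16, 20)
print(f"p-value = {p_value:.4f}")
```

A small p-value makes the "due to chance" branch of the argument implausible. It does not by itself establish H, which is why the conclusion carries the "other things being equal" qualifier.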

- Estimation
  - *Estimation* uses the concept of probability to infer the value of a population parameter from a random sample, e.g. a president’s approval rating based on a poll.
  - The basic form of argument is:
    - A decent-sized random sample from a population exhibits a statistic S, e.g. mean, proportion, standard deviation.
    - Parameter P, S’s population counterpart, is derived from S.
    - Therefore, in all probability, the population exhibits P within a certain margin of error, other things being equal.

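
A minimal sketch of the estimation argument for a proportion, using the normal approximation for the margin of error at the 95% level (z ≈ 1.96). The poll numbers and the function name are hypothetical.

```python
from math import sqrt

def proportion_estimate(successes, n, z=1.96):
    """Return the sample proportion and its approximate margin of error."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

# Hypothetical poll: 520 of 1,000 randomly sampled adults approve.
p_hat, margin = proportion_estimate(520, 1000)
print(f"Estimated approval: {p_hat:.1%} ± {margin:.1%}")
```

Here the sample proportion plays the role of statistic S, and the population approval rating is the parameter P inferred from it.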

- Regression Analysis
  - *Regression Analysis* is a set of procedures for finding and evaluating an equation that predicts a dependent variable from a set of independent variables.
  - The basic form of argument is:
    - Equation E, with a certain degree of accuracy, predicts the observed values of a dependent variable from the observed values of a set of independent variables.
    - Therefore, with the same degree of accuracy, E predicts the dependent variable from the independent variables generally.

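
As an illustration of finding and evaluating such an equation, the sketch below fits a straight line by ordinary least squares with a single independent variable and measures its accuracy with R². The observations are hypothetical, and simple linear regression is only one of the procedures covered by the term.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical observations of one independent and one dependent variable.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
fit_quality = r_squared(xs, ys, intercept, slope)
prediction = intercept + slope * 6  # prediction for an unobserved x

print(intercept, slope, fit_quality, prediction)
```

The fitted line is equation E, R² describes how well it reproduces the observed values, and the prediction at an unobserved x is the inferential step of applying E generally.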

##### Two Philosophies of Statistical Inference

*Bayesian* and *Frequentist* Statistics
- Pew Research
  - “These models were fit using a *Bayesian* framework, which means that it is necessary to specify a prior distribution for each parameter in the model.”
- Gallup
  - “For results based on the total sample of national adults, the margin of sampling error is ±2 percentage points at the 95% *confidence level*.”
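
To make the contrast concrete, here is a sketch that treats the same hypothetical poll both ways. The Bayesian part assumes a uniform Beta(1, 1) prior, for which the posterior after observing binomial data is Beta(1 + successes, 1 + failures). The frequentist part reports a normal-approximation margin of sampling error at the 95% confidence level. Neither is claimed to be the method Pew Research or Gallup actually uses.

```python
from math import sqrt

successes, n = 520, 1000  # hypothetical poll result
failures = n - successes

# Bayesian: a Beta(a, b) prior updated with the data gives Beta(a + s, b + f).
prior_a, prior_b = 1, 1  # uniform prior, an assumption for illustration
post_a, post_b = prior_a + successes, prior_b + failures
posterior_mean = post_a / (post_a + post_b)

# Frequentist: point estimate plus margin of sampling error at 95% confidence.
sample_proportion = successes / n
margin_95 = 1.96 * sqrt(sample_proportion * (1 - sample_proportion) / n)

print(f"Posterior mean: {posterior_mean:.4f}")
print(f"Estimate: {sample_proportion:.3f} ± {margin_95:.3f}")
```

The Bayesian result depends on the choice of prior, which is why a prior must be specified for each parameter. The frequentist result is stated in terms of a confidence level instead.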

##### Probability Distributions

- Fundamental Concept of Inferential Statistics
  - Chance measured by a *probability distribution* (*random variable*)

##### Mathematical Theorems Underlying Inferential Statistics

- Law of Large Numbers

- Central Limit Theorem
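
Both theorems can be illustrated by simulation. The sketch below rolls a fair six-sided die. The Law of Large Numbers predicts that the mean of many rolls approaches the expected value 3.5, and the Central Limit Theorem predicts that means of repeated samples of size n are approximately normally distributed with standard deviation σ/√n. The seed, sample sizes, and repetition counts are arbitrary choices.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)  # fixed seed so the run is repeatable

def roll_mean(n):
    """Mean of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

# Law of Large Numbers: the sample mean settles near 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(roll_mean(n), 3))
large_sample_mean = roll_mean(100_000)

# Central Limit Theorem: means of many size-30 samples cluster around 3.5
# with a spread close to sigma / sqrt(30).
sample_means = [roll_mean(30) for _ in range(2_000)]
sigma = sqrt(35 / 12)  # standard deviation of a single fair die roll
observed_spread = statistics.stdev(sample_means)
expected_spread = sigma / sqrt(30)

print(round(observed_spread, 3), round(expected_spread, 3))
```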

##### Two Senses of ‘Proof’

- Can statistics “prove” things?
  - Two important senses of “proof”
    - merriam-webster.com/dictionary/proof
      - 1a: the cogency of evidence that compels acceptance by the mind of a truth or a fact
      - 1b: the process or an instance of establishing the validity of a statement especially by derivation from other statements in accordance with principles of reasoning
  - Example of 1a:
    - Presenting evidence in court that proves the defendant is guilty beyond a reasonable doubt.
  - Examples of 1b:
    - Mathematical proof, logical derivation, computation of statistics, e.g. calculation of a p-value from the results of an RCT.
  - Statistics in sense 1b can prove correlations, p-values, and odds ratios, but not causal statements, forecasts, or generalizations. For example, you can’t construct a step-by-step derivation of a causal statement from the results of a clinical trial.
  - But statistics can prove statements (that go beyond statistics) in sense 1a, i.e. by presenting evidence (p-values, correlation coefficients, confidence intervals) that proves the statements beyond a reasonable doubt. For example, clinical trials and observational studies have collectively proved (established, shown) that Pfizer’s vaccine prevents Covid cases, hospitalizations, and deaths.