Random Variables and Probability Distributions

Contents

Inferential statistics is built on random variables and probability distributions.

Specifying Probabilities
  • Single Probability
    • The probability of rolling a seven with a pair of dice = ⅙
  • Probability Range
    • The probability of rolling 5 through 9 = ⅔
    • The probability of rolling at least 10 = ⅙
  • Probability Distribution
    • The probabilities of rolling 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are 1/36, 1/18, 1/12, 1/9, 5/36, 1/6, 5/36, 1/9, 1/12, 1/18, 1/36 respectively.
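The probabilities listed above can be recovered by counting outcomes. The following is a minimal Mathematica sketch, assuming that "rolling" means the sum of two fair six-sided dice (the notes do not say so explicitly, but the value 1/36 for a 2 implies it). The names sums and diceProbabilities are illustrative only.

```mathematica
(* Assumption: a roll is the sum of two fair six-sided dice. *)
sums = Flatten[Table[a + b, {a, 1, 6}, {b, 1, 6}]];  (* all 36 equally likely sums *)

(* Probability distribution: count each sum and divide by the 36 outcomes. *)
diceProbabilities = KeySort[Counts[sums]]/36

(* Single probability: rolling a seven. *)
diceProbabilities[7]

(* Probability range: rolling 5 through 9. *)
Total[diceProbabilities /@ Range[5, 9]]

(* Probability range: rolling at least 10. *)
Total[diceProbabilities /@ Range[10, 12]]
```

Counting outcomes keeps every probability an exact fraction, which makes the results easy to compare with the fractions in the list.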
Probability Distribution
  • A probability distribution is an assignment of probabilities to a set of values.
    • The Rolling Dice distribution assigns probabilities to the possible outcomes of a roll, the integers 2 through 12.
  • Probability distributions
    • have names like
      • Normal, Poisson, Chi-Square, Student T, Binomial, Hypergeometric, Beta, F
    • are discrete or continuous
      • A discrete distribution has gaps between its values
      • A continuous distribution has no gaps between its values, e.g. the normal distribution
      • Rolling Dice is discrete
    • may take input parameters, e.g.
      • the binomial distribution: number of trials and probability of success
      • the normal distribution: mean and standard deviation
      • Rolling Dice has no parameters
    • have properties such as
      • expectation (mean), median, standard deviation, variance, quantile, skewness, kurtosis
      • Rolling Dice
        • Expectation = 7
        • Median = 7
        • Standard deviation = 2.41523
        • 1/4 Quantile = 5
    • have probabilities that sum to 1
      • Rolling Dice
        • 1/36 + 1/18 + 1/12 + 1/9 + 5/36 + 1/6 + 5/36 + 1/9 + 1/12 + 1/18 + 1/36 = 1
  • Individual probabilities and ranges can be calculated from a probability distribution
    • Boxcars
      • Probability[x == 12, Distributed[x, Dice]] = 1/36
    • 7 or 11
      • Probability[x == 7 || x == 11, Distributed[x, Dice]] = 2/9
    • x ≥ 10
      • Probability[x ≥ 10, Distributed[x, Dice]] = 1/6
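  • A probability distribution is different from a frequency distribution: a probability distribution assigns theoretical probabilities to values, whereas a frequency distribution counts how often each value occurred in observed data.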
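The Rolling Dice properties and probabilities in this section can be checked in Mathematica. This is a sketch under one assumption: Dice is not a built-in distribution, so it is modeled here as an EmpiricalDistribution over the 36 equally likely sums of two fair dice. The name dice is illustrative. The results are intended to reproduce the values listed above.

```mathematica
(* Assumption: Dice is the distribution of the sum of two fair six-sided dice,
   built from the 36 equally likely outcomes. *)
dice = EmpiricalDistribution[Flatten[Table[a + b, {a, 1, 6}, {b, 1, 6}]]];

(* Properties: expectation, median, standard deviation, 1/4 quantile. *)
{Mean[dice], Median[dice], N[StandardDeviation[dice]], Quantile[dice, 1/4]}

(* Individual probabilities and ranges. *)
Probability[x == 12, Distributed[x, dice]]            (* boxcars *)
Probability[x == 7 || x == 11, Distributed[x, dice]]  (* 7 or 11 *)
Probability[x >= 10, Distributed[x, dice]]            (* at least 10 *)

(* The probabilities of all possible values sum to 1. *)
Total[Table[PDF[dice, k], {k, 2, 12}]]
```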
Random Variables
  • Random variables are variables with built-in probability distributions.
  • Let random variable D be the outcome of rolling a pair of dice. Then D takes the values 2 through 12 with the Rolling Dice probabilities, e.g. P[D = 7] = 1/6.
  • A random variable
    • is usually designated by a capital letter
    • is defined by its probability distribution
    • has the properties of its probability distribution
      • discrete or continuous
      • expectation (mean), median, standard deviation, variance, quantile, skewness, kurtosis
  • Random variables are a neat, simple way of doing math involving probability distributions. For example:
    • Let random variable Y ~ BinomialDistribution[10, 0.2].
    • Let random variable Z ~ BinomialDistribution[10, 0.8], independent of Y.
      • Then the probability that Y and Z take a given pair of values is the product of the individual probabilities: P[Y=y and Z=z] = P[Y=y] × P[Z=z]
    • Thus,
      • If P[Y=4] = 0.0880804 and P[Z=4] = 0.00550502, then
        • P[Y=4 and Z=4] = 0.0880804 × 0.00550502 = 0.00048488
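Multiplying the two probabilities gives the probability that both variables equal 4, and that product rule holds only when Y and Z are independent. The following Mathematica sketch assumes independence and checks the arithmetic in two ways. The names distY, distZ, pY4 and pZ4 are illustrative.

```mathematica
(* Assumption: Y and Z are independent. *)
distY = BinomialDistribution[10, 0.2];
distZ = BinomialDistribution[10, 0.8];

pY4 = PDF[distY, 4]   (* P[Y = 4] *)
pZ4 = PDF[distZ, 4]   (* P[Z = 4] *)

(* For independent variables the joint probability is the product. *)
pY4*pZ4               (* P[Y = 4 and Z = 4] *)

(* The same joint probability computed from both distributions at once;
   Probability treats variables declared in a list as independent. *)
Probability[y == 4 && z == 4,
  {Distributed[y, distY], Distributed[z, distZ]}]
```

Note that this is different from the probability that the product Y × Z equals 4, which would also include pairs such as Y = 2 and Z = 2.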
Binomial Distribution
  • The binomial distribution gives the probabilities of the possible numbers of successes in a series of independent binary tests, called Bernoulli trials, where the result of each is either one thing or another: success or failure, heads or tails, right or wrong, 1 or 0.
  • It has two input parameters:
    • n = the number of trials.
    • p = the probability of success (heads, right, 1) on a given trial.
  • Let random variable X = the number of heads in 10 flips of an unbiased coin
    • The values of X are 0 through 10, and they are assigned probabilities by the Binomial Distribution[n,p], where n = 10 and p = 0.5
  • That is, X ~ Binomial Distribution[10, 0.5]
    • where the tilde means “has probability distribution such and such”
  • A bar chart of the distribution displays the probabilities assigned to the values of X.
    • The values of X are on the horizontal axis.
    • The probabilities of those values are on the vertical axis.
  • The binomial distribution for different n and p:

Increasing n moves the distribution to the right

Changing p moves the distribution and alters its shape
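Bar charts like the ones described here can be regenerated in Mathematica. The following is a sketch, not the notes' original figures: binomialChart is a hypothetical helper name, and the particular values of n and p are chosen only for illustration.

```mathematica
(* Hypothetical helper: bar chart of P[X = k] for k = 0..n. *)
binomialChart[n_, p_] := BarChart[
  Table[PDF[BinomialDistribution[n, p], k], {k, 0, n}],
  ChartLabels -> Range[0, n],
  PlotLabel -> Row[{"n = ", n, ", p = ", p}],
  AxesLabel -> {"x", "P[X = x]"}]

binomialChart[10, 0.5]                      (* the distribution of X above *)
binomialChart[#, 0.5] & /@ {10, 20, 40}     (* increasing n *)
binomialChart[10, #] & /@ {0.1, 0.5, 0.8}   (* changing p *)

(* The mean n p shows where each chart is centred. *)
Table[Mean[BinomialDistribution[n, 0.5]], {n, {10, 20, 40}}]
```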

  • The binomial distribution is discrete, meaning there are gaps between values.
    • P[X = 4] = 0.205078
    • P[X = 4.5] = 0, since 4.5 is not a possible value of X
    • P[X = 5] = 0.246094
      • where X ~ Binomial Distribution[10, 0.5]
  • The probabilities of the values of a random variable total one.
    • Probability[0 ≤ x ≤ 10, Distributed[x, BinomialDistribution[10, 0.5]]] = 1.0
  • Examples of probabilities derived from the binomial distribution, using Mathematica
    • Probability[x ≥ 0, Distributed[x, BinomialDistribution[10, 0.5]]] = 1
    • Probability[x == 5, Distributed[x, BinomialDistribution[10, 0.5]]] = 0.246094
    • Probability[x ≥ 8, Distributed[x, BinomialDistribution[10, 0.5]]] = 0.0546875
    • Probability[4 ≤ x ≤ 6, Distributed[x, BinomialDistribution[10, 0.5]]] = 0.65625
    • Probability[x ≥ 8 || x ≤ 2, Distributed[x, BinomialDistribution[10, 0.5]]] = 0.109375
  • Random numbers generated by the binomial distribution, using Mathematica
    • RandomVariate[BinomialDistribution[10, 0.5], 50]
      • 6,5,5,6,4,5,4,3,7,5,3,4,8,5,5,6,4,7,0,5,6,5,7,5,4,4,5,3,8,3,3,5,6,4,7,6,6,9,3,5,5,7,2,5,6,4,5,3,4,5
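The binomial probabilities in this section can also be obtained from the probability mass function (PDF) and the cumulative distribution function (CDF). The following sketch uses the exact parameter 1/2 instead of 0.5 so that the results are exact fractions. The name dist is illustrative, and the seed value is arbitrary.

```mathematica
dist = BinomialDistribution[10, 1/2];   (* exact p = 1/2 gives exact fractions *)

(* Probability of a single value, from the probability mass function. *)
PDF[dist, 5]                            (* 63/256 *)

(* P[x >= 8] as the complement of the cumulative probability P[x <= 7]. *)
1 - CDF[dist, 7]                        (* 7/128 *)

(* P[4 <= x <= 6] as a sum of individual probabilities. *)
Sum[PDF[dist, k], {k, 4, 6}]            (* 21/32 *)

(* The same range using Probability, with x declared as distributed by dist. *)
Probability[4 <= x <= 6, Distributed[x, dist]]

(* P[x >= 8 or x <= 2]. *)
Probability[x >= 8 || x <= 2, Distributed[x, dist]]

(* Decimal values for comparison with the figures listed above. *)
N[{PDF[dist, 5], 1 - CDF[dist, 7], Sum[PDF[dist, k], {k, 4, 6}]}]

(* Fifty random draws. Results differ between runs unless a seed is fixed. *)
SeedRandom[1];
RandomVariate[dist, 50]
```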
Normal Distribution
  • The most widely used continuous distribution is the Normal Distribution, a bell-shaped, symmetric distribution that approximates natural quantities such as blood pressure, height, and measurement errors.
  • The Normal Distribution has two input parameters:
    • μ = the mean of the distribution
    • σ = the standard deviation of the distribution
  • A normal distribution that approximates adult male heights in inches: NormalDistribution[70, 4], i.e. μ = 70 and σ = 4.
  • The standard normal distribution is the normal distribution with parameters μ = 0 and σ = 1.
  • The normal distribution for other μ and σ:

Increasing σ flattens the distribution

Changing μ moves the distribution left and right
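The effect of each parameter can be seen by plotting a few density curves. This is a Mathematica sketch with parameter values chosen only for illustration. The last plot uses μ = 70 and σ = 4, the height model that appears in the examples in these notes.

```mathematica
(* Same mean, increasing standard deviation: the curve flattens. *)
Plot[Evaluate[Table[PDF[NormalDistribution[0, s], x], {s, {1, 2, 3}}]],
  {x, -8, 8}, PlotLegends -> {"σ = 1", "σ = 2", "σ = 3"}, PlotRange -> All]

(* Same standard deviation, changing mean: the curve shifts left or right. *)
Plot[Evaluate[Table[PDF[NormalDistribution[m, 1], x], {m, {-2, 0, 2}}]],
  {x, -6, 6}, PlotLegends -> {"μ = -2", "μ = 0", "μ = 2"}, PlotRange -> All]

(* Height model: mean 70 inches, standard deviation 4 inches. *)
Plot[PDF[NormalDistribution[70, 4], x], {x, 54, 86},
  AxesLabel -> {"height (in)", "density"}]
```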

  • The normal distribution is continuous, meaning there are no gaps between values.
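    • Thus:
      • every real number is a possible value, and probabilities are assigned to ranges of values rather than to single values (the probability of any single exact value is 0).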
  • Examples of probabilities derived from the normal distribution for μ = 70 and σ = 4 (heights in inches, so 6 feet = 12*6 inches), using Mathematica
    • Probability[x ≥ 12*6, Distributed[x, NormalDistribution[70, 4]]] = 0.31
    • Probability[x ≥ 12*6.5, Distributed[x, NormalDistribution[70, 4]]] = 0.023
    • Probability[12*5 ≤ x ≤ 12*6, Distributed[x, NormalDistribution[70, 4]]] = 0.69
  • Under a continuous distribution, the probability that a value falls between two numbers is defined as the area under the density curve between those numbers.
    • The probability that a random number falls between 0.4 and 0.6 under the standard normal distribution = 0.0703251
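For a continuous distribution, a range probability is a difference of cumulative probabilities, which is the same thing as the area under the density curve. The following Mathematica sketch computes the probabilities in this section both ways. The name heights is illustrative.

```mathematica
heights = NormalDistribution[70, 4];   (* heights in inches: mean 70, sd 4 *)

(* P[x >= 72]: one minus the cumulative probability up to 6 feet. *)
1 - CDF[heights, 12*6] // N

(* P[60 <= x <= 72] as a difference of cumulative probabilities. *)
CDF[heights, 12*6] - CDF[heights, 12*5] // N

(* Area under the standard normal density between 0.4 and 0.6. *)
NIntegrate[PDF[NormalDistribution[0, 1], x], {x, 0.4, 0.6}]

(* The same probability using Probability. *)
Probability[0.4 <= x <= 0.6, Distributed[x, NormalDistribution[0, 1]]]
```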
Expectation of a Random Variable
  • The expectation of a random variable is its probability-weighted average, i.e. the sum (or integral) of its probability-weighted values.
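  • Expectation of the discrete random variable X, where X ~ Binomial Distribution[10, 0.5]
    • Equals the sum of k × P[X = k] over k = 0, 1, …, 10, which is 5
  • Expectation of the continuous random variable X, where X ~ NormalDistribution[5, 2]
    • Equals the integral of x × f(x) over all real x, where f is the density of X, which is 5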
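The definition can be applied directly: a sum for the discrete case and an integral for the continuous case. The following Mathematica sketch computes both and compares them with the built-in Mean and Expectation functions. The names binom and normal are illustrative.

```mathematica
(* Discrete case: sum of value × probability over all values of X. *)
binom = BinomialDistribution[10, 1/2];
Sum[k*PDF[binom, k], {k, 0, 10}]

(* Continuous case: integral of value × density over the whole real line. *)
normal = NormalDistribution[5, 2];
Integrate[x*PDF[normal, x], {x, -Infinity, Infinity}]

(* Built-in equivalents. *)
{Mean[binom], Mean[normal], Expectation[x, Distributed[x, binom]]}
```

For these two distributions the expectations are n p = 10 × 0.5 = 5 and μ = 5 respectively.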
Standard Deviation of a Random Variable
  • The standard deviation of a random variable measures how far its values typically lie from its expectation: a probability-weighted “average” distance, defined so that it is mathematically well-behaved.
    • (The more intuitive, but mathematically recalcitrant, average is the mean absolute deviation.)
  • The standard deviation is defined as the square root of the variance.
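To compare the two notions of "average" distance, the following Mathematica sketch computes the standard deviation and the mean absolute deviation of X ~ BinomialDistribution[10, 1/2]. That distribution is chosen here only as an example. The names binom, mean, sd and mad are illustrative.

```mathematica
binom = BinomialDistribution[10, 1/2];
mean = Mean[binom];

(* Standard deviation: square root of the expected squared distance from the mean. *)
sd = Sqrt[Sum[(k - mean)^2*PDF[binom, k], {k, 0, 10}]]

(* Mean absolute deviation: expected absolute distance from the mean. *)
mad = Sum[Abs[k - mean]*PDF[binom, k], {k, 0, 10}]

(* Decimal values, with the built-in standard deviation for comparison. *)
N[{sd, mad, StandardDeviation[binom]}]
```

The two measures generally differ. Squaring gives more weight to values far from the mean, so the standard deviation is never smaller than the mean absolute deviation.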
Variance of a Random Variable
  • The variance of a random variable X is the expectation of (X – the expectation of X)²
    • That is, E((X – E(X))²)
    • Alternatively, E(X²) – (E(X))²
  • Variance of the discrete random variable X, where X ~ Binomial Distribution[10, 0.5]
    • The variance of X = the expectation of (X – the expectation of X)²
    • The expectation of X = 5
    • So the variance is the expectation of (X – 5)²
    • Which is the probability-weighted sum of (X – 5)²
    • In mathematical terms: Σ (k – 5)² × P[X = k], summed over k = 0, 1, …, 10
    • Which is the sum of: (0 – 5)² × P[X = 0] + (1 – 5)² × P[X = 1] + … + (10 – 5)² × P[X = 10]
        • = 2.5
  • Variance of the continuous random variable X, where X ~ NormalDistribution[5, 2]
    • E((X – E(X))²) = E((X – 5)²) = σ² = 4
  • Alternatively
    • E(X²) – (E(X))² = 29 – 25 = 4
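Both forms of the variance can be evaluated directly. The following Mathematica sketch does so for the two distributions used in this section and compares the results with the built-in Variance function. The names binom and normal are illustrative.

```mathematica
binom = BinomialDistribution[10, 1/2];
normal = NormalDistribution[5, 2];

(* Discrete case: probability-weighted sum of squared distances from E(X) = 5. *)
Sum[(k - 5)^2*PDF[binom, k], {k, 0, 10}]

(* Continuous case: the same expectation as an integral against the density. *)
Integrate[(x - 5)^2*PDF[normal, x], {x, -Infinity, Infinity}]

(* Alternative formula E(X^2) - (E(X))^2 for both distributions. *)
Expectation[x^2, Distributed[x, binom]] - Expectation[x, Distributed[x, binom]]^2
Expectation[x^2, Distributed[x, normal]] - Expectation[x, Distributed[x, normal]]^2

(* Built-in check. *)
{Variance[binom], Variance[normal]}
```

For the binomial case the result equals n p (1 – p) = 10 × 0.5 × 0.5 = 2.5, and for the normal case it equals σ² = 4.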