- You’re given a random drug test that’s 95 percent reliable, meaning 95 percent of drug users test positive and 95 percent of non-drug users test negative. Assume five percent of the population takes drugs. You test positive. What’s the probability you use drugs?
- The obvious, but wrong, answer is 0.95. The right answer is calculated using Bayes’ Theorem, set forth by Reverend Thomas Bayes in the mid-1700s.
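- For two mutually exclusive and exhaustive hypotheses H1 and H2 and a piece of evidence E (the labels used later in this article), the theorem is commonly written in the form sketched below. The notation is an assumption about presentation, not a quotation of Bayes:

```latex
P(H_1 \mid E) = \frac{P(H_1)\,P(E \mid H_1)}{P(H_1)\,P(E \mid H_1) + P(H_2)\,P(E \mid H_2)}
```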
- Rather than mechanically apply Bayes’ formula, we’ll do the calculation in more intuitive ways, using:
- Probability Trees
- Venn Diagrams
- Natural Frequencies
- Let’s start by representing the probabilities that a randomly selected member of the population uses drugs and doesn’t use drugs. The probabilities, diverging from the root of the tree, are 0.05 that the person uses drugs (U) and 0.95 that they don’t (~U).
- Next we represent the probabilities users and non-users test positive and negative:
- Since the test is 95% accurate, the probability a user tests positive (P 0.95) is 0.95 and the probability a non-user tests negative (N 0.95) is likewise 0.95. With a 5% chance of error, the probability a user tests negative (N 0.05) is 0.05; and the same for a non-user testing positive (P 0.05). These probabilities are conditional: P(P|U), P(N|~U), P(N|U) and P(P|~U).
- Next we calculate the probabilities of a true positive (U&P), a false negative (U&N), a false positive (~U&P), and a true negative (~U&N). This is done by multiplying the probabilities of U and ~U by the conditional probabilities for P and N.
- For example, the probability of a true positive (True Pos) = the probability of U (U 0.05) times the probability of P given U (P 0.95), equaling 0.0475 (True Pos = 0.0475); that is,
- P(U&P) = P(U) x P(P|U) = 0.05 x 0.95 = 0.0475.
- The probabilities of a false negative (False Neg), a false positive (False Pos), and a true negative (True Neg) are calculated similarly.
- The results: True Pos = 0.0475, False Neg = 0.0025, False Pos = 0.0475, and True Neg = 0.9025.
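- The multiplication along each branch of the tree can be checked with a few lines of Python. This is a minimal sketch under the problem's stated assumptions (5% base rate, 95% reliability); the variable names are illustrative, and exact fractions are used only to avoid floating-point rounding:

```python
from fractions import Fraction

# Root branches of the tree: P(U) and P(~U).
p_user = Fraction(5, 100)
p_non_user = 1 - p_user

# Second-level branches: conditional probabilities of the test result.
p_pos_given_user = Fraction(95, 100)             # P(P|U)
p_neg_given_user = 1 - p_pos_given_user          # P(N|U)
p_neg_given_non_user = Fraction(95, 100)         # P(N|~U)
p_pos_given_non_user = 1 - p_neg_given_non_user  # P(P|~U)

# Multiply along each path to get the joint probability at each leaf.
leaves = {
    "True Pos (U&P)": p_user * p_pos_given_user,
    "False Neg (U&N)": p_user * p_neg_given_user,
    "False Pos (~U&P)": p_non_user * p_pos_given_non_user,
    "True Neg (~U&N)": p_non_user * p_neg_given_non_user,
}

for name, prob in leaves.items():
    print(f"{name}: {float(prob):.4f}")

# The four leaves cover every case, so they must sum to 1.
assert sum(leaves.values()) == 1
```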
- The probability you’re a drug user if you test positive is P(U|P).
- P(U|P), by definition, equals P(U&P) / P(P).
- P(P) = P(True Pos) + P(False Pos), since P is logically equivalent to (U&P) v (~U&P) and the two disjuncts cannot both be true.
- So P(U|P) = P(U&P) / (P(True Pos) + P(False Pos)) = 0.0475 / (0.0475 + 0.0475) = ½.
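- The same division can be sketched in Python, starting from the two leaf probabilities that involve a positive test. The names are illustrative:

```python
from fractions import Fraction

# Joint probabilities taken from the probability tree.
p_true_pos = Fraction(5, 100) * Fraction(95, 100)   # P(U&P)
p_false_pos = Fraction(95, 100) * Fraction(5, 100)  # P(~U&P)

# A positive result is either a true or a false positive, and the two
# cases are mutually exclusive, so their probabilities add.
p_pos = p_true_pos + p_false_pos                    # P(P)

# Definition of conditional probability: P(U|P) = P(U&P) / P(P).
p_user_given_pos = p_true_pos / p_pos

print(p_pos)             # 19/200
print(p_user_given_pos)  # 1/2
```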
- A Venn Diagram throws light on the result. If you test positive you’re in the P circle, which is equal parts True Pos and False Pos. The probability you’re a drug user, a True Pos, is therefore ½.
- Bayes’ formula of course generates the same answer:
- P(U|P) = ( P(U) x P(P|U) ) / ( (P(U) x P(P|U)) + (P(~U) x P(P|~U)) ).
- P(U|P) = (0.05 x 0.95 ) / ( (0.05 x 0.95) + (0.95 x 0.05) ) = ½
- H1 = You’re a drug user
- H2 = You don’t use drugs
- E = You tested positive
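- Bayes’ formula for two hypotheses can be wrapped in a small function. This is only a sketch: `bayes_posterior` and its parameter names are hypothetical, chosen to mirror H1, H2 and E above, and the function assumes H1 and H2 are mutually exclusive and exhaustive:

```python
def bayes_posterior(prior_h1, p_e_given_h1, p_e_given_h2):
    """Return P(H1|E) when H2 is the only alternative to H1."""
    prior_h2 = 1 - prior_h1
    numerator = prior_h1 * p_e_given_h1
    evidence = numerator + prior_h2 * p_e_given_h2  # P(E)
    if evidence == 0:
        raise ValueError("P(E) is zero, so P(H1|E) is undefined")
    return numerator / evidence

# Drug problem: P(U) = 0.05, P(P|U) = 0.95, P(P|~U) = 0.05.
p_user_given_pos = bayes_posterior(0.05, 0.95, 0.05)
print(round(p_user_given_pos, 6))  # 0.5
```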
- In Calculated Risks, Gerd Gigerenzer recommends using natural frequencies to calculate probabilities. The idea is that you restate a probability problem using numbers of things rather than probabilities, solve the problem, and state the solution as a probability.
- Solving the drug problem using natural frequencies:
- Assume a population of 10,000. Five percent use drugs. So there are 500 users and 9,500 non-users. The drug test is 95 percent reliable. So of the 500 users, 475 test positive and 25 negative. And of the 9,500 non-users, 475 test positive and 9,025 negative. So 950 people test positive, half users (475) and half non-users (475). Thus the probability you’re a drug user if you test positive is 475 / 950 = ½.
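- The natural-frequency calculation translates almost word for word into integer arithmetic. A minimal sketch, assuming the same population of 10,000 (the variable names are illustrative):

```python
from fractions import Fraction

population = 10_000

# Split the population by the 5% base rate.
users = population * 5 // 100          # 500
non_users = population - users         # 9,500

# Apply the 95% reliability to each group.
users_pos = users * 95 // 100          # 475 true positives
non_users_pos = non_users * 5 // 100   # 475 false positives

# Everyone who tests positive, and the share of them who are users.
all_pos = users_pos + non_users_pos    # 950
p_user_given_pos = Fraction(users_pos, all_pos)

print(f"{users_pos} of {all_pos} positives are users: {p_user_given_pos}")
```

- Only the last step turns counts back into a probability, which is the point of Gigerenzer’s approach.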
- There’s much to be said for Gigerenzer’s natural frequencies.
- How does a 95% reliable test wind up with a probability of 0.5? Here’s how to look at the matter. The drug test screws up two ways: generating positive results for non-users and negative results for users. Both sorts of screw-up occur at the same rate of 5 in 100. But far more people don’t use drugs than do; so the drug test has many more chances of screwing up on non-users. Thus, although the screw-up rates are the same, there are a lot more false positives than false negatives. For example, the expected results of testing 10,000 people are:
- With 5% of the population using drugs, there are 500 users and 9,500 non-users. For the 500 users the test screws up 25 times, the false negatives. For the 9,500 non-users, the test screws up 475 times, though the screw-up rate is the same. There are thus 475 false positives. It so happens that 475 of the 500 users test positive. So the false positives equal the true positives. If you test positive, the odds are even you use drugs.
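- The expected results for 10,000 people can be laid out as a two-by-two table. The sketch below is one way to do that; `expected_counts` is a hypothetical helper, and it assumes whole-number percentages that divide the population evenly:

```python
def expected_counts(population, base_rate_pct, reliability_pct):
    """Expected outcome counts, using whole-number percentages."""
    users = population * base_rate_pct // 100
    non_users = population - users
    true_pos = users * reliability_pct // 100
    true_neg = non_users * reliability_pct // 100
    return {
        "true_pos": true_pos,
        "false_neg": users - true_pos,
        "false_pos": non_users - true_neg,
        "true_neg": true_neg,
    }

counts = expected_counts(10_000, 5, 95)

print(f"{'':10}{'Positive':>10}{'Negative':>10}")
print(f"{'Users':10}{counts['true_pos']:>10}{counts['false_neg']:>10}")
print(f"{'Non-users':10}{counts['false_pos']:>10}{counts['true_neg']:>10}")

# Same 5% error rate in both rows, but very different error counts.
print(counts["false_pos"] / counts["false_neg"])  # 19.0
```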
- One lesson of the drug problem is to be clear on the direction of conditional probabilities.
- P(P|U), 0.95 in the problem, is the probability a subject tests positive if a user, generating the prediction that, if a lot of drug users are tested, close to 95% will be positive.
- By contrast, P(U|P), ½ by our calculation, is the probability a subject is a user if they test positive, generating the prediction that, among a large number of positive outcomes, approximately half will be drug users.
- If you’re fired after a random drug test, though innocent, you can thank the latter conditional probability.
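- Both predictions can be checked by simulation. The following is a rough Monte Carlo sketch, not part of the original argument; the function name, sample size, and seed are arbitrary choices, and the estimates vary slightly with the seed:

```python
import random

def simulate(n_people, base_rate=0.05, reliability=0.95, seed=0):
    """Simulate random drug tests and estimate both conditionals."""
    rng = random.Random(seed)
    users = positives = user_and_positive = 0
    for _ in range(n_people):
        is_user = rng.random() < base_rate
        # The test reports the truth with probability `reliability`.
        correct = rng.random() < reliability
        tests_positive = is_user if correct else not is_user
        users += is_user
        positives += tests_positive
        user_and_positive += is_user and tests_positive
    p_pos_given_user = user_and_positive / users      # among users
    p_user_given_pos = user_and_positive / positives  # among positives
    return p_pos_given_user, p_user_given_pos

p_pos_given_user, p_user_given_pos = simulate(200_000)
print(f"P(P|U) is roughly {p_pos_given_user:.3f}")
print(f"P(U|P) is roughly {p_user_given_pos:.3f}")
```

- The two estimates share a numerator (users who test positive) and differ only in the group they are divided by, which is exactly the difference in direction between the two conditional probabilities.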
- A second lesson is to remember background evidence. In assessing the probability of a hypothesis, background evidence must be considered along with the results of tests, studies, and experiments. The probability you’re a drug user given a positive result is ½ rather than 0.95 because of the background fact that 95% of people don’t use drugs. Likewise, claims of telepathy, alien abduction, and clairvoyance are doubtful in light of previous such claims being unconfirmed. “Extraordinary claims require extraordinary evidence,” as Carl Sagan wrote.
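- The weight of the background rate in the drug problem can be made concrete by holding the test fixed and varying only the share of the population that uses drugs. This is an illustrative sketch; the base rates other than 0.05 are hypothetical and do not come from the problem:

```python
def p_user_given_positive(base_rate, reliability=0.95):
    """P(U|P) for a test equally reliable on users and non-users."""
    true_pos = base_rate * reliability
    false_pos = (1 - base_rate) * (1 - reliability)
    return true_pos / (true_pos + false_pos)

# Same test, different background rates of drug use.
for base_rate in (0.01, 0.05, 0.20, 0.50):
    posterior = p_user_given_positive(base_rate)
    print(f"base rate {base_rate:.2f} -> P(U|P) = {posterior:.3f}")
```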