

P-value Pitfalls
Grayson White
Math 141
Week 9 | Fall 2025
A hearty p-values discussion
Key probability concepts
(and connection to confidence intervals)
Data Visualizer
Data Wrangler
Model Builder
Inferencer
The original intention of the p-value was as an informal measure to judge whether or not a researcher should take a second look.
But to create simple statistical manuals for practitioners, this informal measure quickly became a rule: “p-value < 0.05” = “statistically significant”.
What were/are the consequences of the “p-value < 0.05” = “statistically significant” rule?

A consequence: p-hacking, that is, cherry-picking promising findings that fall below this arbitrary threshold.
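To see why cherry-picking is a problem, here is a minimal simulation sketch. It assumes a hypothetical researcher who runs 20 independent tests on pure-noise data (so every null hypothesis is true) and reports a finding if any p-value falls below 0.05. The test used (a z-test with known standard deviation), the sample size, and the number of simulated studies are all illustrative choices, not anything from a real study.

```python
import math
import random

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def any_below_threshold(num_tests, n, alpha, rng):
    """Run num_tests z-tests on pure-noise data; True if any p-value < alpha."""
    for _ in range(num_tests):
        # Hypothetical data: n draws from N(0, 1), so H0: mu = 0 is true.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # known sigma = 1 (assumption)
        if two_sided_p_value(z) < alpha:
            return True
    return False

rng = random.Random(141)  # fixed seed so the sketch is reproducible
num_studies = 1000
hits = sum(any_below_threshold(20, 25, 0.05, rng) for _ in range(num_studies))

simulated = hits / num_studies
theoretical = 1 - 0.95 ** 20  # chance of at least one false alarm in 20 tests

print(f"simulated share of studies with a 'finding': {simulated:.3f}")
print(f"theoretical value:                           {theoretical:.3f}")
```

Under these assumptions, well over half of the simulated studies contain at least one p-value below 0.05 even though there is nothing to find.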

Example: A recent Nature study of 19,000+ people found that those who meet their spouses online…
Are less likely to divorce (p-value < 0.002)
Are more likely to have high marital satisfaction (p-value < 0.001)
BUT the estimated effect sizes were tiny.
Question: Do these results provide compelling evidence that one should change their dating behavior?
A consequence: People conflate statistical significance with practical significance.
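The gap between statistical and practical significance can be sketched with a two-proportion z-test. The counts below are hypothetical (they are not the numbers from the study above): the same 2-percentage-point difference is tested at two sample sizes, and only the sample size changes the p-value.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (difference, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value

# Hypothetical counts: 52% vs. 50% "successes" in both scenarios.
diff_small, z_small, p_small = two_proportion_z_test(260, 500, 250, 500)
diff_large, z_large, p_large = two_proportion_z_test(26_000, 50_000, 25_000, 50_000)

print(f"n = 500 per group:    diff = {diff_small:.3f}, p-value = {p_small:.3g}")
print(f"n = 50,000 per group: diff = {diff_large:.3f}, p-value = {p_large:.3g}")
```

The estimated effect is identical in both runs; the p-value shrinks only because the sample grows. Whether a 2-point difference matters is a separate, subject-matter question.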
We won’t use the “statistically significant” language in Math 141. Instead say “statistically discernible.”
The American Statistical Association created a set of principles to address misconceptions and misuse of p-values:
P-values can indicate how incompatible the data are with a specified statistical model.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether or not a p-value passes a specific threshold (e.g., 0.05).
Proper inference requires full reporting and transparency.
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
Despite their issues, p-values are still quite popular and can still be a useful tool when used properly.
In 2014, George Cobb, a professor at Mount Holyoke College, posed the following questions (and answers):

Understanding p-values and being able to interpret a p-value in context is a learning objective of Math 141.
Understanding that a small p-value means we have evidence for \(H_a\) is important.
Understanding that a small p-value alone does not imply practical significance is also important.
So is understanding that what you mean by “small” should depend on your field and on whether a Type I error or a Type II error is worse for your particular research question.
Your ability to tell if a number is less than 0.05 is not a learning objective for Math 141.


Question: How did folks do inference before computers?

“All models are wrong but some are useful.” – George Box
Question: How can we use theoretical probability models to approximate our (sampling) distributions?

Before we can answer that question and apply the models, we need to learn about the theoretical probability models themselves.
A random variable (RV) is a random process that takes on numerical values.
Random variables have probability functions that tell us the likelihood of specific values.
For a discrete RV, the probability function is:
\[ p(x) = P(X = x) \]
where \(\sum p(x) = 1\).
For a discrete random variable, we care about its:
Distribution: \(p(x) = P(X = x)\)
Center – Mean:
\[ \mu = \sum x p(x) \]
Spread – Variance and standard deviation:
\[ \sigma^2 = \sum (x - \mu)^2 p(x) \]
\[ \sigma = \sqrt{ \sum (x - \mu)^2 p(x)} \]
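The three formulas translate almost line for line into code. Here is a minimal sketch, assuming the distribution is stored as a dictionary mapping each value x to p(x); the function name `summarize_discrete` and the fair-die example are illustrative choices.

```python
import math

def summarize_discrete(pmf):
    """Return (mean, variance, sd) for a discrete RV given as {x: p(x)}."""
    total = sum(pmf.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, not 1")
    mu = sum(x * p for x, p in pmf.items())               # mu = sum of x p(x)
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # sigma^2
    return mu, var, math.sqrt(var)                        # sigma = sqrt(sigma^2)

# Illustrative example: one roll of a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}
mu, var, sd = summarize_discrete(die)

print(f"mean = {mu:.3f}, variance = {var:.3f}, sd = {sd:.3f}")
```

The check that the probabilities sum to 1 mirrors the requirement \(\sum p(x) = 1\) on the probability function.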
Suppose 4 students have still not received their graded Math 141 midterm (yes, let’s pretend we actually had a hand-written midterm) and that I hand back the exams randomly to each student. Let X = the number of students who get their correct exam.
Questions:
Let’s say the students’ names are A(licia), B(ob), C(olin), and D(onna) and they are sitting in a row ABCD. One possible outcome is ABDC (1st exam goes to A, 2nd to B, 3rd to D, 4th to C). In that case, what does X equal?
List out all possible outcomes. And for each outcome, determine what X equals.
Why is P(X = 3) = 0?
Write out the probability distribution for X.
Determine the mean value of X.
Determine the standard deviation of X.
What is the probability that at least one student gets their correct exam?
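One way to check answers to these questions is to enumerate every outcome by computer. The sketch below assumes that "randomly" means all 24 orderings of the four exams are equally likely; the variable names are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import permutations

students = "ABCD"  # seating order; position i belongs to students[i]

# Every way to hand back the four exams (assumed equally likely).
outcomes = list(permutations(students))

# X = number of positions where the exam handed out matches the student.
counts = Counter(
    sum(exam == student for exam, student in zip(outcome, students))
    for outcome in outcomes
)

# Probability distribution of X as exact fractions.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in range(len(students) + 1)}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)
p_at_least_one = 1 - pmf[0]

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
print(f"mean = {mean}, sd = {sd}, P(X >= 1) = {p_at_least_one}")
```

Using `Fraction` keeps the probabilities exact, which makes them easy to compare with a hand-written table.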