

Quantifying Uncertainty
Grayson White
Math 141
Week 6 | Fall 2025
Return to the American Statistical Association’s “Ethical Guidelines for Statistical Practice”
“The ethical statistical practitioner seeks to understand and mitigate known or suspected limitations, defects, or biases in the data or methods and communicates potential impacts on the interpretation, conclusions, recommendations, decisions, or other results of statistical practices.”
“For models and algorithms designed to inform or implement decisions repeatedly, develops and/or implements plans to validate assumptions and assess performance over time, as needed. Considers criteria and mitigation plans for model or algorithm failure and retirement.”
Algorithmic bias: when the model systematically creates unfair outcomes, such as privileging one group over another.
Example: The Coded Gaze

Facial recognition software struggles to see faces of color.
Algorithms built on a non-diverse, biased dataset.
Example: COMPAS model used throughout the country to predict recidivism


# A tibble: 1 × 1
meanFINCBTAX
<dbl>
1 62480.
As with regression, we need to distinguish between the population and the sample.
R has been giving us uncertainty estimates:
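library(tidyverse)  # provides %>%, filter(), drop_na(), and ggplot()
library(pdxTrees)

# Keep only Kenilworth Park trees with known functional type and native status
kenilworth <- get_pdxTrees_parks() %>%
  filter(Park %in% c("Kenilworth Park")) %>%
  drop_na(Functional_Type, Native)

# Scatterplot with fitted lines; se = TRUE draws the uncertainty bands
ggplot(kenilworth, aes(x = Tree_Height,
                       y = DBH, color = Native)) +
  geom_point() +
  stat_smooth(method = "lm", se = TRUE) +
  scale_color_manual(values = c("plum3", "goldenrod")) +
  theme(legend.position = "bottom")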
modTrees <- lm(Tree_Height ~ DBH*Native, data = kenilworth)
library(moderndive)
get_regression_table(modTrees)
# A tibble: 4 × 7
  term          estimate std_error statistic p_value lower_ci upper_ci
  <chr>            <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept        7.39      3.65       2.03   0.044    0.201   14.6
2 DBH              2.25      0.17      13.3    0        1.92     2.59
3 Native: Yes     11.1       5.59       1.98   0.049    0.067   22.1
4 DBH:NativeYes    0.315     0.215      1.47   0.144   -0.108    0.739
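The lower_ci and upper_ci columns are the uncertainty estimates in this output. As a rough check, and assuming the common rule of thumb that an interval is about the estimate plus or minus 2 standard errors (the exact multiplier R uses is not shown here and will differ slightly), the sketch below recomputes the bounds from the estimate and std_error columns. The numbers are copied from the regression table above; the approx_lower and approx_upper names are made up for this illustration.

```r
# Values copied from the regression table above
reg <- data.frame(
  term      = c("intercept", "DBH", "Native: Yes", "DBH:NativeYes"),
  estimate  = c(7.39, 2.25, 11.1, 0.315),
  std_error = c(3.65, 0.17, 5.59, 0.215),
  lower_ci  = c(0.201, 1.92, 0.067, -0.108),
  upper_ci  = c(14.6, 2.59, 22.1, 0.739)
)

# Rule-of-thumb interval: estimate plus or minus 2 standard errors
# (an approximation, not necessarily the exact formula behind the table)
reg$approx_lower <- reg$estimate - 2 * reg$std_error
reg$approx_upper <- reg$estimate + 2 * reg$std_error

# Compare the reported bounds with the approximate ones side by side
reg[, c("term", "lower_ci", "approx_lower", "upper_ci", "approx_upper")]
```

The approximate bounds land close to the reported ones, which suggests the interval is built from the estimate and its standard error.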
Uncertainty estimates are constantly reported in news and journal articles:



Goal: Draw conclusions about the population based on the sample.

Main Flavors
Estimating numerical quantities (parameters).
Testing conjectures.
Goal: Estimate a (population) parameter.
Best guess?
Example: Are GIFs just another way for people to share videos of their pets?
Want to estimate the proportion of GIFs that feature animals.
Key Question: How accurate is the statistic as an estimate of the parameter?
Helpful Sub-Question: If we take many samples, how much would the statistic vary from sample to sample?
Need two new concepts:
The sampling variability of a statistic
The sampling distribution of a statistic
Steps to Construct an (Approximate) Sampling Distribution:
1. Decide on a sample size, \(n\).
2. Randomly select a sample of size \(n\) from the population.
3. Compute the sample statistic.
4. Put the sample back in.
5. Repeat Steps 2-4 many (1000+) times.
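The steps above can be sketched in base R for the GIF example. This is a minimal illustration, not the course's official code: the population of 10,000 GIFs and the true proportion of 0.3 are invented so that there is something to sample from, and the names population, n, reps, and p_hats are placeholders.

```r
set.seed(141)

# Hypothetical population: 10,000 GIFs, 30% of which feature animals.
# (Both numbers are made up for illustration.)
population <- rep(c("animal", "no animal"), times = c(3000, 7000))

n    <- 50     # Step 1: decide on a sample size
reps <- 1000   # Step 5: how many times to repeat Steps 2-4

p_hats <- replicate(reps, {
  samp <- sample(population, size = n)  # Step 2: random sample of size n
  mean(samp == "animal")                # Step 3: sample proportion
})  # Step 4: every draw starts again from the full population

mean(p_hats)  # center of the approximate sampling distribution
sd(p_hats)    # spread: how much the statistic varies from sample to sample
```

Calling hist(p_hats) afterwards draws the approximate sampling distribution so its shape can be inspected.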

Center? Spread? Shape?
What happens to the center/spread/shape as we increase the sample size?
What happens to the center/spread/shape if the true parameter changes?
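One way to explore these two questions is to rerun the simulation under several settings and compare the results. The sketch below is an assumption-laden shortcut: sim_p_hats is a hypothetical helper, it uses rbinom() to mimic sampling from a very large population, and the sample sizes and true proportions in the grid are arbitrary choices.

```r
set.seed(141)

# Hypothetical helper: simulate `reps` sample proportions from samples of
# size n when the true population proportion is p.
sim_p_hats <- function(n, p, reps = 1000) {
  rbinom(reps, size = n, prob = p) / n
}

# Every combination of three sample sizes and two true proportions
settings <- expand.grid(n = c(25, 100, 400), p = c(0.3, 0.8))
settings$center <- NA
settings$spread <- NA

for (i in seq_len(nrow(settings))) {
  p_hats <- sim_p_hats(settings$n[i], settings$p[i])
  settings$center[i] <- mean(p_hats)  # center of the simulated distribution
  settings$spread[i] <- sd(p_hats)    # spread of the simulated distribution
}

settings
```

Reading down the table, compare the center column with p and the spread column across values of n.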
R!