Bootstrapping and CIs in R

Grayson White

Math 141
Week 8 | Fall 2025

Announcements

Midterm grades will be released Tuesday.
Some logistics on Wednesday: (Written) midterm corrections, final exam info

Goals for Today

Recall ideas about the sampling distribution and bootstrapping
Bootstrapping in R
Learn more about confidence intervals and compute them in R

Review Discussion: Sampling distributions and bootstrapping

Confidence Intervals

Recall that a confidence interval is an interval of plausible values for a parameter.

Form: \(\mbox{statistic} \pm \mbox{Margin of Error}\)
Question: How do we find the Margin of Error (ME)?
Answer: If the sampling distribution of the statistic is approximately bell-shaped and symmetric, then a statistic will be within 1.96 SEs of the parameter for 95% of the samples.
Form: \(\mbox{statistic} \pm 1.96\times\mbox{SE}\)
This is called a 95% confidence interval (CI). (We will discuss the meaning of confidence soon.)
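
The multiplier 1.96 comes from the bell-shaped assumption: it is roughly the 97.5th percentile of a standard normal distribution, which leaves 2.5% in each tail. A quick base R check (the object name `z_95` is only illustrative):

```r
# Multiplier for a 95% interval: leave 2.5% in each tail of a standard normal
z_95 <- qnorm(0.975)
round(z_95, 2)  # about 1.96

# Proportion of a standard normal that lies within z_95 SDs of its center
coverage <- pnorm(z_95) - pnorm(-z_95)
coverage
```

Swapping 0.975 for 0.995 would give the multiplier for a 99% interval under the same assumption.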

Confidence Intervals

95% CI Form:

\[ \mbox{statistic} \pm 1.96\times\mbox{SE} \]

It is easy to compute a statistic for the form above (sample proportion, sample mean, …)

But… What else do we need to construct the CI?

Problem: To compute the SE, we need many samples from the population. We have 1 sample.
Solution: Approximate the sampling distribution using ONLY OUR ONE SAMPLE!

Bootstrapping: Algorithm

How do we approximate the sampling distribution?

Bootstrap Distribution of a Sample Statistic:

1. Take a sample of size \(n\) with replacement from the sample. This is called a bootstrap sample.
2. Compute the statistic on the bootstrap sample.
3. Repeat steps 1 and 2 many (1000+) times.
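
The three steps above can be sketched by hand in base R before turning to a package. This is only an illustration: `my_sample` is simulated (not real data), and the seed, sample size, and object names are assumptions made for the example.

```r
set.seed(141)  # arbitrary seed so the resampling is reproducible

# Hypothetical sample of n = 50 measurements (simulated, not real data)
my_sample <- rnorm(50, mean = 40, sd = 3)
n <- length(my_sample)

# Repeat steps 1 and 2 a total of 1000 times
boot_means <- replicate(1000, {
  boot_sample <- sample(my_sample, size = n, replace = TRUE)  # step 1
  mean(boot_sample)                                           # step 2
})

# The 1000 bootstrapped means make up the bootstrap distribution
length(boot_means)
summary(boot_means)
```

Each bootstrap sample has the same size as the original sample, and sampling with replacement is what lets the resamples differ from one another.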

Bootstrapped Confidence Intervals

Two Methods

Both methods assume a random sample and a roughly bell-shaped, symmetric bootstrap distribution.

SE Method 95% CI:

\[ \mbox{statistic} \pm 1.96 \times\widehat{\mbox{SE}} \]

We approximate \(\mbox{SE}\) with \(\widehat{\mbox{SE}}\) = the standard deviation of the bootstrapped statistics.

Percentile Method CI:

If I want a P% confidence interval, I find the bounds of the middle P% of the bootstrap distribution.
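
As a rough sketch of both methods done by hand, the base R code below builds a bootstrap distribution from a simulated sample and then computes each interval. The data, seed, and object names are hypothetical and chosen only for illustration.

```r
set.seed(141)  # arbitrary seed for reproducibility

# Hypothetical sample and its bootstrap distribution of means
my_sample <- rnorm(50, mean = 40, sd = 3)
boot_means <- replicate(1000, mean(sample(my_sample, replace = TRUE)))

# SE method: statistic +/- 1.96 * (SD of the bootstrapped statistics)
se_hat <- sd(boot_means)
ci_se <- mean(my_sample) + c(-1, 1) * 1.96 * se_hat

# Percentile method: bounds of the middle 95% of the bootstrap distribution
ci_percentile <- quantile(boot_means, probs = c(0.025, 0.975))

ci_se
ci_percentile
```

When the bootstrap distribution is roughly bell-shaped and symmetric, the two intervals should land close to each other.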

How can we construct bootstrap distributions and bootstrapped CIs in R?

Load Packages and Data

library(tidyverse)
library(infer)
library(palmerpenguins)

Let’s return to the Palmer Penguins!

# Preview the first six rows of the data
head(penguins)

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>

We’ll see new functions to complete data analysis workflow steps with the infer package today.

Estimation for a Single Mean

What is the average bill length \((\mu)\) of an Adelie penguin?

# Compute the summary statistic
x_bar <- penguins %>%
  filter(species == "Adelie") %>%
  drop_na(bill_length_mm) %>%
  specify(response = bill_length_mm) %>%
  calculate(stat = "mean")
x_bar

Response: bill_length_mm (numeric)
# A tibble: 1 × 1
   stat
  <dbl>
1  38.8

Why is our numerical quantity a mean and not a proportion or correlation here?

Estimation for a Single Mean

# Construct bootstrap distribution
bootstrap_dist <- penguins %>%
  filter(species == "Adelie") %>%
  drop_na(bill_length_mm) %>%
  specify(response = bill_length_mm) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# Look at bootstrap distribution
ggplot(data = bootstrap_dist, 
       mapping = aes(x = stat)) +
  geom_histogram(color = "white")

Estimation for a Single Mean – SE Method

# Get confidence interval
ci <- bootstrap_dist %>% 
  get_confidence_interval(type = "se", level = 0.95,
                          point_estimate = x_bar)
ci

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     38.4     39.2

Interpretation: The point estimate is 38.79 mm. I am 95% confident that the true average bill length of Adelie penguins is between 38.37 mm and 39.21 mm.

Estimation for a Single Mean

# Visualize confidence interval
bootstrap_dist %>%
  visualize() +
  shade_confidence_interval(endpoints = ci)

Estimation for a Single Mean – Percentile Method

# Get confidence interval 
ci_95 <- bootstrap_dist %>% 
  get_confidence_interval(type = "percentile",
                          level = 0.95) 
ci_95

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     38.4     39.2

Estimation for Difference in Means

What is the difference in average bill length between Adelie penguins and Chinstrap penguins \((\mu_1 - \mu_2)\)?

# Compute the summary statistic
diff_x_bar <- penguins %>%
  drop_na(bill_length_mm) %>%
  filter(species %in% c("Adelie", "Chinstrap")) %>%
  specify(response = bill_length_mm, explanatory = species) %>%
  calculate(stat = "diff in means",
            order = c("Adelie", "Chinstrap"))
diff_x_bar

Response: bill_length_mm (numeric)
Explanatory: species (factor)
# A tibble: 1 × 1
   stat
  <dbl>
1 -10.0

Why a difference in means?

Estimation for Difference in Means

# Construct bootstrap distribution
bootstrap_dist <- penguins %>%
  drop_na(bill_length_mm) %>%
  filter(species %in% c("Adelie", "Chinstrap")) %>%
  specify(bill_length_mm ~ species) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "diff in means",
            order = c("Adelie", "Chinstrap"))

# Look at bootstrap distribution
ggplot(data = bootstrap_dist,
       mapping = aes(x = stat)) +
  geom_histogram(color = "white")

Estimation for Difference in Means – SE Method

# Get confidence interval 
ci_95 <- bootstrap_dist %>% 
  get_confidence_interval(type = "se", level = 0.95,
                          point_estimate = diff_x_bar) 
ci_95

# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1    -10.9    -9.15

Interpretation: The point estimate is -10.04 mm. I am 95% confident that the true difference in average bill length between Adelie and Chinstrap penguins (Adelie minus Chinstrap) is between -10.93 mm and -9.15 mm.

Comparing CIs

ci_99 <- bootstrap_dist %>% 
  get_confidence_interval(type = "se", level = 0.99,
                          point_estimate = diff_x_bar)

bootstrap_dist %>%
  visualize() +
  shade_confidence_interval(endpoints = ci_99,
                            fill = "gold1",
                            color = "gold3") +
  shade_confidence_interval(endpoints = ci_95)

Why construct a 95% CI versus a 99% CI?

What do we mean by confidence?

Confidence level = success rate of the method under repeated sampling
How do I know if my ONE CI successfully contains the true value of the parameter?
As we increase the confidence level, what happens to the width of the interval?
As we increase the sample size, what happens to the width of the interval?
As we increase the number of bootstrap samples we take, what happens to the width of the interval?
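
One way to see "success rate under repeated sampling" is a small simulation. In the sketch below the true mean `mu` is known only because the population is simulated; the population, sample size, number of repetitions, and object names are all assumptions made for illustration.

```r
set.seed(141)  # arbitrary seed for reproducibility

mu <- 40          # hypothetical true mean, known only because we simulate
n_samples <- 200  # number of repeated samples drawn from the population

covered <- replicate(n_samples, {
  samp <- rnorm(50, mean = mu, sd = 3)                             # one sample
  boot_means <- replicate(500, mean(sample(samp, replace = TRUE))) # bootstrap
  ci <- mean(samp) + c(-1, 1) * qnorm(0.975) * sd(boot_means)      # SE method
  ci[1] <= mu && mu <= ci[2]                                       # captured mu?
})

# Proportion of the 200 intervals that captured mu; expected to be near 0.95
mean(covered)
```

Any single interval either captures `mu` or misses it; the 95% describes how often the method succeeds across the many samples.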

Interpreting Confidence Intervals

Example: Estimating average household income before taxes in the US

SE Method Formula:

\[ \mbox{statistic} \pm{\mbox{ME}} \]

# A tibble: 1 × 4
  statistic    ME  lower  upper
      <dbl> <dbl>  <dbl>  <dbl>
1    62409. 1959. 60521. 64439.

“The margin of [sampling] error can be described as the ‘penalty’ in precision for not talking to everyone in a given population. It describes the range that an answer likely falls between if the survey had reached everyone in a population, instead of just a sample of that population.” – Courtney Kennedy, Director of Survey Research at Pew Research Center

CI = interval of plausible values for the parameter

Safe interpretation: I am P% confident that {insert what the parameter represents in context} is between {insert lower bound} and {insert upper bound}.

Caution: Intervals in the wild

Statement in an article for The BMJ (British Medical Journal):

Confidence Interval Misunderstandings

Misunderstanding 1

Suppose we wish to estimate the number of hours a Reed student sleeps on a typical night. We obtain the following 95% confidence interval: \((7.86, 8.34)\)

A 95% confidence interval does not contain 95% of observations in the population.

Saying that 95% of all Reed students sleep between 7.86 and 8.34 hours should just feel wrong. That’s a pretty narrow interval!
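
A small simulation can make the point concrete. The sleep data below are simulated (hypothetical, not survey results), with the mean, spread, and sample size picked so the interval has a width similar to the one above.

```r
set.seed(141)  # arbitrary seed for reproducibility

# Hypothetical sleep data for 150 students (simulated, in hours)
sleep_hours <- rnorm(150, mean = 8.1, sd = 1.5)

# 95% CI for the mean using the bootstrap SE method
boot_means <- replicate(1000, mean(sample(sleep_hours, replace = TRUE)))
ci <- mean(sleep_hours) + c(-1, 1) * 1.96 * sd(boot_means)
ci

# Share of individual observations that fall inside the CI for the mean
share_inside <- mean(sleep_hours >= ci[1] & sleep_hours <= ci[2])
share_inside
```

In this setup only a small fraction of the individual students fall inside the interval, because the interval describes plausible values for the mean, not for individual observations.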

Misunderstanding 2

A 95% confidence interval does not mean that 95% of all sample means fall within the given range.

Q: Why do the sampling distribution and bootstrap distribution look different?

Misunderstanding 3

Given a 95% confidence interval, Do Not Say: “There is a 95% chance that the true parameter falls within my interval.”

Once we take a sample and calculate a confidence interval, there’s no more randomness!
- The interval either does or doesn’t contain the (unknown) parameter.

This may seem like arguing over semantics – but it’s an important distinction!

Instead, say either:

“If we were to take many samples and calculate a confidence interval for each, 95% of them would contain the true parameter”
“We are 95% confident that the true parameter is in our confidence interval”