Statistical Thinking
Grayson White
Math 141
Week 1 | Fall 2025
Before we get started…
If you are on the waitlist, please sign in to class here: tinyurl.com/math141-wait
The course website, math-141.github.io, will be the central location for all our course materials.
We’ll use a few other resources to navigate Math 141, all of which can be accessed through the quick links on the course website:
Reed’s RStudio Server, for coursework,
A course-wide Slack workspace, for course communication,
Gradescope, for turning in assignments and exams, and
Moodle, sparingly, for private course materials
Name: Grayson White (he/him), you can call me ‘Grayson’
Email: gwhite@reed.edu
Office: Library 390
Office hours: Monday 2:30-4, Wednesday 1-2:30, or by appointment
Course assistant office hours coming soon!
In this course, you will learn how to think critically with data by engaging in the entire data analysis process.
Most of our time will be spent in the Exploration and Visualization, Data Wrangling, and Modeling and Inference steps, but we will spend some time in each cog!
We expect everyone in this class to strive to foster a learning environment that is equitable, inclusive, and welcoming. If you experience any barriers to learning, please come to Professor Grayson White or a college administrator with your concerns.
Code of Conduct:
We expect all members of Math 141 to make participation a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
We expect everyone to act and interact in ways that contribute to an open, welcoming, inclusive, and healthy community of learners. You can contribute to a positive learning environment by demonstrating empathy and kindness, being respectful of differing viewpoints and experiences, and giving and gracefully accepting constructive feedback.
Artificial intelligence (AI) tools, such as ChatGPT, Claude, Gemini, and others, are being used to generate code, analyze data, and much more. However, learning to think critically about the problem at hand, and engaging with your peers, tutors, and instructors when you do not understand a concept or question, are integral components of a liberal arts education. Further, a key goal of this course is for you to learn how to thoughtfully, ethically, and independently extract knowledge from data and engage in statistical reasoning. Therefore, the use of generative AI tools, such as ChatGPT and others, is strictly prohibited at any stage of the work process for this course. If you have questions about whether a tool is allowed for this course, ask the instructor before using it.
What is statistical thinking?
It is not the same as mathematical thinking.
Let’s discover what statistical thinking is through some examples.
We will use a wide range of real and relevant data examples.
I understand that some of these topics have likely had profound impacts on your lives.
We will focus class time on the key course objectives but will use these current topics to empower ourselves and to see how we can productively participate with data.
At a quick first glance, what story does the graph appear to be telling?
What is misleading about the graph? How could we fix this issue?
Alberto Cairo, a journalist and designer, created the second graph of the Georgia COVID-19 data:
A key principle of data visualization is to “help the viewer make meaningful comparisons”.
What comparisons are made easy by the left-hand graph? What about by the right-hand graph?
From these graphs, can we get an accurate estimate of the COVID prevalence in these Georgia counties over this two-week period?
Statistical thinking is about developing reasoning (not just learning definitions and formulae).
Statistical thinking requires judgment that takes time to develop.
Developing our statistical thinking skills will allow us to soundly extract knowledge from data!
“‘Raw data’ is an oxymoron.” – Lisa Gitelman
“Data … is information made tractable.” – Catherine D’Ignazio and Lauren Klein
Data in spreadsheet-like format where:
Rows = Observations/cases
Columns = Variables
ID | kind | .pred_AI | .pred_class | detector | native | name | model |
---|---|---|---|---|---|---|---|
1 | Human | 0.9999942 | AI | Sapling | No | Real TOEFL | Human |
2 | Human | 0.8281448 | AI | Crossplag | No | Real TOEFL | Human |
3 | Human | 0.0002137 | Human | Crossplag | Yes | Real College Essays | Human |
4 | AI | 0.0000000 | Human | ZeroGPT | NA | Fake CS224N - GPT3 | GPT3 |
5 | AI | 0.0017841 | Human | OriginalityAI | NA | Fake CS224N - GPT3, PE | GPT4 |
6 | Human | 0.0001783 | Human | HFOpenAI | Yes | Real CS224N | Human |
Source: R package detectors
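To make the rows-and-columns idea concrete, here is a minimal sketch in Python (chosen only for illustration; the course itself works in R). It types in the six rows shown above and counts the observations and variables. The names `RAW`, `rows`, and `variables` are made up for this example and are not part of the detectors package.

```python
import csv
import io

# The six rows from the slide, typed in by hand as pipe-separated text.
RAW = """\
ID|kind|.pred_AI|.pred_class|detector|native|name|model
1|Human|0.9999942|AI|Sapling|No|Real TOEFL|Human
2|Human|0.8281448|AI|Crossplag|No|Real TOEFL|Human
3|Human|0.0002137|Human|Crossplag|Yes|Real College Essays|Human
4|AI|0.0000000|Human|ZeroGPT|NA|Fake CS224N - GPT3|GPT3
5|AI|0.0017841|Human|OriginalityAI|NA|Fake CS224N - GPT3, PE|GPT4
6|Human|0.0001783|Human|HFOpenAI|Yes|Real CS224N|Human
"""

# Each row becomes one dict (one observation); the dict keys are the variables.
rows = list(csv.DictReader(io.StringIO(RAW), delimiter="|"))
variables = list(rows[0].keys())

print(f"{len(rows)} observations, {len(variables)} variables")
print("Variables:", variables)
print("First observation:", rows[0])
```

Reading one dict at a time is a handy way to ask what a single row represents before looking at any summaries.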
Rows = Observations/cases
What are the cases? What does each row represent?
Columns = Variables
Variables: Describe characteristics of the observations
Quantitative: Numerical in nature
Categorical: Values are categories
Identification: Uniquely identify each case
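The three variable types can be sketched in a few lines of Python (again for illustration only, with hand-typed values from three of the slide's columns). The helper `guess_type` and the `overrides` dict are hypothetical names. The point is that a purely mechanical rule calls `ID` quantitative because it holds numbers, so a human judgment is still needed to mark it as an identifier.

```python
# Three of the slide's columns, stored column-wise: variable name -> values.
columns = {
    "ID": [1, 2, 3, 4, 5, 6],
    ".pred_AI": [0.9999942, 0.8281448, 0.0002137, 0.0, 0.0017841, 0.0001783],
    "detector": ["Sapling", "Crossplag", "Crossplag",
                 "ZeroGPT", "OriginalityAI", "HFOpenAI"],
}


def guess_type(values):
    """Naive rule: all-numeric columns are quantitative, the rest categorical."""
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"
    return "categorical"


# A judgment made by a person, not by the code: ID only labels the rows.
overrides = {"ID": "identification"}

variable_types = {
    name: overrides.get(name, guess_type(values))
    for name, values in columns.items()
}

for name, kind in variable_types.items():
    print(f"{name}: naive guess = {guess_type(columns[name])}, final = {kind}")
```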
Every time you get a new dataset, spend time exploring the variables.
Example questions:
Is each variable capturing the desired information?
For categorical variables, what are the categories? Do those categories adequately represent that variable?
For quantitative variables, what values are possible? Were the data rounded or binned? Are those values actually encoding categories? What are the units of measurement?
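A few of these questions can be explored with a short Python sketch (illustration only, using hand-typed values from three of the columns in the six-row excerpt). The helper names `summarize_categorical` and `summarize_quantitative` are made up for this example, and treating the string `"NA"` as a missing value is an assumption based on how the table is displayed.

```python
from collections import Counter

# Values copied from the six-row excerpt of the table.
pred_ai = [0.9999942, 0.8281448, 0.0002137, 0.0, 0.0017841, 0.0001783]
pred_class = ["AI", "AI", "Human", "Human", "Human", "Human"]
native = ["No", "No", "Yes", "NA", "NA", "Yes"]


def summarize_categorical(values, missing="NA"):
    """Return (category counts, number of missing values)."""
    counts = Counter(v for v in values if v != missing)
    n_missing = sum(1 for v in values if v == missing)
    return dict(counts), n_missing


def summarize_quantitative(values):
    """Return the smallest and largest observed values."""
    return min(values), max(values)


for name, values in [(".pred_class", pred_class), ("native", native)]:
    counts, n_missing = summarize_categorical(values)
    print(f"{name}: categories = {counts}, missing = {n_missing}")

low, high = summarize_quantitative(pred_ai)
print(f".pred_AI: observed range = [{low}, {high}]")
```

With only six rows this says little about the full dataset, but the same checks scale up: the category counts show which groups exist and how many values are missing, and the observed range hints at the scale of a quantitative variable.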
Step 1: Collect data on some aspect of your life.
Step 2: Find a story in your data and determine your postcard recipient.
Step 3: Figure out how you want to visualize the story.
Step 4+: Visualize your data on a blank postcard.