Statistical Thinking

Grayson White

Math 141
Week 1 | Fall 2025

Before we get started…

If you are on the waitlist, please sign in to class here: tinyurl.com/math141-wait

Goals for Today

Getting started in Math 141
- Course structure and technologies
- Where to find resources
- Course expectations
Statistical thinking
Data Frames

But first, let me quickly introduce myself…

Let’s start with my path (back) to Reed…

Research Interests

Statistical modeling with environmental applications

Research Interests

Advising Undergraduate Forestry Data Science Research

Getting Started in Math 141

Course website

The course website, math-141.github.io, will be the central location for all our course materials.
We’ll use a few other resources to navigate Math 141, but all of them can be reached through the quick links on the course website!

Getting Started in Math 141

Other Resources

Reed’s RStudio Server, for coursework,

A course-wide Slack workspace, for course communication,

Gradescope, for turning in assignments and exams, and

Moodle, sparingly, for private course materials

Getting Started in Math 141: Finding Resources

Let’s take a look at the course website…

math-141.github.io

Math 141: The whole game

Course expectations and structure

My info

Name: Grayson White (he/him), you can call me ‘Grayson’

Email: gwhite@reed.edu

Office: Library 390

Office hours: Monday 2:30-4, Wednesday 1-2:30, or by appointment

A typical day in Math 141

The Math 141 teaching team is also excited to support your learning!

Course assistant office hours coming soon!

Learning Outcomes

In this course, you will learn how to think critically with data by engaging in the entire data analysis process.
Most of our time will be spent in the Exploration and Visualization, Data Wrangling, and Modeling and Inference steps, but we will spend some time in each cog!
- First ~3 weeks in Exploration and Visualization and Data Wrangling
- Next ~3 weeks in Modeling
- Final ~half of the course in Inference

Assignments and Exams

Lab assignments (weekly-ish)
- Assigned Thursday in lab
- Due the following Thursday at 9am
- Assigned each week besides exam weeks

Homework assignments (daily-ish)
- Assigned most days in lecture
- Due in person at the following lecture
- Much shorter than the labs, often related to the daily reading

In-class activities (semi-regular)
- Individual activities
- Group activities
- Low-stakes quizzes

Exams (1 midterm, 1 final)
- Both have a written component and an oral component.
- Midterm written: 10/12-10/14
- Midterm orals: 10/15-10/16
- Final written: reading week
- Final orals: finals week

Course Climate

We expect everyone in this class to strive to foster a learning environment that is equitable, inclusive, and welcoming. If you experience any barriers to learning, please come to Professor Grayson White or a college administrator with your concerns.

Code of Conduct:

We expect all members of Math 141 to make participation a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.

We expect everyone to act and interact in ways that contribute to an open, welcoming, inclusive, and healthy community of learners. You can contribute to a positive learning environment by demonstrating empathy and kindness, being respectful of differing viewpoints and experiences, and giving and gracefully accepting constructive feedback.¹

Late Work

Labs: up to 4 extension days can be used throughout the semester.
- e.g., 1 additional day for 4 labs, 4 additional days for 1 lab, …
- Partial days are rounded up to a full day.

Homework: no late homework assignments are accepted, but the lowest three scores will be dropped.

In-class assignments: cannot be made up.

Exams: no late exams are accepted.

Artificial Intelligence (AI) Policy

Artificial intelligence (AI) tools, such as ChatGPT, Claude, Gemini, and others, are being used to generate code, analyze data, and much more. However, learning to think critically about the problem at hand, and engaging with your peers, tutors, and instructors when you do not understand a concept or question, are integral components of a liberal arts education. Further, a key goal of this course is for you to learn how to thoughtfully, ethically, and independently extract knowledge from data and engage in statistical reasoning. Therefore, the use of generative AI tools, such as ChatGPT and others, is strictly prohibited in any stage of the work process for this course. If you have questions about whether a tool is allowed for this course, ask the instructor before using it.

Why this policy?

Engagement

Being actively present is key.

During lecture and lab, remove distractions.
- When we are on our computers, close email, social media, news, etc.
- Hide your phone.
I have high expectations but know that all of you (regardless of your stats, math, or computing background) have the ability to meet them.

Course or syllabus questions?

Math 141 is about developing our statistical thinking skills.

What is statistical thinking?

It is not the same as mathematical thinking.

Let’s discover what statistical thinking is through some examples.

Data in Math 141

We will use a wide range of real and relevant data examples.

Data in Math 141

I understand that some of these topics have likely had profound impacts on your lives.
We will focus class time on the key course objectives but will use these current topics to empower ourselves and to see how we can productively participate with data.

Example: Visualizing COVID Prevalence

In May of 2020, the Georgia Department of Public Health posted the following graph:

At a quick first glance, what story does the graph appear to be telling?
What is misleading about the graph? How could we fix this issue?

Example: Visualizing COVID Prevalence

After public outcry, the Georgia Department of Public Health said they made a mistake and posted the following updated graph:

How do your conclusions about COVID-19 cases in Georgia change when you interpret this new graph?

Example: Visualizing COVID Prevalence

Alberto Cairo, a journalist and designer, created the second graph of the Georgia COVID-19 data:

A key principle of data visualization is to “help the viewer make meaningful comparisons”.
What comparisons are made easy by the left-hand graph? What about by the right-hand graph?
From these graphs, can we get an accurate estimate of the COVID prevalence in these Georgia counties over this two-week period?

Statistical Thinking

About developing reasoning (not just learning definitions and formulae).
Statistical thinking requires judgment that takes time to develop.
- We will see examples and practice applying statistical thinking throughout the course.
Developing our statistical thinking skills will allow us to soundly extract knowledge from data!

What are/is Data?

“‘Raw data’ is an oxymoron.” – Lisa Gitelman

“Data … is information made tractable.” – Catherine D’Ignazio and Lauren Klein

Data Frames

Data in spreadsheet-like format where:

Rows = Observations/cases
Columns = Variables

ID	kind	.pred_AI	.pred_class	detector	native	name	model
1	Human	0.9999942	AI	Sapling	No	Real TOEFL	Human
2	Human	0.8281448	AI	Crossplag	No	Real TOEFL	Human
3	Human	0.0002137	Human	Crossplag	Yes	Real College Essays	Human
4	AI	0.0000000	Human	ZeroGPT	NA	Fake CS224N - GPT3	GPT3
5	AI	0.0017841	Human	OriginalityAI	NA	Fake CS224N - GPT3, PE	GPT4
6	Human	0.0001783	Human	HFOpenAI	Yes	Real CS224N	Human

Data from “GPT Detectors Are Biased Against Non-Native English Writers” by Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou, published in Patterns (Cell Press) and available in the R package detectors.
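
To make the rows-and-columns idea concrete, here is a minimal sketch in plain Python that stores the six rows from the table above as a list of observations. This is only an illustration and not how the course will work with data (coursework happens in R on the RStudio Server). The names `columns`, `rows`, and `data` are made up for this example, and `None` stands in for `NA`.

```python
# Variable names, copied from the header of the table above.
columns = ["ID", "kind", ".pred_AI", ".pred_class",
           "detector", "native", "name", "model"]

# One tuple per row of the table; None stands in for NA.
rows = [
    (1, "Human", 0.9999942, "AI", "Sapling", "No", "Real TOEFL", "Human"),
    (2, "Human", 0.8281448, "AI", "Crossplag", "No", "Real TOEFL", "Human"),
    (3, "Human", 0.0002137, "Human", "Crossplag", "Yes", "Real College Essays", "Human"),
    (4, "AI", 0.0000000, "Human", "ZeroGPT", None, "Fake CS224N - GPT3", "GPT3"),
    (5, "AI", 0.0017841, "Human", "OriginalityAI", None, "Fake CS224N - GPT3, PE", "GPT4"),
    (6, "Human", 0.0001783, "Human", "HFOpenAI", "Yes", "Real CS224N", "Human"),
]

# Each observation becomes a dict keyed by variable name.
data = [dict(zip(columns, row)) for row in rows]

n_obs = len(data)      # number of rows = number of observations
n_vars = len(columns)  # number of columns = number of variables
print(f"{n_obs} observations, {n_vars} variables")

# A row is one observation: every variable measured on a single case.
first_case = data[0]
print(first_case)

# A column is one variable: the same characteristic across all cases.
detectors = [obs["detector"] for obs in data]
print(detectors)
```

Reading across `first_case` gives everything recorded about one case, while `detectors` reads down a single variable.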

Data Frames

Rows = Observations/cases

What are the cases? What does each row represent?

Data Frames

Columns = Variables

Variables: Describe characteristics of the observations

Quantitative: Numerical in nature
Categorical: Values are categories
Identification: Uniquely identify each case
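
As a rough illustration of the three variable types listed above, the sketch below guesses a type for a few columns of the detectors table. The function `variable_type` and its rules are assumptions invented for this example, not a standard method: a rule of thumb like this can easily be wrong (for instance, when numbers are really category codes), which is why judgment is still needed. It is written in plain Python purely for illustration; `None` stands in for `NA`.

```python
def is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def variable_type(values):
    """Guess a variable type with a crude, hypothetical rule of thumb."""
    present = [v for v in values if v is not None]  # ignore missing values
    all_ints = all(isinstance(v, int) and not isinstance(v, bool) for v in present)
    all_unique = len(set(present)) == len(present)
    if all_ints and all_unique:
        return "identification"  # whole numbers that never repeat
    if all(is_number(v) for v in present):
        return "quantitative"    # numerical in nature
    return "categorical"         # values are categories


# A few columns from the detectors table, stored column by column.
table = {
    "ID": [1, 2, 3, 4, 5, 6],
    "kind": ["Human", "Human", "Human", "AI", "AI", "Human"],
    ".pred_AI": [0.9999942, 0.8281448, 0.0002137, 0.0000000, 0.0017841, 0.0001783],
    "detector": ["Sapling", "Crossplag", "Crossplag", "ZeroGPT", "OriginalityAI", "HFOpenAI"],
    "native": ["No", "No", "Yes", None, None, "Yes"],
}

types = {name: variable_type(values) for name, values in table.items()}
for name, guess in types.items():
    print(f"{name}: {guess}")
```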

Every time you get a new dataset, spend time exploring the variables.

Example questions:

Is each variable capturing the desired information?
For categorical variables, what are the categories? Do those categories adequately represent that variable?
For quantitative variables, what values are possible? Were the data rounded or binned? Are those values actually encoding categories? What are the units of measurement?
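
Some of the example questions above can be asked of the data directly. The sketch below, again in plain Python purely for illustration, looks at two variables from the detectors table: the categories and missing values of `native`, and the range of `.pred_AI`. Treating `.pred_AI` as a value that should fall between 0 and 1 is an assumption based on the six values shown, and `None` stands in for `NA`.

```python
from collections import Counter

# Two variables from the detectors table; None stands in for NA.
native = ["No", "No", "Yes", None, None, "Yes"]
pred_ai = [0.9999942, 0.8281448, 0.0002137, 0.0000000, 0.0017841, 0.0001783]

# Categorical variable: what are the categories, and how often does each occur?
category_counts = Counter(v for v in native if v is not None)
n_missing = sum(v is None for v in native)
print("Categories of native:", dict(category_counts))
print("Missing values in native:", n_missing)

# Quantitative variable: what values are possible?
smallest, largest = min(pred_ai), max(pred_ai)
within_unit_interval = all(0 <= p <= 1 for p in pred_ai)
print("Range of .pred_AI:", smallest, "to", largest)
print("All values between 0 and 1:", within_unit_interval)
```

Summaries like these only start the conversation. Whether "Yes", "No", and missing adequately represent `native` is a judgment call that the counts alone cannot settle.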

Next time

Problem 1 of Lab 1: Create your own Dear Data postcard!

Step 1: Collect data on some aspect of your life.
Step 2: Find a story in your data and determine your postcard recipient.
Step 3: Figure out how you want to visualize the story.
Step 4+: Visualize your data on a blank postcard.

Bring a laptop to lab tomorrow!