This class is a modern introduction to regression analysis. We will cover linear regression and other widely used methods for fitting curves to data, along with the causal and statistical concepts we need to make meaningful, defensible claims based on those fits. We'll spend roughly equal time on mathematical concepts and on practicing data analysis in the R programming language. Together, these will provide a foundation for future study in both methodology and substantive areas.
You'll need to differentiate multivariable functions, do some matrix arithmetic, think about orthogonality, interpret and calculate conditional and unconditional expected values, work with normal and asymptotically normal random variables, and write a little R code. If you've taken Linear Algebra, Multivariable Calculus, QTM 150, and QTM 210, or similar classes, you should have all the background you need.
Class will meet in White Hall 205 from 2:30-3:45 Mondays and Wednesdays and from 2:30-3:20 on Fridays. I will hold office hours weekly at a time to be determined.
We will read from Introduction to Statistical Learning (ISL) by James, Witten, Hastie, and Tibshirani in the first half of the semester and from Foundations of Agnostic Statistics (FAS) by Aronow and Miller in the second. These books are useful references, but our emphasis will be fairly different, so I will be selective about what I assign. Even so, some assigned sections will include material we will not discuss in class. It is there for continuity, and you will not be tested on it. Exam content will be drawn from lectures, labs, and homework.
Week 1 | ||
Homework | Working with matrices, vectors, and functions. | |
W Aug 24 | Lecture | Why we fit curves and what we do with our fits. |
F Aug 26 | Discussion | Areas we're interested in for the final project. |
Week 2 | ||
Reading | ISL 3.1.1 | |
M Aug 29 | Lecture | Least squares curve fits and predictions based on them. Properties of residuals. Curve shapes. |
W Aug 31 | Lab | Fitting curves, making predictions, and summarizing them. |
F Sept 2 | Discussion | Specific questions and data availability. |
Week 3 | ||
Reading | ISL 7.1-7.4 | |
Homework | Evaluating fit. | |
M Sept 5 | No Class | Labor Day. |
W Sept 7 | Lecture | More curve shapes. Problems with polynomial fits. Splines. Transformed outcomes. |
F Sept 9 | Lab | Fitting curves better. |
Week 4 | ||
Reading | ISL 7.6 | |
M Sept 12 | Lecture | Matching data to questions. Overlap. Weighted least squares and residuals. |
W Sept 14 | Lab | Using weighted least squares to target specific questions. |
F Sept 16 | Discussion | Data relevance. |
Week 5 | ||
Reading | ISL 3.2 and 3.3. Skip 3.2.2. | |
Homework | Using real data. | |
M Sept 19 | Lecture | Multidimensional curve fitting. Main effects and interactions. Additive vs. isotropic models. |
W Sept 21 | Lab | Working with temporal and spatial data. |
F Sept 23 | Group Work | Fitting curves to answer your questions. |
Week 6 | ||
M Sept 26 | Lecture | Sample splitting and least squares model selection. |
W Sept 28 | Lab | Model selection. |
F Sept 30 | Group Work | Testing your approach using simulated data. |
Week 7 | ||
M Oct 3 | Review | |
W Oct 5 | Midterm Exam | |
F Oct 7 | Exam Solution | |
Week 8 | ||
Reading | FAS up to 2.2.3. | |
Homework | Working with random variables and random vectors. | |
M Oct 10 | No Class | Fall Break. |
W Oct 12 | Lecture | Probability. Expectations, conditioning, sampling, and gaussianity. |
F Oct 14 | Lab | Visualizing probability distributions. |
Week 9 | ||
Reading | FAS 2.2.4 and 3.3. | |
M Oct 17 | Lecture | Sampling from populations. The infinite population approximation. Statistical and modeling error. |
W Oct 19 | Lab | Regression in populations. Comparing true, pseudo-true, and estimated curves. |
F Oct 21 | Group Work | Thinking through the potential impacts of misspecification. |
Week 10 | ||
Reading | FAS 7.1.1, 7.1.3-7.1.6, 7.2.2, and 7.2.6. | |
Homework | The classical perspective. | |
M Oct 24 | Lecture | Causal inference. Potential outcomes, identification, and inverse probability weighting. |
W Oct 26 | Lab | Inverse probability weighted least squares. |
F Oct 28 | Group Work | Formulating and answering causal questions. |
Week 11 | ||
Reading | Continue last week's reading. |
M Oct 31 | Lecture | Conditionally randomized experiments vs. observational studies. The linear probability model. |
W Nov 2 | Lab | Using estimated inverse probability weights. |
F Nov 4 | Group Work | Thinking through problems caused by confounding. |
Week 12 | ||
Reading | FAS 3.4.1-3.4.2, 3.4.4, and 4. Feel free to skip 4.3.6. | |
Homework | Hypothesis testing. | |
M Nov 7 | Lecture | Least squares with gaussian errors as an approximation. The delta method. |
W Nov 9 | Lab | Confidence intervals and coverage. |
F Nov 11 | Group Work | Making statistical claims. |
Week 13 | ||
Reading | Continue last week's reading. |
M Nov 14 | Lecture | Least squares asymptotics. |
W Nov 16 | Lab | Accuracy of asymptotic approximation: coverage with gaussian errors vs. without. |
F Nov 18 | Group Work | Reporting statistical claims. |
Week 14 | ||
M Nov 21 | Review | |
W Nov 23 | No Class | Thanksgiving. |
F Nov 25 | No Class | Thanksgiving. |
Week 15 | ||
Reading | None. | |
M Nov 28 | Lecture | Logit and log-linear models. Nonlinear least squares asymptotics. |
W Nov 30 | Lab | Comparing nonlinear regression to linear regression on transformed outcomes. |
F Dec 2 | Group Work | Appraising statistical claims. We'll have traded reports from our reporting exercise. |
Week 16 | ||
M Dec 5 | Final Exam | |
Exam Slot | ||
TBD | Project Presentations |
Problem sets will be assigned on Mondays, roughly every other week, and are due by midnight on the Monday two weeks later. Collaboration is encouraged. I prefer that each student write and turn in solutions in their own words, and I think it is often best to do this writing separately, limiting collaboration to discussing problems, sketching solutions on a whiteboard, and so on. I will post solutions to homework problems promptly; review them. So that solutions aren't delayed, I won't grant extensions on homework.
There will also be a final project: a data analysis project that should answer a substantive question. You'll work in small groups. Early in the semester, we'll use most of our Friday meetings to come up with a few good topics to choose from and form groups to work on them. Later, we'll use Fridays to work on that question in those groups. Each group will be expected to turn in a report and give a brief talk during the final exam slot.
Final grades will be based on the midterm and final exams (30% each), final project (30%), and completion of the homework and labs (10%).
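As an illustration of how these weights combine, here is the weighted average written out. It assumes each component is scored on the same 0 to 100 scale, and the scores in the worked example are hypothetical, chosen only to show the arithmetic:

$$
\text{grade} = 0.3\,(\text{midterm}) + 0.3\,(\text{final exam}) + 0.3\,(\text{project}) + 0.1\,(\text{homework and labs})
$$

$$
\text{e.g. } 0.3(80) + 0.3(90) + 0.3(85) + 0.1(100) = 24 + 27 + 25.5 + 10 = 86.5
$$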