When we teach statistics, we often act as if simple models were right. For example, we might find the line that best fits the data, then act as if that line were a useful description of reality. With the right blend of intuition and luck, this can work --- we can get by with a model that is wrong but not too wrong --- but it's often helpful to have a more nuanced perspective. This class is an introduction to the theory and practice of doing statistics without simplistic modeling assumptions. In short, we will talk about how to answer questions about the world, like whether a treatment helps patients or a policy has the intended effect, by fitting flexible models to data.
We will cover a set of supervised machine learning tasks, including regression, classification, model selection, and model stacking, using ℓ1-regularized models, shape-constrained models, kernel machines, and more. And we will talk briefly about the modern way to use these models to answer questions. If you have heard of augmented inverse propensity weighting (AIPW) or double machine learning (DML), that's what I'm talking about. But the class is not meant to be a broad survey of methods. Instead, our focus will be on mathematical concepts that help us understand what we can and can't trust these methods (and others) to do. Translating this into practice, we'll discuss what we should try to estimate, how we should do it, and what to expect when we do. Causal inference applications will be emphasized, and we'll use drawing exercises, computer visualizations, and simulations to get a feel for the material.
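To give a flavor of the model fitting we'll do, here's a minimal sketch of one shape-constrained method, monotone least squares, written with CVXR, the R package we'll use in the labs. The data is simulated purely for illustration; nothing here is course code.

```r
# Monotone (isotonic) least squares via CVXR: a sketch on simulated data.
library(CVXR)

set.seed(1)
n <- 100
x <- sort(runif(n))               # covariate, sorted so monotonicity is in index order
y <- x^2 + rnorm(n, sd = 0.1)     # increasing signal plus noise

mu  <- Variable(n)                                # fitted values, one per observation
fit <- solve(Problem(
  Minimize(sum_squares(y - mu)),                  # least squares criterion
  list(mu[2:n] >= mu[1:(n - 1)])                  # monotonicity constraint
))
mu_hat <- fit$getValue(mu)

plot(x, y, col = "gray")                          # data
lines(x, mu_hat, type = "s")                      # fitted monotone step function
```

Part of the appeal of tools like CVXR is that the constraint is stated directly: swapping in a different shape constraint, like a bound on total variation, changes one line.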
You'll need a working understanding of the basic concepts of probability, linear algebra, and calculus. For probability, you'll have to get comfortable talking about events and their probabilities; random variables, their expected values, and their variances; independence of events and random variables; and the multivariate normal distribution. For linear algebra, likewise for talking about bases, inner products, and orthogonality. And for calculus, Taylor approximation. We'll use complex numbers a bit, too.
If you've taken QTM 220 and its prerequisites, you should be well-prepared. I'll review most of this as it comes up, so if there are a few gaps or some unfamiliar terminology, it won't be a big deal. If you'd like to do some reading to prepare in advance, take a look at Chapters 1-4 of Larry Wasserman's All of Statistics, a paper copy of which is available from the library; it covers more than you will need on probability. Chapter 6 and Section 8.1 of Nicholson's Linear Algebra with Applications cover enough linear algebra. If you've forgotten the details, or never learned them, that's fine: you will not need to calculate an integral, diagonalize a matrix, or know what the Poisson distribution is.
Homework 0 reviews much of what we'll need. It'll be due a few weeks into the semester, but you may want to get it out of the way early.
Class will meet on Mondays and Wednesdays from 4:00-5:15 in PAIS 250. I will hold office hours weekly at a time to be determined.
Aside from lecture slides and the solutions to in-class and homework exercises, you won't need to do any reading to follow what's going on in class. I'll list readings on the class schedule that refine and generalize what we cover. In particular, they'll cover the estimation of 'Riesz representers' and their role in the doubly-robust estimation of all kinds of things.
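If you haven't seen a doubly-robust estimator before, the prototypical example is AIPW for the average treatment effect. In notation I'm choosing just for this preview, with outcome $Y_i$, binary treatment $W_i$, covariates $X_i$, estimated outcome regressions $\hat\mu_0, \hat\mu_1$, and estimated propensity score $\hat e$:

$$
\hat\tau_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat\mu_1(X_i)-\hat\mu_0(X_i)+\frac{W_i\,\{Y_i-\hat\mu_1(X_i)\}}{\hat e(X_i)}-\frac{(1-W_i)\,\{Y_i-\hat\mu_0(X_i)\}}{1-\hat e(X_i)}\right].
$$

The first term averages a regression-based prediction of the treatment effect; the weighted residual terms correct its bias, and the estimator behaves well if either the regressions or the propensity score is estimated well. The readings recast that correction in terms of Riesz representers.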
If you like books, Vershynin's High Dimensional Probability is a great read and covers many of the theoretical concepts we'll be talking about. I'll specify some sections near the end of the class.
We'll cover these topics roughly in order. A more precise schedule is given below.
Short problem sets will be assigned every week or two as homework. Collaboration is encouraged. I prefer that each student write and turn in solutions in their own words, and I think it's often best that this writing be done separately, with collaboration limited to discussing the problems, sketching solutions on a whiteboard, etc. This will help you and me understand where you are in terms of your proficiency with the material. The problem sets are not a test: I'm happy to work on them with you during office hours if you want, though it isn't necessary to come just because there's a problem you can't get. I encourage you to try all the problems, but it's fine to omit a problem or two from what you turn in. I'll post complete solutions soon after each assignment is due; review them. So that the solutions aren't delayed, I won't grant extensions on homework.
There will also be a final project, done in groups. I ask that, drawing on the concepts you've learned in class, you learn about a method or concept we haven't covered and then, using what we have covered as background, teach it to the rest of us. I think the topics listed below are good choices, but don't let that limit you. The papers on this stuff are often pretty hard to understand, but that's more about the technical language they use than the concepts involved; I'll help you translate. Your job, for the most part, will be to understand the main ideas and work out how to teach them effectively. Feel free to do that by giving a lecture, writing a short paper and doing Q&A, running an in-class exercise, or a mix of these. To allow time for the final project, problem sets will be shorter toward the end of the semester.
Final grades will be based on completion of the homework and labs and on the quality of the final project, with equal weight given to the two. I do not expect students to understand everything we discuss perfectly. That is not how people learn this stuff. We arrive somewhere at the end of a class, and we refine our understanding as we use it: talking, reading, writing, coding, etc. And we're never done: professors make incorrect claims about this stuff all the time and get corrected, sometimes during presentations on their own research. I encourage you to check this class out if you want to develop your understanding of the core ideas of modern machine learning and statistics, even if you're unsure of what you'll be able to develop it into by the semester's end.
| Date | Type | Topic |
| --- | --- | --- |
| Week 1 | | |
| | Homework | Homework 0: Background. |
| W Aug 24 | Lecture | Intro. |
| Week 2 | | |
| M Aug 29 | Lab | Intro to CVXR and Monotone Regression. Application to RDD. |
| W Aug 31 | Lab | Squared Error and Rates of Convergence. |
| Week 3 | | |
| | Homework | Homework 1: Lipschitz Regression. |
| M Sept 5 | No class | Labor Day. |
| W Sept 7 | Lecture | Bounded Variation Regression. |
| Week 4 | | |
| M Sept 12 | Lab | Bounded Variation Regression; Training and Test Errors. |
| W Sept 14 | Lab | Regression Discontinuity Design Lab. |
| Week 5 | | |
| M Sept 19 | Lecture | Monotone and Bounded Variation Regression in dimension > 1. |
| W Sept 21 | Lab | Monotone and Bounded Variation Regression in 2D. |
| Week 6 | | |
| | Homework | Homework 2: Sobolev Spaces. |
| M Sept 26 | Lecture | Sobolev Spaces. |
| W Sept 28 | Lab | Sobolev Spaces and the Fourier Basis. The bias/variance trade-off. |
| Week 7 | | |
| M Oct 3 | Lecture+Lab | Sieves. |
| W Oct 5 | Lecture | Multidimensional Sobolev Spaces and the Kernel Trick. |
| Week 8 | | |
| M Oct 10 | No class | Fall Break. |
| W Oct 12 | Lab | Isotropic and Anisotropic Sobolev Models. |
| Week 9 | | |
| | Homework | Homework 3: AIPW. |
| | Reading | The Balancing Act in Causal Inference. |
| M Oct 17 | Lecture | Augmented Inverse Probability Weighting. |
| W Oct 19 | Lab | Adjusting for covariates using AIPW. We'll use spatial data. |
| Week 10 | | |
| | Reading | Terry Tao on the Gaussian Concentration Inequality (Theorem 8). |
| M Oct 24 | Lecture | Least Squares Theory for Finite Models; Applications to Model Selection. |
| W Oct 26 | Lab | Model Selection. |
| Week 11 | | |
| | Homework | Homework 4: Gaussian Width and the Curse of Dimensionality. |
| | Reading | Augmented Minimax Linear Estimation. |
| M Oct 31 | Lecture | Least Squares Theory for Convex Models. |
| W Nov 2 | Lecture | Least Squares with Misspecification; Applications to Model Aggregation. |
| Week 12 | | |
| | Deadline | Choose a topic for the final project. |
| | Reading | Chapter 8 of High Dimensional Probability. |
| M Nov 7 | Lab | Fixed Points of Local Gaussian Width and Rates of Convergence. |
| W Nov 9 | Lecture | Calculating Gaussian Width. |
| Week 13 | | |
| | Reading | Sections 6.4-6.7 of High Dimensional Probability. |
| M Nov 14 | Lecture | Least Squares Classification and Regression with non-Gaussian Noise. |
| W Nov 16 | Lecture | Population Properties of Least Squares; Local Least Squares. |
| Week 14 | | |
| M Nov 21 | Lecture+Discussion | Revisiting AIPW. |
| W Nov 23 | No class | Thanksgiving. |
| F Nov 25 | No class | Thanksgiving. |
| Week 15 | | |
| M Nov 28 | Lecture+Discussion | Revisiting RDD. |
| W Nov 30 | ? | TBD. |
| Week 16 | | |
| M Dec 5 | ? | TBD. |
| Exam Slot | | |
| TBD | | Final Project Presentations. |