When we teach statistics, we often act as if simple models were right. For example, we might find the line that best fits the data, then act as if that line were a useful description of reality. With the right blend of intuition and luck, this can work --- we can get by with a model that is wrong but not too wrong --- but it's often helpful to have a more nuanced perspective. This class is an introduction to the theory and practice of doing statistics without simplistic modeling assumptions. In short, we will talk about how to answer questions about the world, like whether a treatment helps patients or a policy has the intended effect, by fitting flexible models to data. Along the way, we'll get in some practice using and generalizing some mathematical tools you've probably seen in calculus, linear algebra, and probability classes.
At the beginning of the semester, we'll talk about a few nonparametric regression methods and the criteria we use to evaluate such methods generally. Then we'll shift our focus to theoretical tools that can help us understand how they behave. For the majority of the semester, we'll focus on regression with one covariate. This is atypical in modern data analysis, but it makes things easier to understand: we can easily visualize our data and the curves we're fitting to it. The concepts we'll use are relevant to higher-dimensional problems, though, and the understanding we develop this way will pay off late in the semester, when we'll find ourselves prepared for a sophisticated discussion of the essential challenge of working with big data: the curse of dimensionality.
We will cover a set of supervised machine learning tasks, including regression, classification, and model selection, using ℓ1-regularized models, shape-constrained models, kernel methods, and more. And we will talk briefly about the modern way to use these methods to answer questions about the world. If you have heard of augmented inverse propensity weighting (AIPW) or double machine learning (DML), that's what I'm talking about. But the class is not meant to be a broad survey of methods. Instead, our focus will be on mathematical concepts that help us understand what we can and can't trust these methods (and others) to do. Translating this into practice, we'll discuss what we should try to estimate, how we should do it, and what to expect when we do. Causal inference applications will be emphasized. We'll use drawing exercises, visualization, and computer simulation to get a feel for the material.
By the end of this course, students should be able to write code that fits curves to data via least squares with shape-constrained and smooth regression models; predict the rate of convergence of least-squares curve fits in general terms using localized Gaussian width and in specific models using Fourier analysis or chaining; and use this knowledge to make appropriate modeling choices themselves and evaluate those of others. The terminology used in the machine learning literature is varied enough that I do not expect the class to be sufficient preparation for students to read recent papers on their own, but familiarity with the core concepts should be enough for them to understand the essential ideas once those are translated into appropriate terms. That is, students should be able to communicate with experts about what is going on in the field.
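To make the first of these concrete, here is a minimal sketch of the kind of code we'll write early in the semester: monotone least-squares regression, phrased as a convex program and solved with the CVXR package that appears in the Week 1 homework. The simulated data and variable names below are purely illustrative, not course materials.

```r
# Monotone (isotonic) least-squares regression, phrased as a convex program.
# A minimal sketch using CVXR; the simulated data and names are illustrative.
library(CVXR)

set.seed(1)
n <- 100
x <- sort(runif(n))                  # one covariate, as in most of the course
y <- sqrt(x) + rnorm(n, sd = 0.1)    # an increasing signal plus noise

theta <- Variable(n)                                  # fitted values at the sorted x's
objective <- Minimize(sum_squares(y - theta))         # least-squares criterion
constraints <- list(theta[2:n] >= theta[1:(n - 1)])   # monotonicity constraint
fit <- solve(Problem(objective, constraints))

theta_hat <- fit$getValue(theta)                      # the fitted monotone curve
plot(x, y)
lines(x, theta_hat, col = "red")
```

The point of examples like this one is that the modeling assumption (here, monotonicity) enters only through the constraint list, so we can swap in other shape constraints and study how the fit's behavior changes.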
I do not expect students to understand everything we discuss perfectly. That is not how people learn this stuff. We arrive somewhere at the end of a class, and we refine our understanding as we use it: talking, reading, writing, coding, and so on. And we're never done: professors make incorrect claims about this stuff and get corrected, sometimes during presentations on their own research, all the time. I encourage you to check this class out if you want to develop your understanding of the core ideas of modern machine learning and statistics, even if you're unsure how far you'll be able to develop it by the semester's end.
You'll need a working understanding of the basic concepts of probability, linear algebra, and calculus. For probability, you'll need to be comfortable talking about events and their probabilities; random variables and their expected values and variances; independence of events and random variables; and the multivariate normal distribution. For linear algebra, likewise for bases, inner products, orthogonality, eigenvalues, and eigenvectors. And for calculus, we'll work with partial derivatives and calculate an easy integral here and there.
I'll review most of this as it comes up, so if there are a few gaps or some unfamiliar terminology, it won't be a big deal. If you'd like to do some reading to prepare in advance, take a look at Chapters 1-4 of Larry Wasserman's All of Statistics, a paper copy of which is available from the library; it covers more than you will need on probability. Chapter 6 and Section 8.1 of Nicholson's Linear Algebra with Applications cover enough linear algebra. If you've forgotten the details, or never learned them, that's fine: you will not need to calculate a difficult integral, diagonalize a matrix, or know what the Poisson distribution is.
Class will meet on Mondays and Wednesdays from 4:00-5:15 in PAIS 235. I will hold office hours weekly at a time to be determined.
Aside from lecture slides and the solutions to in-class and homework exercises, you won't need to do any reading to follow what's going on in class. If you like books, Vershynin's High Dimensional Probability is a great read and covers many of the theoretical concepts we'll be talking about. It does, however, aim at a more mathematically experienced audience than this class is meant for. This is a good book to look at if you've enjoyed the class, feel confident about the material we've covered, and are looking for more breadth or depth.
Short problem sets will be assigned almost every week as homework. Turning them in isn't mandatory, but they are part of the exposition, so you'll need to know what's in them. I recommend that you look them over and work through any problems you think you'd learn from, and if you do want to turn in your work, I'm happy to give you feedback. And please look over the solutions. Collaboration on homework is encouraged.
There will not be a one-size-fits-all approach to grading in this class. It is meant to be an opportunity for students to develop their understanding from where it is toward where they want it to be. You'll work on assignments appropriate for your learning goals; this may mean turning in some of the problem sets or completing some kind of project. We'll meet periodically to discuss your learning goals and the progress you're making toward them, and you'll be expected to write a short reflection those weeks. We'll talk through what grade a given level of progress toward your goals warrants, so the grade you ultimately receive shouldn't be a surprise. These grades will be based on class participation (30%), reflections (30%), and assignments (40%).
| Date | Session | Topic |
| --- | --- | --- |
| Week 1 | | |
| W Jan 17 | Lecture | Intro |
| | Homework | Intro to Convex Programming using CVXR |
| Week 2 | | |
| M Jan 22 | Lab | Implementing Monotone Regression, Day 1 |
| W Jan 24 | Lab | Implementing Monotone Regression, Day 2 |
| | Homework | Vector Spaces |
| Week 3 | | |
| M Jan 29 | Lecture | Bounded Variation Regression |
| W Jan 31 | Lab | Implementing Bounded Variation Regression |
| | Homework | Lipschitz Regression |
| Week 4 | | |
| M Feb 5 | Lab | Rates and Modes of Convergence |
| W Feb 7 | Lecture | Treatment Effects and the R-Learner |
| | Homework | Convexity and Convex Regression |
| Week 5 | | |
| M Feb 12 | Lab | The Parametric R-Learner |
| W Feb 14 | Lab | The Nonparametric R-Learner |
| | Homework | Reflection |
| Week 6 | | |
| M Feb 19 | Discussion | Review |
| W Feb 21 | Lecture | Least Squares in Finite Models, i.e., Model Selection |
| | Homework | Subgaussianity and Maximal Inequalities |
| Week 7 | | |
| M Feb 26 | Lab | Understanding Model Selection |
| W Feb 28 | Lecture | Least Squares in Infinite Models, i.e., Regression |
| | Homework | The Efron-Stein Inequality |
| Week 8 | | |
| M Mar 4 | Lab | Understanding Gaussian Width. We'll draw. |
| W Mar 6 | Lecture | Least Squares and non-Gaussian Noise |
| | Homework | The Gaussian Width of Simple Models |
| Week 9 | | |
| M Mar 11 | No Class | Spring Break |
| W Mar 13 | No Class | Spring Break |
| Week 10 | | |
| M Mar 18 | Lecture | Least Squares and Misspecification |
| W Mar 20 | Lecture | Least Squares and Population MSE |
| | Homework | Reflection |
| Week 11 | | |
| M Mar 25 | Discussion | Review |
| W Mar 27 | Lab | The Discrete Sobolev Model |
| | Homework | The Periodic Discrete Sobolev Model |
| Week 12 | | |
| M Apr 1 | Lecture | The Periodic Sobolev Model, Fourier Series, and Gaussian Width |
| W Apr 3 | Lab | Implementing Sobolev Regression using Fourier Series Approximations |
| | Homework | Interpreting Polynomial Regression using Sieves |
| Week 13 | | |
| M Apr 8 | Lecture | Multivariate Sobolev Models and the Curse of Dimensionality |
| W Apr 10 | Lab | Comparing Multivariate Sobolev Models |
| | Homework | Image Denoising |
| Week 14 | | |
| M Apr 15 | Lecture | Bounding Gaussian Width using Covering Numbers |
| W Apr 17 | Lecture | Bounding Gaussian Width via Chaining |
| | Homework | Covering Numbers for Monotone and BV Regression Models |
| Week 15 | | |
| M Apr 22 | TBD | |
| W Apr 24 | TBD | |
| | Homework | Reflection |
| Week 16 | | |
| M Apr 29 | Discussion | Review |