Machine Learning and Nonparametric Estimation: Fall 2022

An introduction to statistical learning theory and to doing data analysis informed by it.
QTM 385 Section 7 with David A. Hirshberg

Description

When we teach statistics, we often act as if simple models were right. For example, we might find the line that best fits the data, then act as if that line were a useful description of reality. With the right blend of intuition and luck, this can work --- we can get by with a model that is wrong but not too wrong --- but it's often helpful to have a more nuanced perspective. This class is an introduction to the theory and practice of doing statistics without simplistic modeling assumptions. In short, we will talk about how to answer questions about the world, like whether a treatment helps patients or a policy has the intended effect, by fitting flexible models to data.

We will cover a set of supervised machine learning tasks, including regression, classification, model selection, and model stacking, using ℓ1-regularized models, shape-constrained models, kernel machines, and more. And we will talk briefly about the modern way to use them to answer questions. If you have heard of augmented inverse propensity weighting (AIPW) or double machine learning (DML), that's what I'm talking about. But the class is not meant to be a broad survey of methods. Instead, our focus will be on mathematical concepts that help us understand what we can and can't trust these methods (and others) to do. Translating this into practice, we'll discuss what we should try to estimate, how we should do it, and what to expect when we do. Causal inference applications will be emphasized. Drawing exercises, computer visualization, and computer simulations will be used to get a feel for the material.

Background Knowledge

You'll need a working understanding of the basic concepts of probability, linear algebra, and calculus. For probability, you'll have to get comfortable talking about events and their probabilities and random variables and their expected values and variances; independence of events and random variables; and the multivariate normal distribution. For linear algebra, likewise for talking about a basis, inner product, and orthogonality. And for calculus, Taylor approximation. We'll use complex numbers a bit, too.

If you've taken QTM 220 and its prerequisites, you should be well-prepared. I'll review most of this as it comes up, so if there are a few gaps or some unfamiliar terminology, it won't be a big deal. If you'd like to do some reading to prepare in advance, take a look at Chapters 1-4 of Larry Wasserman's All of Statistics, a paper copy of which is available from the library. It covers more than you will need on probability. Chapter 6 and Section 8.1 of Nicholson's Linear Algebra with Applications cover enough linear algebra. If you've forgotten the details, or never learned them, that's fine: you will not need to calculate an integral, diagonalize a matrix, or know what the Poisson distribution is.

Homework 0 reviews much of what we'll need. It'll be due a few weeks into the semester, but you may want to get it out of the way early.

Meeting Times

Class will meet on Mondays and Wednesdays from 4:00-5:15 in PAIS 250. I will hold office hours weekly at a time to be determined.

Readings

Aside from lecture slides and the solutions to in-class and homework exercises, you won't need to do any reading to follow what's going on in class. I'll list readings on the class schedule that refine and generalize that material. In particular, they'll cover the estimation of 'Riesz representers' and their role in doubly robust estimation of all kinds of things.

If you like books, Vershynin's High Dimensional Probability is a great read and covers many of the theoretical concepts we'll be talking about. I'll specify some sections near the end of the class.

Topics

We'll cover these roughly in order. A more precise schedule is given below.

  • Introductory Material
    1. Theoretical frameworks
      • Classical theory (i.e. what we teach in QTM 220), its interpretations, and its shortcomings.
      • Modern high dimensional and nonparametric theory. What it says and what it's for.
    2. What a regression model is. It's just a set of functions predicting y from x.
      • Some can be described by a few parameters. Lines, polynomials, generalized linear models, etc.
      • Others require infinitely many. Increasing curves, curves shaped like a bowl or an S, curves that don't wiggle too much, etc.
      • We can fit them all via least squares if we like. We usually do.
    3. Application: Estimating the effect of reducing class sizes using a regression discontinuity design (RDD).
      • We'll do a simplified version of an estimate by Angrist and Lavy using two models: lines and increasing curves.
      • This'll be our introduction to minimizing loss functions using CVXR. (A first sketch of a CVXR fit appears after this topic list.)
    4. The way regression estimates fit and don't fit.
      • Mean squared error, pointwise squared error, and rates of convergence.
      • Training and test error.
      • The bias/variance trade-off.
  • Regression models.
    1. Curves satisfying shape constraints. Fitting increasing, bowl-shaped, s-shaped, and convex functions.
    2. Curves that don't wiggle too much or don't wiggle too fast. Fitting functions of bounded variation and Lipschitz functions.
    3. Curves that wiggle neither too much nor too fast. Fitting Sobolev and RKHS models. We'll use (generalized) Fourier series and the kernel trick. (A sketch of a Fourier-basis fit appears after this list.)
    4. Models that are intentionally a bit wrong, a.k.a. Sieves. Both a practical approach and an interpretation of what people usually do.
  • What we do with them.
    1. Estimating treatment effects from observational data using augmented inverse propensity weighting (AIPW). Overlap between subpopulations. (A sketch of the AIPW estimator appears after this list.)
    2. Estimating treatment effects using natural experiments. Lack of overlap in RDD and local averaging of curves.
  • Least squares theory.
    1. Choosing from finitely many options, a.k.a. Model Selection. Bounding sample mean squared error using the Gaussian tail bound and the union bound.
    2. Choosing from infinitely many options, a.k.a. Regression. Doing the same using local Gaussian width, symmetrization, and Gaussian comparison.
      • The curse of dimensionality in linear models. ℓ1-regularization and the local Gaussian width of diamonds; ℓ2-regularization and the same for circles. And ovals.
      • The curse of dimensionality in Sobolev models. Fourier series and the local Gaussian width of infinite-dimensional ovals. Is six dimensions too many?
    3. Choosing the best of bad options. Misspecified models and projections. Model aggregation.
    4. Generalizing from claims about the sample to claims about the population it's drawn from and other populations. Empirical processes.
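
Because a few of these topics are easier to grasp with code in hand, here are three short sketches in R. First, monotone (isotonic) least squares in CVXR, the kind of fit we'll use in the class size application. This is a minimal sketch on simulated data; the variable names are illustrative, not from the course materials.

    # Monotone least squares via CVXR: minimize the sum of squared errors
    # subject to the fitted values being increasing in x. Simulated data.
    library(CVXR)

    set.seed(1)
    n <- 100
    x <- sort(runif(n))
    y <- pmin(2 * x, 1) + rnorm(n, sd = 0.2)      # increasing signal plus noise

    theta <- Variable(n)                          # fitted values at the sorted x's
    increasing <- theta[2:n] >= theta[1:(n - 1)]  # monotonicity constraint
    fit <- solve(Problem(Minimize(sum_squares(y - theta)), list(increasing)))
    yhat <- fit$getValue(theta)

    plot(x, y)
    lines(x, yhat, col = "red")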
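
Second, least squares on a finite Fourier (cosine) basis, the kind of sieve we'll use for Sobolev models. The basis size k controls the bias/variance trade-off: larger k means less bias and more variance. Again, a minimal sketch on simulated data.

    # Least squares on the first k cosine basis functions on [0, 1].
    set.seed(1)
    n <- 200
    k <- 10
    x <- sort(runif(n))
    y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

    basis <- sapply(1:k, function(j) cos(pi * j * x))  # n-by-k design matrix
    fit <- lm(y ~ basis)
    plot(x, y)
    lines(x, fitted(fit), col = "red")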
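
Third, the AIPW estimator of an average treatment effect. Here simple parametric regressions stand in for the flexible regression estimates we'll actually use in class; this too is a minimal sketch on simulated data, with illustrative names throughout.

    # AIPW: average the regression-based effect estimate plus
    # inverse-propensity-weighted residual corrections.
    set.seed(1)
    n <- 1000
    x <- rnorm(n)
    w <- rbinom(n, 1, plogis(x))    # treatment; true propensity is plogis(x)
    y <- x + w + rnorm(n)           # true average effect is 1

    ehat <- fitted(glm(w ~ x, family = binomial))                   # propensity estimate
    mu1 <- predict(lm(y ~ x, subset = w == 1), data.frame(x = x))   # regression on the treated
    mu0 <- predict(lm(y ~ x, subset = w == 0), data.frame(x = x))   # regression on the controls

    tauhat <- mean(mu1 - mu0 +
                   w * (y - mu1) / ehat -
                   (1 - w) * (y - mu0) / (1 - ehat))
    tauhat  # should be close to 1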

Workload

Short problem sets will be assigned every week or two as homework. Collaboration is encouraged. I prefer that each student write and turn in solutions in their own words, and think it is often best that this writing be done separately, with collaboration limited to discussion of the problems, sketching solutions on a whiteboard, etc. This will help you and me understand where you are in terms of your proficiency with the material. These are not tests. I will work on them with you during my office hours if you want. That said, it's not necessary to come to office hours if there's a problem you can't get. I encourage you to try all the problems, but it's fine to omit a problem or two from what you turn in. I'll post complete solutions soon after each assignment is due. Review them. So that the solutions aren't delayed, I won't grant extensions on homework.

There will also be a final project. You'll work in groups. I ask that, drawing on the concepts you've learned in class, you learn about a method or concept we haven't covered. Then, using what we have covered as background, teach it to the rest of us. I think the topics listed below are good choices, but don't let that limit you. The papers on this stuff are often pretty hard to understand, but that's more about the technical language they use than the concepts involved. I'll help you translate it. Your job, for the most part, will be to understand the main ideas and work out how to teach them effectively. Feel free to do that by giving a lecture, writing a short paper and doing Q & A, running an in-class exercise, or a mix of these. To allow time for the final project, problem sets will be shorter toward the end of the semester.

Final grades will be based on completion of the homework and labs and on the quality of the final project, with equal weight given to the two. I do not expect students to understand everything we discuss perfectly. That is not how people learn this stuff. We arrive somewhere at the end of a class, and refine our understanding as we use it: talking, reading, writing, coding, etc. And we're never done: professors make incorrect claims about this stuff and get corrected, sometimes during presentations on their own research, all the time. I encourage you to check this class out if you want to develop your understanding of the core ideas of modern machine learning and statistics, even if you're unsure of what you'll be able to develop it into by the semester's end.

Possible Final Project Topics
  1. Lower bounds for least squares with convex models. Discussed here.
  2. Adaptive bounds for shape-constrained regression. Discussed here.
  3. High dimensional regression with p ≈ n. Discussed here, here, and in many other papers.
  4. Estimating individualized treatment effects. Maybe the R-Learner.
  5. Understanding and improving AIPW. Covariate balance, double robust inference, etc.
  6. Models defined by shape-constraints and generalizations of total variation in 2+ dimensions. Discussed here, here, and here.
  7. Classification performance and the idea of the margin. Discussed here.

Tentative Schedule

Week 1
Homework Homework 0: Background.
W Aug 24 Lecture Intro
Week 2
M Aug 29 Lab Intro to CVXR and Monotone Regression. Application to RDD.
W Aug 31 Lab Squared Error and Rates of Convergence.
Week 3
Homework Homework 1: Lipschitz Regression
M Sept 5 No class Labor Day.
W Sept 7 Lecture Bounded Variation Regression.
Week 4
M Sept 12 Lab Bounded Variation Regression; Training and Test Errors.
W Sept 14 Lab Regression Discontinuity Design Lab
Week 5
M Sept 19 Lecture Monotone and Bounded Variation Regression in dimension > 1.
W Sept 21 Lab Monotone and Bounded Variation Regression in 2D.
Week 6
Homework Homework 2: Sobolev Spaces
M Sept 26 Lecture Sobolev Spaces.
W Sept 28 Lab Sobolev Spaces and the Fourier Basis. The bias/variance trade-off.
Week 7
M Oct 3 Lecture+Lab Sieves.
W Oct 5 Lecture Multidimensional Sobolev Spaces and the Kernel Trick.
Week 8
M Oct 10 No Class Fall Break.
W Oct 12 Lab Isotropic and Anisotropic Sobolev Models.
Week 9
Homework Homework 3: AIPW
Reading The Balancing Act in Causal Inference
M Oct 17 Lecture Augmented Inverse Propensity Weighting (AIPW).
W Oct 19 Lab Adjusting for covariates using AIPW. We'll use spatial data.
Week 10
Reading Terry Tao on the Gaussian Concentration Inequality (Theorem 8)
M Oct 24 Lecture Least Squares Theory for Finite Models; Applications to Model Selection.
W Oct 26 Lab Model selection.
Week 11
Homework Homework 4: Gaussian Width and the Curse of Dimensionality.
Reading Augmented Minimax Linear Estimation
M Oct 31 Lecture Least Squares Theory for Convex Models.
W Nov 2 Lecture Least Squares with Misspecification; Applications to Model Aggregation.
Week 12
Deadline Choose a topic for the final project.
Reading Chapter 8 of High Dimensional Probability
M Nov 7 Lab Fixed Points of Local Gaussian Width and Rates of Convergence.
W Nov 9 Lecture Calculating Gaussian Width.
Week 13
Reading Sections 6.4-6.7 of High Dimensional Probability
M Nov 14 Lecture Least Squares Classification and Regression with non-Gaussian Noise.
W Nov 16 Lecture Population Properties of Least Squares; Local Least Squares.
Week 14
M Nov 21 Lecture+Discussion Revisiting AIPW.
W Nov 23 No Class Thanksgiving.
F Nov 25 No Class Thanksgiving.
Week 15
M Nov 28 Lecture+Discussion Revisiting RDD.
W Nov 30 ? TBD.
Week 16
M Dec 5 ? TBD.
Exam Slot
TBD Final Project Presentations.

Policies

Accessibility and Accommodations
As the instructor of this course I endeavor to provide an inclusive learning environment. I want every student to succeed. The Department of Accessibility Services (DAS) works with students who have disabilities to provide reasonable accommodations. It is your responsibility to request accommodations. In order to receive consideration for reasonable accommodations, you must register with the DAS here. Accommodations cannot be applied retroactively, so please register with DAS and contact me as early in the semester as possible to discuss a plan for implementing your accommodations. For additional information about accessibility and accommodations, please contact the Department of Accessibility Services at (404) 727-9877 or accessibility@emory.edu.

Attendance
Class attendance is not mandatory and will not affect your grade. Schedule conflicts and illness happen. There is no need to explain your absences or inform me of them in advance of class meetings. And please do not come to class sick. I will share my lectures via Zoom and post recordings soon afterward. But please attend regularly if you can. It's hard to participate in discussions and group exercises when you don't.

Writing Center
Tutors in the Emory Writing Center and the ESL Program are available to support Emory College students as they work on any type of writing assignment, at any stage of the composing process. Tutors can assist with a range of projects, from traditional papers and presentations to websites and other multimedia projects. Writing Center and ESL tutors take a similar approach as they work with students on concerns including idea development, structure, use of sources, grammar, and word choice. They do not proofread for students. Instead, they discuss strategies and resources students can use as they write, revise, and edit their own work. Students who are non-native speakers of English are welcome to visit either Writing Center tutors or ESL tutors. All other students in the college should see Writing Center tutors. Learn more, view hours, and make appointments by visiting the websites of the ESL Program and the Writing Center. Please review the Writing Center’s tutoring policies before your visit.

Honor Council
The Honor Code is in effect throughout the semester. By taking this course, you affirm that it is a violation of the code to cheat on exams, to plagiarize, to deviate from the teacher's instructions about collaboration on work that is submitted for grades, to give false information to a faculty member, and to undertake any other form of academic misconduct. You agree that the instructor is entitled to move you to another seat during examinations, without explanation. You also affirm that if you witness others violating the code you have a duty to report them to the honor council.