Week 1: Statistical Learning
This module introduces the standard theoretical framework used to analyze statistical learning problems. We start by covering the concept of the regression function and the need for parametric models to estimate it, a need driven by the curse of dimensionality. We then present tools to assess the quality of a parametric model and discuss the bias-variance tradeoff as a theoretical framework for understanding overfitting and choosing the right model flexibility.
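To make the tradeoff concrete, here is a minimal simulation sketch in Python. The target function, noise level, and polynomial degrees are illustrative assumptions, not part of the course materials: the sketch fits polynomials of several degrees to repeated noisy samples and decomposes the prediction error at a single probe point into squared bias and variance.

```python
# A minimal sketch of the bias-variance tradeoff: fit polynomials of
# increasing degree to repeated noisy samples of an assumed target
# function, then decompose the test error at one probe point.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)    # true regression function (an assumption for the demo)
sigma = 0.3                            # noise standard deviation (assumed)
x_probe = 0.5                          # point at which we measure prediction error
n_train, n_trials = 30, 500

for degree in (1, 3, 9):
    preds = np.empty(n_trials)
    for t in range(n_trials):
        x = rng.uniform(0, 1, n_train)
        y = f(x) + sigma * rng.normal(size=n_train)
        coef = np.polyfit(x, y, degree)          # least-squares polynomial fit
        preds[t] = np.polyval(coef, x_probe)
    bias2 = (preds.mean() - f(x_probe)) ** 2     # squared bias of the average fit
    var = preds.var()                            # variance across training sets
    print(f"degree {degree}: bias^2={bias2:.4f}  variance={var:.4f}  "
          f"expected error={bias2 + var + sigma**2:.4f}")
```

Low-degree fits show high bias and low variance; high-degree fits show the reverse, which is the tradeoff discussed this week.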
Week 2: Linear Regression
In this module, we cover the problem of linear regression. We start with a formal statement of the problem, formulate it as an optimization problem, and derive a closed-form solution using the matrix pseudoinverse. We then analyze the statistical properties of the linear regression coefficients, such as their variances and covariances, and use this analysis to quantify coefficient accuracy and construct confidence intervals. We then move on to the topic of hypothesis testing, which we use to determine which input variables the output depends on. We conclude with a collection of metrics for measuring model accuracy and an introduction to the Python programming language. Please note that there is no formal assignment this week; we hope that everyone participates in the discussion instead.
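As a minimal sketch of these ideas, the Python snippet below uses synthetic data (the data, noise level, and variable names are illustrative assumptions). It computes the closed-form coefficients with the pseudoinverse, estimates their covariance as sigma^2 (X^T X)^{-1}, and builds 95% confidence intervals and t-tests for each coefficient.

```python
# A minimal sketch of ordinary least squares via the pseudoinverse,
# with coefficient standard errors, confidence intervals, and t-tests.
# The synthetic data below is an illustrative assumption.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.pinv(X) @ y                 # closed-form solution: X^+ y

resid = y - X @ beta_hat
dof = n - X.shape[1]                             # degrees of freedom
sigma2_hat = resid @ resid / dof                 # unbiased noise-variance estimate
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)   # Cov(beta_hat) = sigma^2 (X^T X)^{-1}
se = np.sqrt(np.diag(cov_beta))                  # coefficient standard errors

t_crit = stats.t.ppf(0.975, dof)                 # two-sided 95% critical value
t_stat = beta_hat / se                           # test statistic for H0: beta_j = 0
p_val = 2 * stats.t.sf(np.abs(t_stat), dof)
for j in range(len(beta_hat)):
    print(f"beta_{j}: {beta_hat[j]:+.3f}  "
          f"95% CI [{beta_hat[j] - t_crit * se[j]:+.3f}, "
          f"{beta_hat[j] + t_crit * se[j]:+.3f}]  p={p_val[j]:.3g}")
```

A small p-value for beta_j is evidence that the output depends on the j-th input variable, which is the use of hypothesis testing described above.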
Week 3: Extended Linear Regression
Week 4: Classification
In this module, we introduce classification problems through the lens of statistical learning. We start with a generative model based on the concept of conditional class probabilities. Using these probabilities, we show how to build the Bayes optimal classifier, which minimizes the expected misclassification error. We then present logistic regression, in conjunction with maximum likelihood estimation, as a parametric method for estimating the conditional class probabilities from data. We also extend the idea of hypothesis testing to the context of logistic regression.
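As a rough illustration, the sketch below fits a logistic regression by gradient ascent on the log-likelihood and then classifies by thresholding the estimated conditional class probabilities at 1/2. The simulated data, step size, and iteration count are assumptions made for the example, not part of the course materials.

```python
# A minimal sketch of logistic regression fit by maximum likelihood
# (gradient ascent on the log-likelihood); data and step size are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one feature
beta_true = np.array([-1.0, 2.0])
p_true = 1 / (1 + np.exp(-X @ beta_true))               # conditional class probabilities
y = rng.binomial(1, p_true)                             # labels drawn from the model

beta = np.zeros(2)
lr = 0.1
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ beta))       # sigmoid of the linear score
    grad = X.T @ (y - p)                  # gradient of the log-likelihood
    beta += lr * grad / n                 # step toward the MLE
print("estimated coefficients:", beta)

# Under the estimated model, the plug-in Bayes rule predicts class 1
# whenever the estimated conditional probability exceeds 1/2.
y_hat = (1 / (1 + np.exp(-X @ beta)) > 0.5).astype(int)
print("training accuracy:", (y_hat == y).mean())
```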