Mathematics forms the core of data science and machine learning. Thus, to be the best data scientist you can be, you must have a working understanding of the most relevant math.
Getting started in data science is easy thanks to high-level libraries like Scikit-learn and Keras. But understanding the math behind the algorithms in these libraries opens up countless possibilities. From identifying modeling issues to inventing new and more powerful solutions, understanding the math behind it all can dramatically increase the impact you can make over the course of your career.
Led by deep learning guru Dr. Jon Krohn, this course provides a firm grasp of the mathematics — namely linear algebra and calculus — that underlies machine learning algorithms and data science models.
Course Sections
Linear Algebra Data Structures
Tensor Operations
Matrix Properties
Eigenvectors and Eigenvalues
Matrix Operations for Machine Learning
Limits
Derivatives and Differentiation
Automatic Differentiation
Partial-Derivative Calculus
Integral Calculus
Throughout each of the sections, you'll find plenty of hands-on assignments, Python code demos, and practical exercises to get your math game in top form.
This Mathematical Foundations of Machine Learning course is complete, but in the future, we intend to add extra content from related subjects beyond math, namely: probability, statistics, data structures, algorithms, and optimization. Enrollment now includes free, unlimited access to all of this future course content — over 25 hours in total.
Are you ready to become an outstanding data scientist? See you in the classroom.
This is a warm welcome to the Mathematical Foundations of Machine Learning series of interactive video tutorials. It provides an overview of the Linear Algebra, Calculus, Probability, Stats, and Computer Science that we'll cover in the series and that together make a complete machine learning practitioner.
In this first video of my Mathematical Foundations of Machine Learning series, I introduce the basics of Linear Algebra and how Linear Algebra relates to Machine Learning, as well as providing a brief lesson on the origins and applications of modern algebra.
In this video, we recap the sheriff and robber exercise from the preceding video, now viewing the calculations graphically using an interactive code demo in Python.
This video provides an applied linear algebra exercise (involving solar panels) to challenge your understanding of the content from the preceding video.
In this video I describe tensors, the fundamental building block of linear algebra for any kind of machine learning.
This is the first video in the course that makes heavy use of hands-on code demos. As described in the video, the default approach we assume for executing this code is in Jupyter notebooks within the (free!) Google Colab environment.
Pro tip: To prevent abuse of Colab (for, say, bitcoin mining), Colab sessions time out after a period of inactivity -- typically about 30 to 60 minutes. If your session times out, you'll lose all of the variables you had in memory, but you can quickly get back on track by following these three steps:
Click on the code cell you'd like to execute next.
Select "Runtime" from the Colab menubar near the top of your screen.
Select the "Run before" option. This executes all of the preceding cells and then you're good to go!
This video addresses the theory and notation of 1-dimensional tensors, also known as vector tensors. In addition, we’ll do some hands-on code exercises to create and transpose vector tensors in NumPy, TensorFlow and PyTorch, the leading Python libraries for working with tensors.
This video builds on the preceding one by explaining how vectors can represent a particular magnitude and direction through space. In addition, I’ll introduce norms, which are functions that quantify vector magnitude, and unit vectors. We’ll also do some hands-on exercises to code some common norms in machine learning, including L2 Norm, L1 Norm, Squared L2 Norm, and others.
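For a rough idea of what those norm calculations look like in practice, here is a minimal NumPy sketch (the example vector is illustrative, not necessarily the one used on screen):

    import numpy as np

    x = np.array([25, 2, 5])

    l2_norm = np.linalg.norm(x)       # Euclidean length: sqrt(25**2 + 2**2 + 5**2)
    l1_norm = np.sum(np.abs(x))       # sum of absolute values
    squared_l2 = np.dot(x, x)         # L2 norm squared; avoids the square root
    print(l2_norm, l1_norm, squared_l2)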
This quick video addresses special types of vectors (basis, orthogonal, and orthonormal), which are critical for machine learning applications. We’ll also do a hands-on code exercise to mathematically demonstrate orthogonal vectors in NumPy.
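A minimal NumPy sketch of the key property the exercise demonstrates (orthogonal vectors have a dot product of zero), using the standard basis vectors as an illustrative example:

    import numpy as np

    i = np.array([1, 0])
    j = np.array([0, 1])

    # Orthogonal vectors are at 90 degrees to one another, so their dot product is zero
    print(np.dot(i, j))   # 0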
This video covers 2-dimensional tensors, also known as matrices (or matrixes). We’ll cover matrix notation, and do a hands-on code demo on calculating matrices in NumPy, TensorFlow, and PyTorch.
In this video, we generalize tensor notation to tensors with any number of dimensions, including the high-dimensional tensors common to machine learning models. We also jump into a hands-on code demo to create 4-dimensional tensors in PyTorch and TensorFlow.
In this video, I present three questions to test your comprehension of the Linear Algebra concepts introduced in the preceding handful of videos.
This video introduces the second section, which is on Tensor Operations.
This video introduces the theory of tensor transposition, and we carry out hands-on demos of transposition in NumPy, TensorFlow, and PyTorch.
This video demonstrates basic tensor arithmetic (including the Hadamard product) through hands-on code demos in NumPy, TensorFlow, and PyTorch.
In this video, we perform hands-on code demos in NumPy, TensorFlow, and PyTorch in order to learn about reduction, a common tensor operation in ML.
This video covers the dot product, one of the most common tensor operations in machine learning, particularly deep learning. We’ll carry out hands-on code demos in NumPy, TensorFlow, and PyTorch to see the dot product in action.
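As a rough sketch of the operation (with illustrative values, not necessarily the vectors used in the demo):

    import numpy as np

    x = np.array([25, 2, 5])
    y = np.array([0, 1, 2])

    # The dot product multiplies the vectors element-wise, then sums the results:
    # 25*0 + 2*1 + 5*2 = 12
    print(np.dot(x, y))   # 12
    print(x @ y)          # the @ operator computes the same value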
This video provides three exercises to test your comprehension of the preceding videos on basic tensor operations.
In this video, we use substitution to solve systems of linear equations on paper.
In this video, we use elimination to solve systems of linear equations on paper.
This video demonstrates how to visualize the systems of linear equations we solved in the preceding videos (on substitution and elimination). This video features hands-on code demos in Python that provide a crisp, geometric visualization of the lines in each system as well as the points that we solve for when we solve a system of linear equations.
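To give a flavor of that kind of demo, here is a minimal sketch that solves and plots a hypothetical two-equation system (not necessarily the system from the videos):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical system:  y = 3x  and  y = 1 + 2x,
    # rewritten as Ax = b with unknowns (x, y)
    A = np.array([[-3., 1.],
                  [-2., 1.]])
    b = np.array([0., 1.])

    x_star, y_star = np.linalg.solve(A, b)   # intersection point: x = 1, y = 3

    t = np.linspace(-2, 4, 100)
    plt.plot(t, 3 * t, label='y = 3x')
    plt.plot(t, 1 + 2 * t, label='y = 1 + 2x')
    plt.scatter(x_star, y_star, color='red', zorder=3)   # the point that solves the system
    plt.legend()
    plt.show()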
We are now moving on to Matrix Properties, the third section of the course. Congratulations on making it here! In this section, we’ll be covering matrix properties that are vital to machine learning, including the Frobenius norm, matrix multiplication, matrix inversion and more. And of course, we’ll be doing plenty of hands-on code demos along the way.
This video explores the Frobenius norm, a function that allows us to quantify the size of a matrix. We’ll use a hands-on code demo in NumPy to solidify our understanding of the topic.
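A minimal sketch of the calculation (illustrative matrix, not necessarily the one in the demo):

    import numpy as np

    X = np.array([[1., 2.],
                  [3., 4.]])

    # The Frobenius norm is the square root of the sum of all squared entries
    print(np.linalg.norm(X))          # NumPy's default matrix norm is the Frobenius norm
    print(np.sqrt(np.sum(X ** 2)))    # same value, computed explicitly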
This video demonstrates matrix multiplication – the single most important and widely-used mathematical operation in machine learning. To ensure you get a solid grip on the principles of this key skill, we’ll use color diagrams, calculations by hand, interactive code demos, and an applied learning example.
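For a rough sense of the mechanics, here is a minimal NumPy sketch (the matrices are illustrative, not the video's examples):

    import numpy as np

    A = np.array([[3, 4],
                  [5, 6],
                  [7, 8]])   # shape (3, 2)
    B = np.array([[1, 9],
                  [2, 0]])   # shape (2, 2): inner dimensions match, so A @ B is defined

    C = A @ B                # shape (3, 2)
    # Each entry C[i, j] is the dot product of row i of A with column j of B,
    # e.g., C[0, 0] = 3*1 + 4*2 = 11
    print(C)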
This video explores symmetric matrices, a special class of matrix tensors. The most important symmetric matrix to machine learning is the identity matrix. We’ll detail it, and other symmetric matrices, including with a hands-on code demo in PyTorch.
Here are three exercises to test your comprehension of the matrix properties that we’ve learned so far.
This video introduces matrix inversion, a wildly useful transformation for machine learning. I’ll introduce the concept, and then we’ll use a series of colorful equations and hands-on code demos to solve for values in a simple regression-style problem.
While detailing how to determine the inverse of a matrix is outside the scope of this course, if you're keen to learn more on the topic, a clear tutorial can be found here: https://www.mathsisfun.com/algebra/matrix-inverse.html
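As a minimal sketch of how inversion solves a small system in practice (illustrative values, not the regression-style example worked through in the video):

    import numpy as np

    # A square system Xw = y with two unknowns
    X = np.array([[4., 2.],
                  [-5., -3.]])
    y = np.array([4., -7.])

    X_inv = np.linalg.inv(X)   # exists because X is square and its determinant is non-zero
    w = X_inv @ y              # the unknowns that satisfy Xw = y
    print(w)                   # [-1.  4.]
    print(X @ w)               # recovers y, confirming the solution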
This video introduces diagonal matrices, a special matrix class that is important in machine learning.
This video covers the unique properties of orthogonal matrices as well as their relevance to machine learning.
In this quick video from my Mathematical Foundations of Machine Learning series, I present a series of paper-and-pencil exercises that test your comprehension of the orthogonal matrix properties covered in the preceding video, as well as many of the other key matrix properties we covered earlier on.
Welcome to Subject 2 of the course! In this introductory video, I provide an overview of the topics covered in this subject, as well as a quick recap of the essential linear algebra topics we've covered so far -- topics you need to know to make the most of Subject 2.
In this video, we go over three matrix application exercises together. Having a firm grasp of matrix application is critical to understanding affine transformations, eigenvectors, and eigenvalues -- the topics coming up next in the series!
In this video we use hands-on code demos in NumPy to carry out affine transformations, a particular type of matrix transformation that may adjust angles or distances between vectors, but preserves parallelism. These operations can transform the target tensor in a variety of ways including scaling, shearing, or rotation. Affine transformations are also key to appreciating eigenvectors and eigenvalues, the focus of the next videos in the series.
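A minimal NumPy sketch of two such transformations, a shear and a rotation (illustrative matrices, not necessarily those used in the demo):

    import numpy as np

    v = np.array([3, 1])

    shear = np.array([[1, 1],
                      [0, 1]])
    rotate_90 = np.array([[0, -1],
                          [1,  0]])

    print(shear @ v)       # [4 1]: angles and distances change, but parallel lines stay parallel
    print(rotate_90 @ v)   # [-1 3]: the vector rotated 90 degrees counterclockwise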
In this video, I leverage colorful illustrations and hands-on code demos in Python to make it intuitive and easy to understand eigenvectors and eigenvalues, concepts that may otherwise be tricky to grasp.
In this video, I cover matrix determinants. A determinant is a special scalar value that we can calculate for any given matrix. It has a number of very useful properties, as well as an intimate relationship with eigenvalues that we’ll explore later on.
We’ve covered how to compute the determinant of a 2x2 matrix, but what if a matrix is larger than that? Well, that’s what this video’s for! In it, we’ll use recursion to calculate the determinant of larger matrices.
All right, we’ve covered all the theory you need to calculate 2x2 determinants or larger determinants by hand. In this video, I have three exercises to test your comprehension of that theory.
This video illustrates the relationship between determinants and eigenvalues, using hands-on code demos in Python to give you an intuitive, working understanding of what’s going on.
In this video we use hands-on code demos in Python to provide you with a working understanding of the eigendecomposition of a matrix and how we make use of it in machine learning.
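A minimal NumPy sketch of the idea (illustrative matrix, not the one decomposed in the video):

    import numpy as np

    A = np.array([[4., 1.],
                  [2., 3.]])

    eigenvalues, V = np.linalg.eig(A)   # columns of V are the eigenvectors of A
    print(eigenvalues)                  # 5 and 2 (order may vary)

    # For an eigenvector v with eigenvalue lambda, A @ v equals lambda * v
    v = V[:, 0]
    print(A @ v, eigenvalues[0] * v)

    # The eigendecomposition reconstructs A as V * diag(lambda) * V^-1
    print(V @ np.diag(eigenvalues) @ np.linalg.inv(V))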
In this video, I provide real-world applications of eigenvectors and eigenvalues, with special mention of applications that are directly relevant to machine learning.
Welcome to the final section of videos on linear algebra! In these videos, we cover the last key pieces of essential linear algebra you need to know to understand machine learning algorithms, including Singular Value Decomposition, Moore-Penrose Pseudoinversion, the Trace Operator, and Principal Component Analysis.
With a focus on hands-on code demos in Python, in this video I introduce the theory and practice of singular value decomposition, a common linear algebra operation in the field of machine learning.
In this video, we take advantage of the singular value decomposition theory that we covered in the preceding video to dramatically compress data within a hands-on Python demo.
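To sketch the idea in code (a random matrix stands in for real data here; it is purely illustrative):

    import numpy as np

    rng = np.random.default_rng(42)
    A = rng.random((100, 80))                # stand-in for, say, a grayscale image

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 10                                   # keep only the 10 largest singular values
    A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # The rank-k approximation stores far fewer numbers than the original matrix
    print(A.size)                              # 8000
    print(U[:, :k].size + k + Vt[:k, :].size)  # 1810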
This video introduces Moore-Penrose pseudoinversion, a linear algebra concept that enables us to invert non-square matrices. The pseudoinverse is a critical machine learning concept because it solves for unknown variables within the non-square systems of equations that are common in machine learning. To show you how it works, we’ll use a hands-on code demo.
This is one of my favorite videos in the entire course! In it, we use Moore-Penrose pseudoinversion to solve for unknowns, enabling us to fit a line to points with linear algebra alone. When I first learned how to do this, it blew my mind -- I hope it blows your mind too!
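A minimal sketch of the trick (the data points here are made up for illustration):

    import numpy as np

    # Points (x, y) we'd like to fit a line y = a + b*x to
    x = np.array([0., 1., 2., 3., 4.])
    y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

    # Design matrix: a column of ones (for the intercept) and a column of x values
    X = np.column_stack([np.ones_like(x), x])

    # The Moore-Penrose pseudoinverse solves the non-square system Xw ≈ y in the least-squares sense
    a, b = np.linalg.pinv(X) @ y
    print(a, b)   # intercept and slope of the best-fit line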
This is a quick video on the Trace Operator, a relatively simple linear algebra concept, but one that frequently comes in handy for rearranging linear algebra equations, including ones that are common in machine learning.
Via highly visual hands-on code demos in Python, this video introduces Principal Component Analysis, a prevalent and powerful machine learning technique for finding patterns in unlabeled data.
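As a rough sketch of the underlying linear algebra (random data stands in for a real unlabeled dataset):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))            # 200 points in 4 dimensions

    X_centered = X - X.mean(axis=0)

    # The eigenvectors of the covariance matrix are the principal components
    cov = np.cov(X_centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)    # eigh: covariance matrices are symmetric

    # Sort by explained variance (largest eigenvalue first) and project onto the top two components
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:2]]
    X_projected = X_centered @ components
    print(X_projected.shape)                 # (200, 2)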
Welcome to the final linear algebra video of the course! It’s a quick one to leave you with my favorite linear algebra resources so that you can dig deeper into the topics that pique your interest the most, if desired.
In the third subject of the course, we’ll use differentiation, including powerful automatic differentiation algorithms, to learn how to optimize learning algorithms. We’ll start with an introduction on what calculus is and learn what limits are in order to understand differentiation from first principles, primarily through the use of hands-on code demos in Python.
This video uses colorful visual analogies to introduce differential calculus at a high level.
This video is a quick high-level intro to integral calculus.
This video introduces a centuries-old calculus technique called the Method of Exhaustion, which not only provides us with a richer understanding of how modern calculus works, but is still relevant today.
In this video, we use a hands-on code demo in Python to deeply understand how approaching a curve infinitely closely enables us to determine the slope of the curve.
In this video, I provide specific examples of how calculus is applied in the real world, with an emphasis on applications to machine learning.
This video is a big one, but have no fear! It has lots of interactive code demos in Python and opportunities to work through paper-and-pencil exercises to ensure that learning about the critical subject of limits is not only interesting but also fun.
Feel like you’ve got a good handle on how to calculate limits? Let’s make sure with a handful of comprehension exercises.
In this section of Calculus videos, we use a combination of color-coded equations, paper-and-pencil exercises, and hands-on Python code demos to deeply understand how differentiation allows us to find derivatives.
In this video, we use a hands-on code demo in Python to develop a deep understanding of the Delta Method, a centuries-old differential calculus technique that enables us to determine the slope of a curve.
This video picks up right where we left off, working out the solution to the exercise I left you with at the end of the preceding video, "The Delta Method". As we work through the solution, we’ll derive, from first principles, the most common representation of the equation of differentiation! This is a fun one in which we use hands-on code demos in Python to deeply understand how we can determine the slope of any curve.
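For a rough feel of the idea, here is a minimal sketch of the Delta Method on an illustrative function (not necessarily the one used in the videos):

    def f(x):
        return x ** 2 + 2 * x + 2            # an illustrative curve; its true derivative is 2x + 2

    def secant_slope(f, x, delta):
        # Slope of the line through (x, f(x)) and (x + delta, f(x + delta))
        return (f(x + delta) - f(x)) / delta

    # As delta shrinks toward zero, the secant slope approaches the derivative at x = 2, which is 6
    for delta in [1.0, 0.1, 0.001, 1e-6]:
        print(delta, secant_slope(f, 2.0, delta))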
In this quick video, we cover all of the most common notation for derivatives.
The next several videos will provide you with clear and colorful examples of all of the most important differentiation rules, including all of the rules that are directly relevant to machine learning such as how to find the derivative of cost functions — something we’ll tackle later in the course as an important part of the Calculus II subject. For now, we’ll kick the derivative rules off with a rule about constants.
This quick video covers the Power Rule, one of the most common and important differentiation rules.
Today’s video covers the Constant Multiple Rule. The Constant Multiple Rule is often used in conjunction with the Power Rule, which was covered in the preceding video.
This video covers the Sum Rule, a critical rule for differentiation.
Feeling comfortable with the derivative rules we’ve covered so far?
1. The derivative of a constant
2. The power rule
3. The constant multiple rule
4. And the sum rule
Let’s test your understanding of them with five fun exercises that bring all of the rules together.
In this video I describe the product rule, which allows us to find the derivative of the product of two functions by differentiating each function separately. The product rule can be tremendously useful for simplifying complex derivatives, particularly when the product of the two functions can't be computed before differentiating.
The quotient rule is applicable in the same situations as the product rule, except it involves the division of two functions instead of their multiplication.
This video introduces the chain rule, which is arguably the single most important differentiation rule for machine learning. It facilitates several of the most ubiquitous ML algorithms, such as gradient descent and backpropagation — algorithms we detail later in this video series.
Combining the more basic derivative rules from earlier in the ML Foundations series with the product rule, quotient rule, and chain rule covered most recently, we’re now set for relatively advanced exercises that will confirm your comprehension of all of the rules.
The Power Rule on a Function Chain, as its name suggests, merges together two other derivative rules — the Power Rule and the Chain Rule — into a single easy step.
The content we covered in the earlier Calculus sections of the course set us up perfectly for this segment, Automatic Differentiation. AutoDiff is a computational technique that allows us to move beyond calculating derivatives by hand and scale up the calculation of derivatives to the massive scales that are common in machine learning.
This video introduces Automatic Differentiation — also known as AutoGrad, Reverse-Mode Differentiation, and Computational Differentiation.
In this video, we use a hands-on code demo in PyTorch to see AutoDiff in action first-hand, enabling us to compute the derivatives of equations instantaneously.
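A minimal sketch of the mechanism (a toy function, not the video's exact example):

    import torch

    x = torch.tensor(5.0, requires_grad=True)   # track operations on x for differentiation

    y = x ** 2                                   # y = x^2, so dy/dx = 2x
    y.backward()                                 # reverse-mode autodiff populates x.grad

    print(x.grad)                                # tensor(10.)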
In this video, we use a hands-on code demo in TensorFlow to see AutoDiff in action first-hand, enabling us to compute the derivatives of equations instantaneously.
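The equivalent sketch in TensorFlow (again, a toy function rather than the video's exact example):

    import tensorflow as tf

    x = tf.Variable(5.0)

    with tf.GradientTape() as tape:
        y = x ** 2                               # y = x^2, so dy/dx = 2x

    print(tape.gradient(y, x))                   # 10.0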
In this video, we get ourselves set up for applying Automatic Differentiation within a Machine Learning loop by first discussing how to represent an equation as a Tensor Graph and then actually creating that graph in Python code using the PyTorch library.
In preceding videos in this series, we learned all the most essential differential calculus theory needed for machine learning. In this epic video, it all comes together to enable us to perform machine learning from first principles and fit a line to data points. To make learning interactive and intuitive, this video focuses on hands-on code demos featuring the Python library PyTorch.
This video provides a preview of the content that is covered in this subject, which is focused on Partial Derivatives (Multi-Variable Calculus) and Integration. It also reviews the Single-Variable Calculus you need to be familiar with (from the preceding subject in the ML Foundation series) in order to understand Partial Derivatives.
This video is a complete introduction to partial derivatives. To make comprehension of what can be a tricky subject as easy as possible, we use a highly visual combination of colorful paper-and-pencil examples, hands-on code demos in Python, and an interactive click-and-point curve-plotting tool.
The preceding video in this series was a thorough introduction to what partial derivatives are. The exercises in this video enable you to test your comprehension of that material.
In this video, we use the Python library PyTorch to compute partial derivatives via automatic differentiation.
Using paper and pencil as well as hands-on Python code to work through geometric examples, this video builds on the preceding ones in this series to deepen and advance our understanding of partial derivatives.
This video features three fun, geometrical examples for you to work through in order to strengthen and test your command of the partial derivative theory that we covered in the preceding videos.
This is a quick video on the common options for partial derivative notation.
In this video, I assume that you already are familiar with the chain rule for full derivatives — (as is covered, for example, in my video titled "The Chain Rule" earlier in this series). Here, we’ll extend that chain rule theory to the partial derivatives of multivariate equations.
This quick video features three exercises that test your comprehension of the chain rule when applied to multivariate functions.
In this video, I introduce the mathematically simplest machine learning model I could think of: a regression line that we fit to data one point at a time. This simple model will enable us, in the next video, to derive the simplest-possible partial derivatives for calculating a machine learning gradient. The Machine Learning pieces really start coming together now — let’s dig right into it!
In this video, we derive by hand the partial derivatives of quadratic cost with respect to the parameters of a simple single-point regression model. This derivation is essential to understanding how machines learn via gradient descent.
In the preceding videos in this series, we detailed exactly what the gradient of cost is. With that understanding, in this video we dig into what it means to *descend* this gradient and fit a machine learning model.
In this video, we first derive by hand the gradient of mean squared error (a popular cost function in machine learning, e.g., for stochastic gradient descent). Secondly, we use the Python library PyTorch to confirm that our manual derivations correspond to those calculated with *automatic* differentiation. Thirdly and finally, we use PyTorch to visualize gradient descent in action over rounds of training.
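To give a flavor of that workflow, here is a minimal PyTorch sketch of fitting a line by descending the gradient of mean squared error (synthetic data and hyperparameters are illustrative, not those used in the video):

    import torch

    # Synthetic noisy data roughly following y = 2x - 1
    x = torch.linspace(0, 5, 50)
    y = 2 * x - 1 + 0.5 * torch.randn(50)

    m = torch.tensor(0.0, requires_grad=True)    # slope
    b = torch.tensor(0.0, requires_grad=True)    # intercept
    lr = 0.01                                    # learning rate

    for epoch in range(1000):
        y_hat = m * x + b
        cost = torch.mean((y_hat - y) ** 2)      # mean squared error
        cost.backward()                          # autodiff fills in m.grad and b.grad
        with torch.no_grad():                    # step down the gradient
            m -= lr * m.grad
            b -= lr * b.grad
        m.grad.zero_()
        b.grad.zero_()

    print(m.item(), b.item())                    # should land near 2 and -1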
This video explains the relationship between partial derivatives and the backpropagation approach used widely in training artificial neural networks, including deep learning networks.
This video introduces higher-order derivatives for multi-variable functions, with a particular focus on the second-order partial derivatives that abound in machine learning.