Master statistics & machine learning: intuition, math, code from Udemy

What's inside

Learning objectives

Descriptive statistics (mean, variance, etc.)
Inferential statistics
T-tests, correlation, ANOVA, regression, clustering
The math behind the "black box" statistical methods
How to implement statistical methods in code
How to interpret statistics correctly and avoid common misunderstandings
Coding techniques in Python and MATLAB/Octave
Machine learning methods like clustering, predictive analysis, classification, and data cleaning

Syllabus

Introductions

Strategies for optimal learning.

How to use different programming languages in the course.

Simulate data and run a statistical analysis. A fun way to start the course :)

I explain how to get the most out of the interactive part of this course: The Q&A forum!

(optional) Entering time-stamped notes in the Udemy video player

Math prerequisites

A discussion about memorizing formulas.

A reminder about foundational arithmetic rules.

Ways of representing very large and very small numbers.

Mathematical notation for adding a series of numbers.

Absolute value is the distance away from zero, regardless of sign.

The natural exponential function and the natural logarithm are two of the most important functions in math and its applications.

The logistic function is used often in statistics, machine learning, and optimization.
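A minimal Python sketch of the logistic function, 1 / (1 + e^(-x)). Python is used here because the course codes in Python and MATLAB; the name `logistic` is illustrative, and the two-branch form (an assumption, not necessarily the course's code) keeps `math.exp` from overflowing for large negative inputs.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic (sigmoid) function: 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as e^x / (1 + e^x) so math.exp never overflows.
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-4.0, 0.0, 4.0):
    print(x, round(logistic(x), 4))
```

Outputs always fall between 0 and 1, and `logistic(0)` is exactly 0.5.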

To rank data means to transform raw numerical values into ordinal positions. Ranks are used in non-parametric statistics.
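The ranking step can be sketched in plain Python as below. The function name `rank` is hypothetical, and giving tied values the average of their positions is an assumed convention (it is also the default `method="average"` of SciPy's `scipy.stats.rankdata`).

```python
def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    # Indices of the values in ascending order (sorted() is stable).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end j of the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j (0-based) become ranks i+1..j+1; ties share the mean.
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


print(rank([10, 30, 20, 20]))  # [1.0, 4.0, 2.5, 2.5]
```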

IMPORTANT: Download course materials

Download materials for the entire course!

What are (is?) data?

My take on statistical terminology, grammar, and modern culture.

A philosophical discussion about how we can obtain numbers from the universe.

Data come in different forms, which has implications for ways of visualizing and analyzing data.

Introduction to data types in MATLAB and Python.

There is an important distinction between measuring *all* of the data vs. some of the data.

This distinction is related to sample size, and has implications for the generalizability of experimental findings.

The take-home message here is simple: Don't lie or cheat!

Visualizing data

Lecture on how to create and interpret bar plots, including the types of data that are used.

Creating bar plots in MATLAB and Python, including parameters.

Creating and interpreting box plots, also called box-and-whisker plots.

Box plots in MATLAB and Python.

An exercise on creating box plots of random numbers drawn from different distributions.

A lecture on how to create and interpret histograms, including frequency vs. proportion.

Creating and visualizing histograms in code.

An exercise on transforming frequencies (counts) into proportions.

Pie charts are nice visualizations when your data add up to 100%.

Create pie charts in code. It's easier than you think!

A critical discussion of how to visualize categorical vs. continuous data using lines vs. bars.

A comparison of scaling the y-axis and x-axis intervals.

More on plotting and parameterizing line plots in code.

An exercise on scaling data in different ways.

Descriptive statistics

The term "statistics" actually has two broad meanings: characteristics of a sample vs. generalizing to other samples.

These terms describe how your data relate to the real-world objects that the data measure.

Data come in different distributions, which has implications for how to visualize and analyze datasets.

You will learn how to create random data with different distributions in MATLAB and Python.

What happens when you plot the distribution of a distribution function? Find out!

The Gaussian distribution describes a remarkable and fundamental quality of the universe.

The mean, aka average, is the most common and insightful measure of a data set.

The mean is not appropriate for all data distributions; here you will learn two non-parametric measures of dataset centrality.

Computing mean, median, and mode in MATLAB and Python.

An exercise to help you understand the impact of outliers on mean, median, and mode.
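To see the outlier effect described above, here is a small sketch using Python's built-in `statistics` module; the data values are made up for illustration, and `central_tendency` is a hypothetical helper name.

```python
import statistics


def central_tendency(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)


data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

print(central_tendency(data))          # (3.4, 3, 3)
print(central_tendency(with_outlier))  # (19.5, 3.5, 3)
```

A single extreme value drags the mean from 3.4 to 19.5, while the median moves only from 3 to 3.5 and the mode does not change.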

You will learn about dispersion, which is how wide the data distribution is.

Computing different measures of dispersion in code.
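A short sketch of common dispersion measures with Python's `statistics` module. Which measures the course computes is not listed here, so range, variance, and standard deviation are assumed; the data are illustrative. Note the divisor: `pvariance` uses N, while `variance` and `stdev` use N - 1.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)      # spread between the extremes
pop_var = statistics.pvariance(data)    # divides by N (population variance)
samp_var = statistics.variance(data)    # divides by N - 1 (sample variance)
samp_std = statistics.stdev(data)       # square root of the sample variance

print(data_range, pop_var, samp_var, samp_std)
```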

IQR is a measure of the spread of most (but not all) of the data, and is robust to outliers.

See how to generate the interquartile range in code.
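One way to compute the interquartile range in Python, shown as a sketch with illustrative data. Quartile definitions differ between tools; `method="inclusive"` is an assumption chosen because it matches NumPy's default linear interpolation. Replacing the largest value with 1000 leaves the IQR unchanged, which shows its robustness to outliers.

```python
import statistics


def iqr(data):
    """Interquartile range: Q3 - Q1."""
    # method="inclusive" interpolates like NumPy's default percentile method;
    # the statistics module's default ("exclusive") can give different quartiles.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 1000]

print(iqr(clean), iqr(with_outlier))  # 4.0 4.0
```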

QQ plots show how your data compare to a theoretical normal (Gaussian) distribution.

Learn how QQ plots are created in Python and MATLAB.

Moments are statistical characteristics of the data. Here you'll learn the first four moments of a distribution.
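The first four moments can be sketched as below. The formulas use divide-by-N estimates and report excess kurtosis (kurtosis minus 3); both are assumed conventions, and other software may apply bias corrections, so values can differ slightly.

```python
def moments(data):
    """First four moments: mean, variance, skewness, excess kurtosis.

    Uses population (divide-by-N) formulas; bias-corrected versions exist too.
    """
    n = len(data)
    mean = sum(data) / n
    dev = [x - mean for x in data]
    m2 = sum(d ** 2 for d in dev) / n  # second central moment (variance)
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,    # 0 for symmetric data
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis: 0 for a Gaussian
    }


print(moments([1, 2, 3, 4, 5]))
print(moments([1, 1, 1, 10]))  # long right tail -> positive skewness
```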

More on histograms: Learn the formulas for determining the number of bins (data discretizations) to use.
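Two widely used bin-count rules, Sturges' rule and the Freedman-Diaconis rule, are sketched below; whether the course uses exactly these is an assumption. NumPy's `numpy.histogram_bin_edges` also accepts `bins="sturges"` and `bins="fd"` if you prefer a library routine.

```python
import math
import statistics


def sturges_bins(data):
    """Sturges' rule: k = ceil(log2(n)) + 1."""
    return math.ceil(math.log2(len(data))) + 1


def freedman_diaconis_bins(data):
    """Freedman-Diaconis rule: bin width h = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    if q3 == q1:
        raise ValueError("IQR is zero; the Freedman-Diaconis width is undefined")
    width = 2 * (q3 - q1) / len(data) ** (1 / 3)
    return math.ceil((max(data) - min(data)) / width)


data = list(range(1, 1001))
print(sturges_bins(data), freedman_diaconis_bins(data))
```

Sturges' rule depends only on the sample size, whereas Freedman-Diaconis adapts the bin width to the spread of the middle half of the data, which makes it less sensitive to outliers.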

Experiment with histogram parameters.

Learn how to create and interpret violin plots, a beautiful graph for visualizing data and data distributions.

See how violin plots are created in code. Tip: Use lots of colors!

An exercise to visualize two data distributions in one violin plot.

Learn how to interpret entropy, a nonlinear measure of data dispersion.

Shannon entropy in code.

You will see how the bin-count parameter affects entropy.
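A sketch of how the bin count changes the entropy estimate: continuous data are first discretized into equal-width bins, and Shannon entropy H = -Σ p·log2(p) is computed from the bin proportions. The helper name, the base-2 logarithm (entropy in bits), and the simulated Gaussian data are assumptions for illustration.

```python
import math
import random


def histogram_entropy(data, n_bins):
    """Shannon entropy (bits) of data discretized into n_bins equal-width bins.

    Assumes the data are not all identical (otherwise the bin width is zero).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    n = len(data)
    probs = [c / n for c in counts if c > 0]  # empty bins contribute 0
    return -sum(p * math.log2(p) for p in probs)


random.seed(1)
sample = [random.gauss(0, 1) for _ in range(10_000)]
for k in (5, 20, 100):
    print(k, round(histogram_entropy(sample, k), 3))
```

With more bins the probability mass is spread over more categories, so the estimated entropy generally rises; the number you report therefore depends on the discretization.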

Data normalizations and outliers

No amount of fancy statistics or data cleaning can fix terrible data. Start with good data!

Z-score is the most important data normalization in statistics and machine learning.

Translate the z-score formula into code.
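The z-score formula, z = (x - mean) / standard deviation, translates directly into Python. This sketch assumes the sample standard deviation (N - 1); `statistics.pstdev` would give the population version. After the transform the data have mean 0 and standard deviation 1.

```python
import statistics


def zscore(data):
    """Subtract the mean, then divide by the standard deviation."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample SD (N - 1); pstdev would use N
    return [(x - mu) / sigma for x in data]


z = zscore([2, 4, 4, 4, 5, 5, 7, 9])
print([round(v, 3) for v in z])
```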

Min-max scaling is the second-most important data normalization method.

Translate min-max scaling into Python and MATLAB code.

An exercise to get from normalized data back to their original scale.
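A sketch of min-max scaling together with its inverse. Recovering the original scale requires keeping the original minimum and maximum, so the scaling function returns them. The helper names and the optional target range (`new_min`, `new_max`) are hypothetical.

```python
def minmax_scale(data, new_min=0.0, new_max=1.0):
    """Linearly map data onto [new_min, new_max]; also return the original min and max."""
    lo, hi = min(data), max(data)
    scale = (new_max - new_min) / (hi - lo)  # assumes hi > lo
    return [new_min + (x - lo) * scale for x in data], lo, hi


def minmax_inverse(scaled, lo, hi, new_min=0.0, new_max=1.0):
    """Undo minmax_scale, given the original min (lo) and max (hi)."""
    scale = (hi - lo) / (new_max - new_min)
    return [lo + (s - new_min) * scale for s in scaled]


raw = [3.0, 7.0, 11.0]
scaled, lo, hi = minmax_scale(raw)
restored = minmax_inverse(scaled, lo, hi)
print(scaled)    # [0.0, 0.5, 1.0]
print(restored)  # [3.0, 7.0, 11.0]
```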

Outliers are unusual values that can completely screw up your analyses and interpretation!

The z-score method is one of the most common methods for identifying and removing outliers.
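The z-score outlier method can be sketched as follows: z-score the data, then drop values whose absolute z exceeds a threshold. The cutoff of 3 is a common but arbitrary assumption, and the data are made up.

```python
import statistics


def remove_outliers_zscore(data, threshold=3.0):
    """Keep only values whose |z-score| is at or below the threshold."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs((x - mu) / sigma) <= threshold]


base = [9.0, 10.0, 11.0] * 7
data = base + [50.0]
print(remove_outliers_zscore(data) == base)  # True
```

One caveat: with the sample standard deviation, no z-score can exceed (N - 1)/√N in magnitude, so with 10 or fewer data points a threshold of 3 can never flag anything.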

The modified z-score method uses the median instead of the mean, and therefore is good for removing outliers in non-normal distributions.

Implement the modified z-score method in code.
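A sketch of the modified z-score, which replaces the mean with the median and the standard deviation with the median absolute deviation (MAD). The constant 0.6745 (which rescales the MAD to be comparable to a standard deviation for normal data) and the cutoff 3.5 are a widely used convention, assumed here rather than taken from the course; the helper names are hypothetical.

```python
import statistics


def modified_zscores(data):
    """Modified z-scores: 0.6745 * (x - median) / MAD."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in data]


def remove_outliers_modified_z(data, threshold=3.5):
    """Keep values whose |modified z-score| is at or below the threshold."""
    scores = modified_zscores(data)
    return [x for x, m in zip(data, scores) if abs(m) <= threshold]


data = [10, 11, 12, 13, 14, 100]
print(remove_outliers_modified_z(data))  # [10, 11, 12, 13, 14]
```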

Does it really matter if you use the regular or modified z-score method? Come find out!

Extend the z-score method to outliers in high-dimensional datasets.

Multivariate outlier identification and removal, using concepts from geometry.

Another common method for removing outliers, based on threshold-exceedance.

See how data trimming is implemented in MATLAB and Python.
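Data trimming can be done in several ways; this sketch assumes symmetric trimming, dropping a fixed proportion of the lowest and highest values. The function name and the default of 5% are hypothetical. SciPy offers related routines such as `scipy.stats.trimboth` and `scipy.stats.trim_mean`.

```python
def trim(data, proportion=0.05):
    """Drop the lowest and highest `proportion` of values (symmetric trimming).

    Returns the remaining values in sorted order.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    k = int(len(data) * proportion)  # number of values cut from each tail
    ordered = sorted(data)
    return ordered[k:len(ordered) - k]


print(trim([5, 1, 100, 3, 4, 2, -50, 6, 7, 8], 0.1))  # [1, 2, 3, 4, 5, 6, 7, 8]
```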

Instead of removing outliers, you can use analyses that are robust to outliers.

Some outliers can be transformed into non-outliers by applying certain nonlinear transformations.

A lecture on one of the main challenges of online learning. Just something to reflect on.

Probability theory

Introduction to probability and the role of probability in statistics.

Probability and proportion are really similar concepts, but it's important to know their subtle difference.

Instructions on how to compute probabilities (math).

How to compute probabilities in practice (code).

Probability and odds are different concepts; see how they differ and how to interpret odds ratios.

This exercise on odds-ratios will help make sure you really understand the math of odds-ratios.
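The probability-to-odds conversion and the odds ratio for a 2x2 table are sketched below; the function names and the counts are illustrative.

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of counts.

                 event   no event
        group 1    a        b
        group 2    c        d

    Equivalent to (a / b) / (c / d).
    """
    return (a * d) / (b * c)


print(odds(0.75))                   # 3.0
print(odds_ratio(20, 80, 10, 90))   # 2.25
```

An odds ratio above 1 means the event's odds are higher in group 1 than in group 2; a value of exactly 1 means the odds are equal.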

Different terms are used for probabilities, depending on the data type (categorical vs. continuous).

Compute empirical probability mass functions.
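An empirical probability mass function is the proportion of observations taking each value, which also illustrates the link between probability and proportion. The die-roll values below are made up, and `empirical_pmf` is a hypothetical helper.

```python
from collections import Counter


def empirical_pmf(values):
    """Map each observed value to the proportion of observations equal to it."""
    counts = Counter(values)
    n = len(values)
    return {v: counts[v] / n for v in sorted(counts)}


rolls = [1, 3, 3, 6, 2, 3, 6, 1]
pmf = empirical_pmf(rolls)
print(pmf)  # {1: 0.25, 2: 0.125, 3: 0.375, 6: 0.25}
```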

Cumulative distribution functions (CDFs) are central to evaluating statistical significance. In this video, you'll learn how to create and interpret CDFs.

Here you will learn how to compute CDFs from probability density functions (PDFs), including a potentially confusing aspect of their relationship.
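A sketch of building a CDF from a PDF by numerical integration: the running sum of PDF values multiplied by the grid spacing dx approximates the integral. The standard normal (`statistics.NormalDist`) and the grid from -5 to 5 are assumptions chosen so the result can be checked against the exact CDF.

```python
from itertools import accumulate
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)
dx = 0.001
xs = [-5.0 + i * dx for i in range(10_001)]  # grid from -5 to 5
pdf = [dist.pdf(x) for x in xs]

# Running sum of pdf * dx approximates the integral of the PDF up to each x.
cdf = list(accumulate(p * dx for p in pdf))

# Without the dx factor the running sum would end near 1 / dx = 1000, not 1.
for x_target in (-1.0, 0.0, 1.0):
    i = round((x_target + 5.0) / dx)
    print(x_target, round(cdf[i], 3), round(dist.cdf(xs[i]), 3))
```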

An exercise to create CDFs from various random distributions.

Learn how to create a distribution of means from repeated samples. This is key to hypothesis-testing.
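The distribution of sample means can be simulated with a simple Monte Carlo loop. This sketch assumes an exponential population, 30 observations per sample, and 2,000 repeated samples; all of these values are illustrative. Even though the population is skewed, the sample means center on the population mean, and their spread comes out close to the theoretical standard error σ/√n.

```python
import random
import statistics

random.seed(42)
# A skewed (exponential) population; sample means still pile up around its mean.
population = [random.expovariate(1.0) for _ in range(100_000)]

sample_size = 30
n_experiments = 2_000
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(n_experiments)
]

pop_mean = statistics.fmean(population)
# Theoretical standard error of the mean: sigma / sqrt(n).
standard_error = statistics.pstdev(population) / sample_size ** 0.5

print(pop_mean, statistics.fmean(sample_means))
print(standard_error, statistics.stdev(sample_means))
```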

You already know how to do Monte Carlo sampling; here I will make sure you know the terminology.

Sampling isn't perfect, and understanding its limitations will help you properly interpret statistical results.

Good to know

Know what's good, what to watch for, and possible dealbreakers

Examines probability theory and its role in statistics, which is standard in data science

Develops skills in probability, statistics, and machine learning, which are core to data science

Explores concepts in probability and statistics, which are essential for understanding data

Teaches students how to apply probability and statistics to real-world problems, which is highly relevant to data science

Uses the Statistics and Machine Learning toolbox in MATLAB, which may be beneficial to students with access to this resource

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Master statistics & machine learning: intuition, math, code with these activities:

Review Statistics Quiz

Get your brain back into statistics mode by reviewing key concepts and formulas.

Take the quiz

StatQuest Crash Course

Get some video-based assistance with key concepts in statistics, machine learning, and data science. Great for those new to the subject or in need of a refresher.

Choose a topic
Watch the video
Complete the quiz

Python Coding Drills

Strengthen your Python skills and understanding of statistical algorithms by completing coding exercises.

Read the instructions
Write the code
Test the code

Introduction to Statistical Learning

Enhance your theoretical understanding of machine learning algorithms and statistical techniques by reading this comprehensive textbook.

Read a chapter
Complete the exercises

Study Group

Enhance your understanding and critical thinking skills by discussing the course material with a study group.

Find a study group
Attend the meetings
Participate in the discussions

Data Analysis Project

Apply your skills to a real-world dataset by completing a data analysis project. This will test your ability to use statistical methods and machine learning algorithms to solve a problem.

Define the problem
Collect the data
Analyze the data
Present the results

Tutor a Peer

Reinforce your understanding of statistics by teaching concepts to a peer. Helping others will also improve your communication and interpersonal skills.

Find a peer
Tutor the peer
Reflect on the experience

Career center

Learners who complete Master statistics & machine learning: intuition, math, code will develop knowledge and skills that may be useful to these careers:

Data Scientist

A data scientist combines knowledge of statistics, machine learning, and data analysis to extract valuable insights from large datasets. This course provides a comprehensive grounding in the fundamental concepts of statistics, machine learning, and data science, and it covers practical aspects of data analysis, such as data normalization, outlier detection and removal, and data visualization. Taking this course can help you build the skills and knowledge that data scientists rely on.

Statistician

A statistician is a professional who specializes in collecting, analyzing, and interpreting data. Statisticians use statistical methods to draw conclusions and make predictions about the world around them. This course covers the fundamental concepts of statistics, probability theory, and hypothesis testing, along with practical aspects of statistical analysis, such as data normalization, outlier detection and removal, and data visualization. Taking this course can help you build a foundation for work as a statistician.

Data Analyst

A data analyst utilizes statistical and mathematical principles to gather, analyze, and interpret data in order to provide insights to inform decision-making within an organization. This course will provide you with valuable knowledge in descriptive and inferential statistics, probability theory, and machine learning, helping you build a foundation for success as a data analyst. It also covers practical aspects of data analysis, such as data normalization, outlier detection and removal, and data visualization, making it an ideal choice for learners who want to enter this career field.

Machine Learning Engineer

A machine learning engineer is a professional who specializes in developing and deploying machine learning models. They work with data to identify patterns and build models that can make predictions or automate processes. This course will provide you with a solid foundation in the fundamentals of statistics, machine learning, and data analysis, making it an ideal choice for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Actuary

Actuaries use statistical and mathematical models to assess the financial risk and uncertainty associated with insurance and other financial products. They work with data to develop and price insurance policies, and to assess the risk of events such as natural disasters and accidents. This course provides a solid foundation in statistics and probability theory, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Quantitative Analyst

A quantitative analyst is a professional who specializes in using mathematical and statistical models to analyze financial data. They use their knowledge to make investment decisions and to assess the risk and return of investments. This course provides a strong foundation in statistics and probability theory, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Epidemiologist

Epidemiologists study the distribution and determinants of health-related states and events in populations. They work with data to identify risk factors for disease, and to develop and implement prevention and control programs. This course provides a solid foundation in statistics, probability theory, and data analysis, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Market Researcher

Market researchers conduct studies to collect and analyze data about consumers, markets, and products. They use their findings to help businesses make informed decisions about product development, marketing, and pricing. This course provides a solid foundation in statistics and data analysis, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Risk Manager

Risk managers identify, assess, and mitigate risks within an organization. They work with data to develop and implement risk management plans, and to ensure that the organization is prepared for potential threats. This course provides a solid foundation in statistics, probability theory, and data analysis, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Economist

Economists study the production, distribution, and consumption of goods and services within an economy. They work with data to analyze economic trends and to develop policies that promote economic growth and stability. This course provides a solid foundation in statistics and data analysis, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Business Analyst

Business analysts work with data to identify opportunities and solve problems within an organization. They use their findings to make recommendations for improvement, and to develop and implement new strategies. This course provides a solid foundation in statistics and data analysis, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Data Warehouse Architect

Data warehouse architects design and build data warehouses, which are used to store and manage large volumes of data. They work with data to ensure that it is properly organized and accessible for analysis and reporting. This course provides a solid foundation in statistics and data analysis, which can help architects understand how the data they store will be analyzed. It covers a range of topics, including data normalization (here meaning the statistical rescaling of values, not database normalization), outlier detection and removal, and data visualization.

Product Manager

Product managers are responsible for the development and management of products. They work with data to identify customer needs and to develop products that meet those needs. This course provides a solid foundation in statistics and data analysis, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Software Engineer

Software engineers design, build, and maintain computer programs. They work with data to develop solutions to problems and to create new products and services. This course provides a solid foundation in statistics and machine learning, and in implementing statistical methods in Python and MATLAB, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.

Research Scientist

Research scientists conduct research to develop new knowledge and technologies. They work with data to analyze problems and to develop solutions. This course provides a solid foundation in statistics, data analysis, and the correct interpretation of statistical results, making it a useful starting point for learners who want to enter this field. It covers a range of topics, including data normalization, outlier detection and removal, and data visualization.
