Samuel Hinton, SuperDataScience Team, and Ligency Team

Welcome to Python for Statistical Analysis.

This course is designed to position you for success by diving into the real world of statistics and data science.

  1. Learn through real-world examples: Instead of sitting through hours of theoretical content and struggling to connect it to real-world problems, we'll focus entirely on applied statistics, taking theory and immediately applying it through Python to common problems to give you the knowledge and skills you need to excel.

  2. Presentation-focused outcomes: Crunching the numbers is easy, and is quickly becoming the domain of computers rather than people. The skills people bring are interpreting and visualising outcomes, so we focus heavily on this, integrating visual output and graphical exploration into our workflows. There is also extra content on great ways to spice up visuals for reports, articles and presentations, so that you can stand out from the crowd.

  3. Modern tools and workflows: This isn't school, where we want to spend hours grinding through problems by hand for reinforcement learning. No, we'll solve our problems using state-of-the-art techniques and code libraries, utilising features from the very latest software releases to make us as productive and efficient as possible. Don't reinvent the wheel when the industry has moved to rockets.

What's inside

Learning objectives

  • Gain deeper insights into data
  • Use Python to solve common and complex statistical and machine learning-related projects
  • Interpret and visualize outcomes, integrating visual output and graphical exploration
  • Learn hypothesis testing and how to efficiently implement tests in Python

Syllabus

Introduction

A general course overview - what we'll cover, how we'll cover it, and where you can get help if things go wrong!

To join the Facebook group, check this link out: https://www.facebook.com/groups/superdatascience/


For the Python 2v3 links, see:

https://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html

https://www.geeksforgeeks.org/important-differences-between-python-2-x-and-python-3-x-with-examples/

Let's talk about setting everything up. What python version we'll use and the different ways you can get it.


If you've downloaded anaconda, you should have everything you need to get started available right away, and if not, here is the updated link to the Anaconda tutorial I've hosted online (apologies, the link has changed from the one in the presentation):

https://cosmiccoding.com.au/tutorial/2018/07/30/anaconda.html


If you've picked miniconda, you'll need to use conda to install dependencies. To do that in your base environment, execute

conda install numpy scipy matplotlib pandas jupyter scikit-learn

If you want a new environment for this course (called 'stats'), try this out

conda create -n stats python=3.9 numpy scipy matplotlib pandas jupyter scikit-learn

conda activate stats

Let's do a live run through installing anaconda - the best way of getting a scientific distribution of python on your machine.


Anaconda download link: https://www.anaconda.com/distribution/

Miniconda download link: https://docs.conda.io/en/latest/miniconda.html

Now that we've got Python installed, we need to figure out how we should write our code. There are a lot of options, so let's touch on them quickly so you can find something that works well for you!

Better than just talking about editors, let's run a few so you can see better how they work and how you can use them.

Finally, let's discuss how to keep track of your code. No one wants to lose work by accident, and there are a few ways around this. One way is far superior to the others, as you'll see inside the video!

Exploring Data Analysis

We'll be working with a lot of datasets in the coming lectures. So before we jump into that, let's discuss the different ways we can load data into our code. No coding in this one, let's focus on the higher level for just a moment!

Jumping into the code, let's have a look at all the different ways we can load data into our code, using numpy, pandas and pickling!
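
As a rough illustration of the three loading routes named above (this is a sketch, not the course's own notebook), the snippet below writes a small made-up table to a temporary folder and reads it back with numpy, pandas and pickle. The column names and file names are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Hypothetical toy table; the course's datasets are not reproduced here.
df = pd.DataFrame({"height": [1.62, 1.75, 1.80], "weight": [55.0, 72.5, 80.1]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pkl_path = os.path.join(tmp, "data.pkl")

    df.to_csv(csv_path, index=False)

    # numpy: returns a plain numeric array, so the header row is skipped by hand
    arr = np.loadtxt(csv_path, delimiter=",", skiprows=1)

    # pandas: keeps the column names and infers a dtype per column
    df_csv = pd.read_csv(csv_path)

    # pickle: stores and restores the Python object itself
    with open(pkl_path, "wb") as f:
        pickle.dump(df, f)
    with open(pkl_path, "rb") as f:
        df_pkl = pickle.load(f)

print(arr.shape)
print(df_csv.columns.tolist())
print(df_pkl.equals(df))
```

Pickles are convenient for your own intermediate results, but only load pickle files you trust, since unpickling can execute arbitrary code.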

Loading data into our code is the easy part. The vast majority of our time will be spent sanitising, cleaning and preparing the data. Let's run through some basic tools you can use to do this, and hope that your first project goes as simply as this example!

Sometimes the data we get doesn't just contain NaNs; it also has outlying points that we want to identify and potentially remove. Let's look at how.
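
One possible way to handle both problems is sketched below. It drops NaNs with pandas and then flags outliers with a robust z-score built from the median and the median absolute deviation (MAD). The data, the MAD approach and the threshold of 3 are all assumptions made for illustration; the lecture may well use a different method.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with one missing value and one obvious outlier.
s = pd.Series([9.8, 10.1, np.nan, 10.0, 9.9, 10.2, 55.0, 10.0, 9.7, 10.3])

# Step 1: remove missing values.
clean = s.dropna()

# Step 2: robust z-score. The factor 1.4826 scales the MAD so that it is
# comparable to a standard deviation when the data are normally distributed.
median = clean.median()
mad = (clean - median).abs().median()
robust_z = (clean - median) / (1.4826 * mad)

# Step 3: flag points more than 3 robust "sigma" from the median
# (3 is a common convention, not a rule).
is_outlier = robust_z.abs() > 3
kept = clean[~is_outlier]

print(clean[is_outlier].tolist())
print(len(kept))
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme point can inflate the standard deviation enough to hide itself.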

A brief conceptual overview of a bunch of ways we can visualise one dimensional data before jumping into the code!

One dimensional histograms are easy to make, and by far the most common way of visualising a distribution. You'll see why in the video.

For a bit of flair, we can look at bee swarm plots. Great for presentations!

Box and violin plots are another useful tool. Violin plots can be elegant and useful in direct comparisons, and are used a lot in scientific publications.

Empirical CDFs aren't the most useful visualisation tool, but boy will they come in handy later when we apply statistical tests, so let's cover them here. On top of that, let's also quickly look at pandas' describe function, which will quickly become a staple of your workflow.
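
A minimal sketch of both ideas, using simulated data as a stand-in for a real dataset: the empirical CDF is just the sorted data paired with the running fraction of points, and `describe` comes from pandas.

```python
import numpy as np
import pandas as pd

# Simulated stand-in data (assumption): 1000 draws from a normal distribution.
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Empirical CDF: sort the data; the i-th sorted point has i/n of the data at or below it.
x = np.sort(data)
ecdf = np.arange(1, x.size + 1) / x.size

# Fraction of points at or below 5.0, read straight off the sorted array.
frac_below_5 = np.searchsorted(x, 5.0, side="right") / x.size

# Count, mean, std, min, quartiles and max in one call.
summary = pd.Series(data).describe()
print(summary)
```

Plotting `x` against `ecdf` as a step function gives the usual empirical CDF figure.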

What do we do when we need to go beyond a single dimension? How do we visualise multivariate distributions and data?

The most common, and probably most useful, visualisation for higher dimensional data is a scatter matrix. And lucky for us, pandas has one built in!

If we want something a bit smaller and faster to make than a scatter matrix, we can get basic information out of a correlation plot! We'll cover correlation mathematically a bit later in the course, so don't worry if the underlying math isn't intuitive!

Let's look at 2D data briefly, and work with some examples on how to plot 2D histograms, contour plots and utilise the power of kernel density estimation!
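
For a feel of the numbers behind those plots, here is a sketch that bins simulated 2D data with `np.histogram2d` and estimates a smooth density with `scipy.stats.gaussian_kde`. The correlated Gaussian sample is an assumption for illustration, and the plotting itself is left out.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Simulated correlated 2D data (assumption).
rng = np.random.default_rng(2)
xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]], size=2000)

# 2D histogram: counts per cell plus the bin edges along each axis.
counts, x_edges, y_edges = np.histogram2d(xy[:, 0], xy[:, 1], bins=20)

# Kernel density estimate; gaussian_kde expects shape (n_dims, n_points).
kde = gaussian_kde(xy.T)
density_at_origin = kde([[0.0], [0.0]])[0]
density_far_away = kde([[4.0], [-4.0]])[0]

print(counts.shape)
print(density_at_origin > density_far_away)
```

Evaluating `kde` on a regular grid gives the values you would hand to a contour plot.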

Let's mix some probability into things and talk about likelihood contours!


If you are having LaTeX errors here, add usetex=False to c.configure. Will add video annotation when my computer stops crashing on video render -_-

Time to put everything back together for a quick summary! Don't forget to download the attached cheat sheet!

Characterising

Let's get motivated.

We almost always need some measure of the central value in our data or a distribution. Unfortunately, there are many ways of doing this, and we need to figure out which methods we should use in which circumstance.

After finding a central value, we almost always need to characterise the width of the distribution. This one has less freedom, which simplifies things!

Finally, sometimes our distributions are asymmetric, and this needs to be quantified if we wish to approximate our data.

What if we don't want a few standardised numbers and are happy to compress our distribution to an arbitrary number of points? Why, then we'd use percentiles!
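
The summary numbers from the last few lectures (central value, width, asymmetry, percentiles) can each be computed in a line. The sketch below uses a simulated, deliberately skewed sample as an assumption; which statistic is appropriate for real data is what the videos discuss.

```python
import numpy as np
from scipy import stats

# Simulated right-skewed data (assumption): exponential with scale 2.
rng = np.random.default_rng(3)
data = rng.exponential(scale=2.0, size=5000)

# Central value: the mean is pulled towards the long tail, the median is not.
mean = np.mean(data)
median = np.median(data)

# Width: sample standard deviation (ddof=1 divides by n - 1).
std = np.std(data, ddof=1)

# Asymmetry: positive skewness means a longer right-hand tail.
skewness = stats.skew(data)

# Percentiles: compress the distribution to as many points as you like.
p16, p50, p84 = np.percentile(data, [16, 50, 84])

print(mean, median, std, skewness)
print(p16, p50, p84)
```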

Let's move onto multivariate distributions again, just like in the EDA section. Let's quantify covariance and correlation.
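
As a small sketch of how the two quantities relate, the Pearson correlation is the covariance divided by the product of the two standard deviations. The simulated data below are an assumption for illustration.

```python
import numpy as np

# Simulated data (assumption): y depends partly on x, plus independent noise.
rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=1.0, size=1000)

# 2x2 covariance matrix: variances on the diagonal, covariance off the diagonal.
cov = np.cov(x, y)

# 2x2 correlation matrix: ones on the diagonal.
corr = np.corrcoef(x, y)

# Correlation rebuilt by hand from the covariance matrix.
manual = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(cov)
print(corr[0, 1], manual)
```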

Time to wrap it all up for this chapter! Don't forget to download the attached cheat sheet!

Probability

Let's refresh some basic probability theory, probabilistic identities and the difference between a probability density function and a probability mass function.

What are common PDFs and PMFs? What are their forms, their parametrisation and when should we use them?

Let's take the functions from the previous video and learn how to invoke them in code!
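
The syllabus does not say which distributions the video covers, so the three below are assumptions chosen as common examples. The sketch shows the general `scipy.stats` pattern: continuous distributions expose `pdf`, discrete ones expose `pmf`.

```python
import numpy as np
from scipy import stats

# Continuous: normal PDF evaluated on a small grid.
x = np.linspace(-3, 3, 7)
pdf_vals = stats.norm.pdf(x, loc=0, scale=1)

# Discrete: binomial PMF for 0..10 successes in 10 fair trials.
k = np.arange(0, 11)
pmf_vals = stats.binom.pmf(k, n=10, p=0.5)

# Discrete: probability of exactly 3 events when the Poisson mean is 2.
poisson_p3 = stats.poisson.pmf(3, mu=2.0)

print(pdf_vals)
print(pmf_vals.sum())
print(poisson_p3)
```

A PMF sums to one over its support, whereas a PDF integrates to one, so individual PDF values are densities rather than probabilities.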

What are cumulative distribution functions, survival functions, and how can we use probability theory when our distributions have no analytic form?

Empirical probability distributions in code! Let's discuss different interpolation and integration methods that come hand-in-hand with using an arbitrary function as a PDF.
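
Here is one way this could look in practice, assuming SciPy 1.6 or newer for `trapezoid` and `cumulative_trapezoid`. The two-bump function is a made-up stand-in for "an arbitrary function" tabulated on a grid; the lecture may use other interpolation or integration schemes.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid, cumulative_trapezoid

# Analytic case: CDF and survival function are complementary.
p_below = stats.norm.cdf(1.0)
p_above = stats.norm.sf(1.0)

# Empirical case: a hypothetical un-normalised PDF known only on a grid.
x = np.linspace(0.0, 10.0, 1001)
unnorm = np.exp(-0.5 * (x - 3.0) ** 2) + 0.5 * np.exp(-0.5 * (x - 7.0) ** 2)

# Normalise by numerical integration so the PDF integrates to one.
pdf = unnorm / trapezoid(unnorm, x)

# Running integral gives the CDF on the same grid.
cdf = cumulative_trapezoid(pdf, x, initial=0.0)

# Interpolate the inverse CDF to find the median.
median = np.interp(0.5, cdf, x)

print(p_below + p_above)
print(cdf[-1], median)
```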

Now that we've got all these probability density functions, how can we sample from them to generate our own random numbers, and what on Earth is the Central Limit Theorem, and why is it so important?

Now that we've covered the concepts in the previous video, let's power through the code!
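
A quick numerical check of the Central Limit Theorem, sketched under assumed settings (exponential parent distribution, 50 points per sample): the means of many samples cluster around the true mean with a spread close to sigma / sqrt(n), even though the parent distribution is far from normal.

```python
import numpy as np

rng = np.random.default_rng(5)

n_per_sample = 50    # points averaged in each sample
n_repeats = 10_000   # number of independent samples

# Parent distribution: exponential with mean 1 and standard deviation 1.
samples = rng.exponential(scale=1.0, size=(n_repeats, n_per_sample))
sample_means = samples.mean(axis=1)

# CLT prediction for the spread of the sample means.
expected_spread = 1.0 / np.sqrt(n_per_sample)

print(sample_means.mean())
print(sample_means.std(), expected_spread)
```

A histogram of `sample_means` should look close to a bell curve, and increasingly so as `n_per_sample` grows.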

Extra Writeup: More resources on sampling distributions

If you're still a bit confused over the central limit theorem, not to worry, let's dig a little deeper!

The main takeaways from probability theory.

Hypothesis Testing

An introduction to hypothesis testing. After all, what does the phrase even mean?

A short motivation example about detecting loaded dice!

Let's talk about the simplest forms of tests - one-tailed and two-tailed tests.

Let's answer a fun question about the fate of the planet from asteroid impacts using a one-tailed test.
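
The asteroid data are not part of this page, so the numbers below are purely hypothetical. The sketch shows the mechanics of a one-tailed z-test next to its two-tailed counterpart, assuming the population standard deviation is known.

```python
from scipy import stats

# Hypothetical setup: null-hypothesis mean, known standard deviation,
# sample size and observed sample mean.
mu0, sigma = 100.0, 15.0
n = 36
sample_mean = 105.0

# Distance of the sample mean from the null value, in standard errors.
z = (sample_mean - mu0) / (sigma / n ** 0.5)

# One-tailed: probability of a result at least this far above the null value.
p_one_tailed = stats.norm.sf(z)

# Two-tailed: deviations in either direction count as extreme.
p_two_tailed = 2 * stats.norm.sf(abs(z))

print(z, p_one_tailed, p_two_tailed)
```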

Proportion testing is a special case of one and two tailed testing, so when would we use it and why?

A fun election rigging example of when proportion testing is useful.
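
A sketch of a one-tailed proportion test using the normal approximation to the binomial. The poll figures are invented for illustration and are not the course's election example.

```python
import math

from scipy import stats

# Hypothetical poll: 540 of 1000 respondents favour a candidate.
# Null hypothesis: the true proportion is 0.5.
n, successes, p0 = 1000, 540, 0.5

p_hat = successes / n

# Standard error of the proportion under the null hypothesis.
se = math.sqrt(p0 * (1 - p0) / n)

z = (p_hat - p0) / se

# One-tailed p-value: chance of seeing a proportion this high or higher under H0.
p_value = stats.norm.sf(z)

print(z, p_value)
```

The normal approximation is reasonable when both n * p0 and n * (1 - p0) are large; for small samples an exact binomial calculation is the safer choice.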

Pearson's Chi2 test is a broad and powerful statistical check for discrete outcomes. Let's see how it works and apply it to our loaded dice example.
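
In code, the test compares observed counts to the counts expected from a fair die. The tallies below are made up for illustration; `scipy.stats.chisquare` does the arithmetic.

```python
import numpy as np
from scipy import stats

# Hypothetical tallies for faces 1..6 from 600 rolls.
observed = np.array([90, 95, 100, 105, 95, 115])

# A fair die predicts the same count for every face.
expected = np.full(6, observed.sum() / 6)

# Statistic is sum((observed - expected)^2 / expected), with 5 degrees of freedom here.
chi2, p_value = stats.chisquare(observed, f_exp=expected)

print(chi2, p_value)
```

With these particular counts the p-value is large, so there would be no evidence that the die is loaded.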

If we want to compare entire distributions against each other, then we need other tests. Let's look at the original test - the KS test, and its improved version - the AD test.
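
A sketch of both tests in their two-sample form, using `scipy.stats.ks_2samp` and `scipy.stats.anderson_ksamp` on simulated samples. The data are an assumption, and the video may instead compare a sample against a reference distribution.

```python
import numpy as np
from scipy import stats

# Two simulated samples (assumption) whose means differ by half a standard deviation.
rng = np.random.default_rng(6)
a = rng.normal(0.0, 1.0, size=500)
b = rng.normal(0.5, 1.0, size=500)

# Kolmogorov-Smirnov: based on the largest gap between the two empirical CDFs.
ks_stat, ks_p = stats.ks_2samp(a, b)

# Anderson-Darling (k-sample): weights the tails more heavily than KS.
ad = stats.anderson_ksamp([a, b])

print(ks_stat, ks_p)
print(ad.statistic)
```

SciPy may warn that the Anderson-Darling p-value has been capped or floored, because it is only tabulated over a limited range.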

Extra Writeup: All the ways to do A/B testing!

Putting it all back together. Don't forget to download the attached cheat sheet!

Conclusion

A brief summary of each chapter, highlighting the main points of each.

A case example of exactly what not to do when you're hypothesis testing.

An introduction to Gaussian processes.

An extra prac looking at relative rates for disparate distributions.

An extra prac looking at low-number statistics.

An extra prac looking at multivariate-gaussian modelling of relative rates.

An example on how to perform numerical uncertainty analysis that can be applied to almost any statistical problem.
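
The page does not say which technique this example uses. One widely applicable numerical approach is the bootstrap, sketched here as an assumption: resample the data with replacement many times and look at the spread of the statistic you care about.

```python
import numpy as np

# Simulated data (assumption); any 1D sample and any statistic would do.
rng = np.random.default_rng(7)
data = rng.normal(10.0, 2.0, size=200)

# Draw 5000 resamples with replacement, each the same size as the data.
n_boot = 5000
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_medians = np.median(data[idx], axis=1)

# Central 68% interval of the bootstrap medians as an uncertainty estimate.
lo, hi = np.percentile(boot_medians, [16, 84])

print(np.median(data), lo, hi)
```

Swapping `np.median` for another function gives an uncertainty on that statistic without any extra derivation.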

Congratulations!! Don't forget your Prize :)
Bonus: How To UNLOCK Top Salaries (Live Training)

Good to know

Know what's good, what to watch for, and possible dealbreakers
Provides hands-on skills for Python, numpy, pandas, Jupyter, scikit-learn, and Anaconda
Suitable for learners who want to learn statistical analysis and data science
Led by a team of experienced instructors: SuperDataScience Team, Ligency Team, Samuel Hinton
Provides foundational skills and knowledge for advanced topics in machine learning and data science
Focuses on applied statistics and data visualization, making it relevant for real-world problem-solving
Utilizes industry-standard tools and techniques, ensuring relevance and transferability of skills
May require learners to have some prior programming experience or familiarity with statistical concepts

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Python for Statistical Analysis with these activities:
Read 'Python for Data Analysis'
This book provides a comprehensive overview of Python for data analysis, covering topics such as data cleaning, manipulation, and visualization.
  • Read the book from cover to cover.
Learn Python for Data Science
Following tutorials for Python and Data Science will help prepare you to succeed in this course.
  • Find tutorials online or on platforms like YouTube.
  • Follow along with the tutorials and complete any exercises.
Practice Python Coding Exercises
Completing Python coding exercises will help you solidify your understanding of the concepts.
  • Find coding exercises online or in books.
  • Solve the exercises and check your solutions.
Blog About Python for Data Science
Blogging about Python for data science will help you solidify your understanding of the concepts and share your knowledge with others.
  • Choose a topic to write about.
  • Research the topic.
  • Write a blog post.
Build a Data Science Project
Working on a data science project will allow you to apply your skills and gain practical experience.
  • Define the problem you want to solve.
  • Gather and clean the data.
  • Analyze the data and draw conclusions.
  • Present your results.

Career center

Learners who complete Python for Statistical Analysis will develop knowledge and skills that may be useful to these careers:
Data Scientist
Data Scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. The Python for Statistical Analysis course can be beneficial for Data Scientists as it provides a strong foundation in Python, data analysis, and statistical methods. These skills are essential for Data Scientists to effectively collect, clean, analyze, and interpret data, and to communicate their findings.
Statistician
Statisticians collect, analyze, interpret, and present data. Those who have taken Python for Statistical Analysis will be familiar with statistical methods, data analysis, and visualization techniques. This course can help Statisticians develop the skills needed to succeed in their role by providing a solid foundation in Python, a widely used programming language in the field of statistics.
Quantitative Analyst
Quantitative Analysts use mathematical and statistical methods to analyze financial data and make investment recommendations. The Python for Statistical Analysis course can be helpful for Quantitative Analysts as it provides a strong foundation in Python, data analysis, and statistical methods. These skills are essential for Quantitative Analysts to effectively collect, clean, analyze, and interpret financial data, and to develop and evaluate investment strategies.
Machine Learning Engineer
Machine Learning Engineers design, develop, and deploy machine learning models to solve real-world problems. The Python for Statistical Analysis course can be useful for Machine Learning Engineers as it provides a foundation in Python, data analysis, and statistical methods. These skills are important for Machine Learning Engineers to understand the data they are working with, to develop and evaluate machine learning models, and to communicate their findings.
Data Analyst
Data Analysts collect, clean, and analyze data to identify trends and patterns. The Python for Statistical Analysis course can be useful for Data Analysts as it provides a foundation in Python, data analysis, and statistical methods. These skills are essential for Data Analysts to effectively collect, clean, analyze, and interpret data, and to communicate their findings.
Business Analyst
Business Analysts use data and analysis to improve business processes. The Python for Statistical Analysis course can be helpful for Business Analysts as it provides a foundation in Python, data analysis, and statistical methods. These skills are essential for Business Analysts to effectively collect, clean, analyze, and interpret data, and to develop and evaluate business recommendations.
Operations Research Analyst
Operations Research Analysts use mathematical and analytical methods to solve complex business problems. The Python for Statistical Analysis course can be beneficial for Operations Research Analysts as it provides a strong foundation in Python, data analysis, and statistical methods. These skills are essential for Operations Research Analysts to effectively collect, clean, analyze, and interpret data, and to develop and evaluate solutions to business problems.
Market Researcher
Market Researchers conduct research to understand consumer behavior and trends. The Python for Statistical Analysis course can be useful for Market Researchers as it provides a foundation in Python, data analysis, and statistical methods. These skills are essential for Market Researchers to effectively collect, clean, analyze, and interpret data, and to develop and evaluate marketing strategies.
Actuary
Actuaries use mathematical and statistical methods to assess risk and uncertainty. The Python for Statistical Analysis course can be useful for Actuaries as it provides a foundation in Python, data analysis, and statistical methods. These skills are essential for Actuaries to effectively collect, clean, analyze, and interpret data, and to develop and evaluate risk management strategies.
Epidemiologist
Epidemiologists investigate the causes and distribution of disease. The Python for Statistical Analysis course can be useful for Epidemiologists as it provides a foundation in Python, data analysis, and statistical methods. These skills are essential for Epidemiologists to effectively collect, clean, analyze, and interpret data, and to develop and evaluate public health strategies.
Biostatistician
Biostatisticians apply statistical methods to medical data. The Python for Statistical Analysis course can be helpful for Biostatisticians as it provides a foundation in Python, data analysis, and statistical methods. These skills are essential for Biostatisticians to effectively collect, clean, analyze, and interpret medical data, and to develop and evaluate clinical trials.
Financial Analyst
Financial Analysts analyze financial data to make investment recommendations. The Python for Statistical Analysis course can be useful for Financial Analysts as it provides a foundation in Python, data analysis, and statistical methods. These skills are essential for Financial Analysts to effectively collect, clean, analyze, and interpret financial data, and to develop and evaluate investment strategies.
Database Administrator
Database Administrators manage and maintain databases. The Python for Statistical Analysis course can be useful for Database Administrators as it provides a foundation in Python, data analysis, and statistical methods. These skills can be beneficial for Database Administrators who need to understand the data they are working with, to design and implement database systems, and to evaluate the performance of database systems.
Software Engineer
Software Engineers design, develop, and maintain software systems. The Python for Statistical Analysis course can be useful for Software Engineers as it provides a foundation in Python, data analysis, and statistical methods. These skills can be beneficial for Software Engineers who are working on data-intensive applications or who need to use statistical methods in their work.
Data Architect
Data Architects design and manage data systems. The Python for Statistical Analysis course can be useful for Data Architects as it provides a foundation in Python, data analysis, and statistical methods. These skills can be beneficial for Data Architects who need to understand the data they are working with, to design and implement data systems, and to evaluate the performance of data systems.

Reading list

We've selected 14 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Python for Statistical Analysis.
Provides a comprehensive overview of the Python data science ecosystem, covering topics such as data wrangling, exploratory data analysis, machine learning, and deep learning. It is a valuable resource for learners who want to gain a deeper understanding of the tools and techniques used in data science.
Provides a practical introduction to machine learning using Python. It covers a wide range of topics, including supervised learning, unsupervised learning, and deep learning. It is a valuable resource for learners who want to gain hands-on experience with machine learning.
Provides a comprehensive overview of the NumPy package for scientific computing. It covers a wide range of topics, including NumPy syntax, array operations, and linear algebra. It is a valuable resource for learners who want to gain a deeper understanding of scientific computing using NumPy.
Provides a comprehensive overview of R for data science. It covers a wide range of topics, including data wrangling, exploratory data analysis, and statistical modeling. It is a valuable resource for learners who want to gain a deeper understanding of data science using R.
Provides a comprehensive overview of the R language. It covers a wide range of topics, including R syntax, data structures, and statistical functions. It is a valuable resource for learners who want to gain a deeper understanding of the R language.
Provides a comprehensive overview of Python programming. It covers a wide range of topics, including Python syntax, data structures, and object-oriented programming. It is a valuable resource for learners who want to gain a deeper understanding of Python programming.
Provides a comprehensive overview of deep learning. It covers a wide range of topics, including neural networks, convolutional neural networks, and recurrent neural networks. It is a valuable resource for learners who want to gain a deeper understanding of deep learning.
Provides a comprehensive overview of Bayesian analysis. It covers a wide range of topics, including probability theory, Bayesian inference, and hierarchical models. It is a valuable resource for learners who want to gain a deeper understanding of Bayesian analysis.
Provides a comprehensive overview of machine learning from a probabilistic perspective. It covers a wide range of topics, including supervised learning, unsupervised learning, and Bayesian methods. It is a valuable resource for learners who want to gain a deeper understanding of machine learning.
Provides a comprehensive overview of the ggplot2 package for data visualization. It covers a wide range of topics, including data visualization principles, ggplot2 syntax, and advanced ggplot2 techniques. It is a valuable resource for learners who want to gain a deeper understanding of data visualization using ggplot2.
Provides a comprehensive introduction to Bayesian statistics. It covers a wide range of topics, including probability theory, Bayesian inference, and hierarchical models. It is a valuable resource for learners who want to gain a deeper understanding of Bayesian statistics.
Provides a practical introduction to Bayesian data analysis. It covers a wide range of topics, including probability theory, Bayesian inference, and hierarchical models. It is a valuable resource for learners who want to gain hands-on experience with Bayesian data analysis.
Provides a comprehensive overview of statistical learning. It covers a wide range of topics, including supervised learning, unsupervised learning, and ensemble methods. It is a valuable resource for learners who want to gain a deeper understanding of statistical learning.
Provides a comprehensive overview of statistical learning. It covers a wide range of topics, including supervised learning, unsupervised learning, and ensemble methods. It is a valuable resource for learners who want to gain a deeper understanding of statistical learning.
