We may earn an affiliate commission when you visit our partners.

Understanding the World Through Data

Aleksander Madry, Ana Bell, and Silvina Hanono Wachman

Speech recognition, drones, and self-driving cars – things that once seemed like pure science fiction – are now widely available technologies, and just a few examples of how humans have taught machines to analyze data and make decisions. In this hands-on, introductory course, you will examine all the forms in which data exists, learn tools that uncover relationships between data, and leverage basic algorithms to understand the world from a new perspective.

Whether you're a high school student or someone switching careers, all you need to get started in this course is a curiosity about the topic of machine learning and a willingness to tinker around with your computer.

The course is taught in modules. Within each module, you'll have access to videos, short exercises, and a final capstone project. In Module 1, you'll begin by looking at different kinds of data. To help you explore the data, you'll dive right into programming with Python. You don't need any programming background; we will guide you in using Python to explore and visualize data.
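
To give a feel for this kind of exploration, here is a minimal sketch. It assumes the pandas library (the syllabus mentions dataframes but does not name a library), and the column names and values are made up for illustration rather than taken from the course.

```python
import pandas as pd

# Hypothetical toy data standing in for a file you might load with pd.read_csv(...)
df = pd.DataFrame({
    "city": ["Boston", "Boston", "Austin", "Austin"],
    "temp_c": [2.0, 4.0, 15.0, 17.0],
    "ice_cream_sales": [10, 14, 40, 44],
})

# Filter rows by a condition
warm = df[df["temp_c"] > 10]

# Group rows by city and apply a function (the mean) to each group
mean_sales = df.groupby("city")["ice_cream_sales"].mean()

# Correlation between two columns
corr = df["temp_c"].corr(df["ice_cream_sales"])

print(warm)
print(mean_sales)
print(corr)
```

In a notebook environment such as Colab, each of these results can be inspected on its own cell by cell.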

One kind of data you'll work with is data that relates one variable to another. Coming up with a relationship between two variables—one depending on the other—is at the center of Module 2. In that module, you'll build up some core concepts before seeing your first machine learning algorithm. The goal is to use programming to create models that describe mathematical relationships between data. You'll be able to see how good the model is and use it to make predictions about new data.
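
As a rough illustration of that goal, the sketch below fits a straight line to a handful of points, measures how good the fit is, and predicts a new value. It assumes NumPy's `polyfit`; the course may use a different library, and the data here is invented.

```python
import numpy as np

# Hypothetical data: hours studied (independent) vs. exam score (dependent)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 58.0, 61.0, 67.0, 72.0])

# Least-squares straight line: score is approximately slope * hours + intercept
slope, intercept = np.polyfit(hours, score, deg=1)
predicted = slope * hours + intercept

# Two common measures of model quality
mse = np.mean((score - predicted) ** 2)
r2 = 1 - np.sum((score - predicted) ** 2) / np.sum((score - score.mean()) ** 2)

# Use the model to predict the score for a new data point (6 hours)
new_score = slope * 6.0 + intercept

print(slope, intercept, mse, r2, new_score)
```

A mean squared error close to 0 and an R^2 close to 1 both suggest that the line describes this data well.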

In Module 3, you'll see a discussion about where imperfections in collected data might come from. You rarely have perfectly “clean” data sets, so it's important to understand how imperfections impact the model that an algorithm might come up with. To this end, we will introduce the notion of data distributions and build up to the concepts of biased and unbiased noise.
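
One way to picture the difference is sketched below. It assumes that "unbiased" noise averages out to zero while "biased" noise does not; the course may define these terms differently. The true value and the noise parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_value = 20.0   # hypothetical quantity being measured
n = 10_000          # number of simulated measurements

# Unbiased noise: Gaussian with mean 0, so errors cancel out on average
unbiased = true_value + rng.normal(loc=0.0, scale=1.0, size=n)

# Biased noise: Gaussian with mean 0.5, so every reading is shifted on average
biased = true_value + rng.normal(loc=0.5, scale=1.0, size=n)

print("mean with unbiased noise:", unbiased.mean())
print("mean with biased noise:  ", biased.mean())
print("standard deviations:     ", unbiased.std(), biased.std())
```

Both sets of measurements have a similar spread, but only the unbiased set averages out near the true value. Collecting more data does not remove the shift in the biased set.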

Another kind of data you'll work with is data that belongs to different groups (or classes). Creating a model that predicts which group a data point belongs to is at the center of Module 4. You'll work through different ways of thinking about this problem and see three approaches to classification.
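
The simplest of the approaches listed in the syllabus, classifying a point as above or below a best-fit line, might look like the sketch below. NumPy is assumed, and the data and the `classify` helper are hypothetical.

```python
import numpy as np

# Hypothetical data used to find the best-fit line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])
slope, intercept = np.polyfit(x, y, deg=1)

def classify(px, py):
    """Label a new point by which side of the best-fit line it falls on."""
    line_y = slope * px + intercept
    return "above" if py > line_y else "below"

print(classify(3.0, 9.0))
print(classify(3.0, 2.0))
```

The support vector classifier and logistic regression mentioned in the syllabus follow the same idea of learning a boundary from data, but they choose the boundary in different ways.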

What's inside

Learning objectives

  • Python programming and the Colab notebook programming environment
  • Dependent and independent variables
  • Coming up with relationships between data using linear and polynomial regression models
  • Recognizing how data is distributed
  • How to observe noise in distributions and when to ignore it
  • Categorizing data into groups with classification models
  • And more!

Syllabus

Module 1: How to represent and manipulate data
Examples of numerical data
The Python programming language and the Colab notebook programming environment
Loading datafiles in Colab as dataframes and performing simple operations (selecting rows or columns, filtering data by specific conditions, grouping data, applying functions on the resulting groups)
Finding the correlation between columns of the dataframe
Visualizing the data using line plots, scatter plots, histograms, and correlation matrices
Module 2: Reverse engineering nature
Dependent and independent variables and how they correspond to real life scenarios
Intuition for what a linear model is
Intuition for what a polynomial model is
Python libraries that can perform linear regression on data
Comparing the quality of different models (mean squared error and R^2 values)
Fitting higher order polynomials
Overfitting
Module 3: Distributions and Latent Variables
Uniform distributions
Gaussian distributions
Distribution mean and standard deviation
Noise in distributions (biased and unbiased noise)
Module 4: How machines think
Categorizing data based on particular conditions being met
Using linear regression to classify a new datapoint as above or below the best fit line
Using a support vector classifier to separate two groups of data and classifying a new datapoint into a group
Using logistic regression to classify data into two groups and finding the probabilities of a new datapoint falling into each group
Understanding how to divide data into training and test sets
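
Several syllabus items (fitting higher order polynomials, overfitting, and dividing data into training and test sets) fit together naturally. The sketch below is one way to see the connection. It assumes NumPy, uses invented noisy data, and the 30/10 split and polynomial degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical data: a straight line plus Gaussian noise
x = np.linspace(-1.0, 1.0, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)

# Shuffle the indices, then keep 30 points for training and 10 for testing
idx = rng.permutation(x.size)
train, test = idx[:30], idx[30:]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial model on the given points."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

results = {}
for degree in (1, 8):
    # Fit on the training points only
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    results[degree] = (mse(coeffs, x[train], y[train]),
                       mse(coeffs, x[test], y[test]))

for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The higher-degree model always matches the training points at least as closely as the line does. If its error on the held-out test points is noticeably worse, that gap is a sign of overfitting.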

Good to know

Know what's good, what to watch for, and possible dealbreakers
Relies on Python, which is standard in industry
Teaches linear regression, which is used in machine learning
Teaches polynomial regression, which is used to model non-linear relationships
Provides hands-on activities that reinforce the concepts
Builds a foundation in data analysis and machine learning
Has modules on data distributions and noise

Career center

Learners who complete Understanding the World Through Data will develop knowledge and skills that may be useful to these careers:
Machine Learning Engineer
Machine Learning Engineers design, develop, and maintain machine learning models. A course like Understanding the World Through Data can help Machine Learning Engineers build a foundation in Python programming, data manipulation, and statistical modeling that's often required for this role. The course also covers topics such as linear and polynomial regression, classification models, and data distributions, which are all essential for building and deploying machine learning models.
Data Analyst
Data Analysts work with data to extract insights and help businesses make better decisions. A course like Understanding the World Through Data can help Data Analysts build a foundation in Python programming, data manipulation, and statistical modeling that's often required for this role.
Statistician
Statisticians collect, analyze, and interpret data to help businesses make better decisions. A course like Understanding the World Through Data can help Statisticians build a foundation in Python programming, data manipulation, and statistical modeling that's often required for this role.
Quantitative Analyst
Quantitative Analysts use mathematical and statistical models to analyze financial data and make investment decisions. A course like Understanding the World Through Data can help Quantitative Analysts build a foundation in Python programming, data manipulation, and statistical modeling that's often required for this role.
Data Scientist
Data Scientists use statistical methods and machine learning algorithms to uncover patterns in data. A course like Understanding the World Through Data can help Data Scientists build a foundation in Python programming, data manipulation, and statistical modeling that's often required for this role.
Research Scientist
Research Scientists conduct research to develop new knowledge and solve problems. A course like Understanding the World Through Data can help Research Scientists build a foundation in Python programming, data manipulation, and statistical modeling that's often required for this role.
Consultant
Consultants help businesses solve problems and improve their performance. A course like Understanding the World Through Data may be useful for Consultants who want to build a foundation in data analysis and machine learning.
Data Engineer
Data Engineers build and maintain the infrastructure that stores and processes data. A course like Understanding the World Through Data may be useful for Data Engineers who want to build a foundation in data analysis and machine learning.
Actuary
Actuaries use mathematical and statistical models to assess risk and make financial decisions. A course like Understanding the World Through Data may be useful for Actuaries who want to build a foundation in data analysis and machine learning.
Data Journalist
Data Journalists use data to tell stories and inform the public. A course like Understanding the World Through Data may be useful for Data Journalists who want to build a foundation in data analysis and machine learning.
Business Analyst
Business Analysts help businesses make better decisions by analyzing data and identifying trends. A course like Understanding the World Through Data may be useful for Business Analysts who want to build a foundation in data analysis and machine learning.
Operations Research Analyst
Operations Research Analysts use mathematical and statistical models to improve the efficiency and effectiveness of business operations. A course like Understanding the World Through Data may be useful for Operations Research Analysts who want to build a foundation in data analysis and machine learning.
Software Engineer
Software Engineers design, develop, and maintain software applications. A course like Understanding the World Through Data may be useful for Software Engineers who want to build a foundation in data analysis and machine learning.
Financial Analyst
Financial Analysts analyze financial data to make investment decisions. A course like Understanding the World Through Data may be useful for Financial Analysts who want to build a foundation in data analysis and machine learning.
Product Manager
Product Managers lead the development and launch of new products. A course like Understanding the World Through Data may be useful for Product Managers who want to build a foundation in data analysis and machine learning.

Reading list

We've selected 13 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Understanding the World Through Data.
A comprehensive introduction to deep learning, a subfield of machine learning that has revolutionized many fields. It covers the basics of deep learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about deep learning.
A comprehensive introduction to statistical learning. It covers the basics of statistical learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about statistical learning.
An introduction to statistical learning with applications in R. It covers the basics of statistical learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about statistical learning with applications in R.
A more theoretical introduction to machine learning. It covers the mathematical foundations of machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to understand the theoretical underpinnings of machine learning.
A comprehensive introduction to pattern recognition and machine learning. It covers the basics of pattern recognition and machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about pattern recognition and machine learning.
A probabilistic introduction to machine learning. It covers the basics of machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about machine learning from a probabilistic perspective.
A classic introduction to machine learning. It covers the basics of machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about machine learning.
An introduction to machine learning in Python. It covers the basics of machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about machine learning in Python.
A practical introduction to data mining and machine learning. It covers the basics of data mining and machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about data mining and machine learning.
A practical introduction to machine learning for hackers. It covers the basics of machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about machine learning from a hacking perspective.
An introduction to machine learning with Python. It covers the basics of machine learning, as well as the latest advances in the field. This book is a valuable resource for anyone who wants to learn more about machine learning with Python.
A comprehensive introduction to machine learning with Python. It covers the basics of supervised learning, unsupervised learning, and reinforcement learning. It is a good starting point for those who are new to machine learning.
