We may earn an affiliate commission when you visit our partners.

Getting and Cleaning Data

Roger D. Peng, PhD, Jeff Leek, PhD, and Brian Caffo, PhD

Before you can work with data you have to get some. This course covers the basic ways data can be obtained: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”. Tidy data dramatically speed downstream data analysis tasks. The course also covers the components of a complete data set, including raw data, processing instructions, codebooks, and processed data. Together, these topics provide the basics needed for collecting, cleaning, and sharing data.
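
To make the idea of "tidy" data concrete, here is a minimal sketch in plain Python (the course itself teaches R). It assumes the common convention that a tidy table has one observation per row and one variable per column. The patient scores, the column names, and the `to_tidy` helper are all hypothetical and do not come from the course materials.

```python
# Hypothetical "wide" data: one row per patient, one column per week.
wide_rows = [
    {"patient": "A", "week1": 5.1, "week2": 4.8},
    {"patient": "B", "week1": 6.3, "week2": 6.0},
]


def to_tidy(rows, id_key, variable_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append(
                {id_key: row[id_key], variable_name: column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(
    wide_rows, id_key="patient", variable_name="week", value_name="score"
)
for tidy_row in tidy_rows:
    print(tidy_row)
```

In the tidy form, each row holds a single patient-week measurement, so filtering, grouping, or merging on `week` no longer depends on how many week columns the original table had.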

What's inside

Syllabus

Week 1
In this first week of the course, we look at finding data and reading different file types.
Week 2
Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.
Week 3
Welcome to Week 3 of Getting and Cleaning Data! This week the lectures will focus on organizing, merging, and managing the data you have collected using what you learned in the Week 1 and Week 2 lectures.
Week 4
Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches R, making it a good choice for analysts and data teams who use this tool
Builds a strong foundation for obtaining, cleaning, and sharing data, which can help analysts and data teams work with data more effectively
Introduces popular data storage systems and extraction tools for accessing data from the web and databases
Covers merging and managing data, providing skills for working with data from multiple sources

Reviews summary

Data cleaning and acquisition

Learners say this course teaches the essentials of data cleaning and acquisition, including getting data from common sources and formats like databases, XML, and JSON, using R to clean and recode data, and creating tidy datasets. The final lecture provides a long list of links to sources of data. Feedback suggests most learners found it valuable, largely because it improved their ability to obtain and manipulate data in R. Some aspects of the course, especially the lectures, were criticized as outdated and lacking in depth. Another criticism was that the quizzes and assignments often went beyond the scope of the materials, so learners had to search for additional information to complete them. Other learners reported that links and packages did not work as expected. Overall, this course is well received by those who complete it for its practical instruction on data cleaning and acquisition. Because the materials are somewhat outdated, learners may run into some issues.
Learners were largely positive about the lectures, readings, exams, quizzes, homework assignments, and instructors.
The course materials are somewhat outdated, and links and packages may not work as expected.
"The course material was not of much help."
"A rather poor and confusing course."
"The instructor of this course, unlike the other 1, is quite unclear about what needed to be done."
"Although this course is on a very interesting topic, it is quite outdated."
There is a gap between the material covered in the lectures and the requirements of some quizzes and assignments. This necessitates external research.
"The course project was really vague."
"I'm a fresh beginner to R and my only experience with it is from the previous 2 courses in this specialization."
"this gives us good information, but the information sometimes are incomplete or need to be updated"

Career center

Learners who complete Getting and Cleaning Data will develop knowledge and skills that may be useful in these careers:
Data Scientist
A Data Scientist analyzes data to extract meaningful insights and help businesses make informed decisions. This course provides a solid foundation in data acquisition, cleaning, and management, which are critical skills for aspiring Data Scientists. By learning how to handle and process data effectively, graduates of this course can enter this field well-prepared.
Data Analyst
A Data Analyst specializes in collecting, analyzing, and interpreting data to identify trends, patterns, and insights. This course covers essential techniques for data acquisition and cleaning, which are fundamental to the success of Data Analysts. Graduates of this course will gain a competitive edge in this field by mastering the skills needed to handle data effectively.
Database Manager
A Database Manager is responsible for designing, implementing, and maintaining databases. This course provides a solid foundation in data acquisition and management, which are critical skills for Database Managers. By learning how to handle and process data effectively, graduates of this course can excel in this role.
Data Engineer
A Data Engineer designs, builds, and maintains data pipelines and infrastructure. This course provides essential knowledge in data acquisition, cleaning, and management, which are crucial for Data Engineers. Graduates of this course will be well-equipped to handle the challenges of data engineering and contribute to the success of data-driven organizations.
Machine Learning Engineer
A Machine Learning Engineer builds and maintains machine learning models. This course provides a foundation in data acquisition and cleaning, which are essential for training and deploying machine learning models. Graduates of this course will gain an advantage in this field by understanding the intricacies of data handling and its impact on model performance.
Software Engineer
A Software Engineer designs, develops, and maintains software applications. This course provides essential knowledge in data acquisition and management, which are often required in software development projects. Graduates of this course can enhance their skillset and become more valuable to potential employers by gaining proficiency in these areas.
Statistician
A Statistician collects, analyzes, and interprets data to draw meaningful conclusions. This course provides a foundation in data acquisition and cleaning, which are essential for statisticians to ensure the accuracy and reliability of their findings. Graduates of this course will gain a competitive edge in this field by mastering the skills needed to handle data effectively.
Business Analyst
A Business Analyst uses data to identify and solve business problems. This course provides essential training in data acquisition and cleaning, which are crucial for Business Analysts to make informed recommendations. Graduates of this course will gain a strong foundation in data handling and be well-positioned to succeed in this role.
Market Researcher
A Market Researcher collects and analyzes data to understand market trends and consumer behavior. This course provides a foundation in data acquisition and management, which are essential for Market Researchers to gather and interpret data effectively. Graduates of this course can enhance their skills and become more valuable to potential employers by gaining proficiency in these areas.
Financial Analyst
A Financial Analyst uses data to evaluate and make recommendations on investments. This course provides essential knowledge in data acquisition and cleaning, which are often required in financial analysis. Graduates of this course can enhance their skillset and become more valuable to potential employers by gaining proficiency in these areas.
Actuary
An Actuary uses mathematical and statistical techniques to assess risk and uncertainty. This course provides a foundation in data acquisition and management, which are essential for Actuaries to analyze data and make informed decisions. Graduates of this course will gain a competitive edge in this field by mastering the skills needed to handle data effectively.
Data Warehouse Manager
A Data Warehouse Manager designs, implements, and maintains data warehouses. This course provides essential knowledge in data acquisition and management, which are critical for Data Warehouse Managers to ensure the accuracy and reliability of data storage. Graduates of this course will gain a strong foundation in data handling and be well-positioned to succeed in this role.
Database Administrator
A Database Administrator manages and maintains databases. This course provides essential knowledge in data acquisition and management, which are crucial for Database Administrators to ensure the smooth functioning of databases. Graduates of this course will gain a strong foundation in data handling and be well-positioned to succeed in this role.
Information Architect
An Information Architect designs and organizes information systems. This course provides a foundation in data acquisition and management, which are essential for Information Architects to understand and structure data effectively. Graduates of this course can enhance their skills and become more valuable to potential employers by gaining proficiency in these areas.
Data Visualization Specialist
A Data Visualization Specialist creates visual representations of data to communicate insights and trends. While this course does not directly cover data visualization, it provides essential training in data acquisition and cleaning, which are crucial for Data Visualization Specialists to ensure the accuracy and reliability of their visualizations. Graduates of this course can complement their skills and become well-rounded professionals by gaining proficiency in these areas.

Reading list

We've selected 13 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Getting and Cleaning Data.
Must-read for anyone who wants to learn to work with data in R. It provides a clear and concise introduction to the tidyverse, a collection of packages that make data cleaning and analysis easier.
Comprehensive guide to using R for data science, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about using R for data science.
Comprehensive guide to advanced R programming, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about R.
Comprehensive guide to using SQL for data science, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about using SQL for data science.
Comprehensive guide to using NoSQL databases for data science, covering a wide range of topics, including data modeling, querying, and scaling. It is a valuable resource for anyone who wants to learn more about using NoSQL databases for data science.
Comprehensive guide to using Hadoop and Spark for big data processing, covering a wide range of topics, including data ingestion, processing, and analysis. It is a valuable resource for anyone who wants to learn more about using Hadoop and Spark for big data processing.
Comprehensive guide to using R for deep learning, covering a wide range of topics, including neural networks, convolutional neural networks, and recurrent neural networks. It is a valuable resource for anyone who wants to learn more about using R for deep learning.
Comprehensive guide to using R for computer vision, covering a wide range of topics, including image processing, object detection, and facial recognition. It is a valuable resource for anyone who wants to learn more about using R for computer vision.
Provides a comprehensive overview of data science concepts and techniques, making it a valuable resource for those who are new to the field. It covers a wide range of topics, including data collection, cleaning, analysis, and visualization.
Comprehensive guide to using R for time series analysis, covering a wide range of topics, including data preprocessing, model building, and forecasting. It is a valuable resource for anyone who wants to learn more about using R for time series analysis.
Comprehensive guide to data manipulation in R, covering a wide range of techniques for working with data. It is a valuable resource for anyone who works with data in R.
Comprehensive guide to learning R, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about R.

Similar courses

Here are nine courses similar to Getting and Cleaning Data.
Introduction to the Tidyverse (Most relevant)
Cleaning and Working with Dataframes in Python (Most relevant)
Prepare, Clean, Transform, and Load Data using Power BI (Most relevant)
Tidy Messy Data using tidyr in R
Data Cleaning in Excel: Techniques to Clean Messy Data
Exploring and Analyzing Fifa's Datasets Using Python
The R Programming Environment
Wrangling Data in the Tidyverse
Cleaning Data: Python Data Playbook

© 2016 - 2024 OpenCourser