Roger D. Peng, PhD, Jeff Leek, PhD, and Brian Caffo, PhD

Before you can work with data, you have to get some. This course covers the basic ways data can be obtained: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”, which dramatically speeds up downstream data analysis. Finally, it covers the components of a complete data set, including raw data, processing instructions, codebooks, and processed data, giving you the basics needed for collecting, cleaning, and sharing data.


What's inside

Syllabus

Week 1
In this first week of the course, we look at finding data and reading different file types.
Week 2
Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools for extracting data from the web or from databases such as MySQL.
Week 3
Welcome to Week 3 of Getting and Cleaning Data! This week's lectures focus on organizing, merging, and managing the data you collected using the techniques from Weeks 1 and 2.
Week 4
Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week, we will also focus on peer grading of the course projects.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches R, making it a good choice for analysts and data teams who use this tool
Builds a strong foundation for obtaining, cleaning, and sharing data, which can help analysts and data teams work with data more effectively
Introduces popular data storage systems and extraction tools for accessing data from the web and databases
Covers merging and managing data, providing skills for working with data from multiple sources

Reviews summary

Data cleaning and acquisition

Learners say this course teaches the essentials of data cleaning and acquisition, including getting data from common formats such as databases, XML, and JSON, using R to clean and recode data, and creating tidy datasets. The final lecture provides a long list of links to sources of data. Feedback suggests most learners found the course valuable, largely because it improved their ability to obtain and manipulate data in R. Some aspects of the course, especially the lectures, were criticized as outdated and lacking in depth. Another criticism was that the quizzes and assignments often went beyond the scope of the materials, so learners had to search for additional information to complete them. Other learners reported links and packages not working as expected. Overall, this course is well received by those who complete it, for its practical instruction on data cleaning and acquisition. Note that the materials are somewhat outdated, so you may encounter some issues.
Learners' feedback on the lectures, readings, exams, quizzes, homework assignments, and instructors was largely positive.
The course materials are somewhat outdated, and links and packages may not work as expected.
"The course material was not of much help."
"A rather poor and confusing course."
"The instructor of this course, unlike the other 1, is quite unclear about what needed to be done."
"Although this course is on a very interesting topic, it is quite outdated."
There is a gap between the material covered in the lectures and the requirements of some quizzes and assignments, so learners must do outside research to complete them.
"The course project was really vague."
"I'm a fresh beginner to R and my only experience with it is from the previous 2 courses in this specialization."
"this gives us good information, but the information sometimes are incomplete or need to be updated"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Getting and Cleaning Data with these activities:
Read 'R for Data Science'
Expand your knowledge of R, the primary programming language used in this course, by reading this foundational text.
  • Review the basics of R syntax and data structures
  • Learn about data manipulation and transformation techniques
  • Explore statistical modeling and visualization in R
Create a comprehensive course resource repository
Enhance your learning and organization by consolidating all course-related materials, including lecture notes, assignments, and supplemental resources, in one central location.
  • Gather and organize materials from all course modules
  • Use a digital or physical system to maintain the repository
  • Categorize and tag materials for easy retrieval
Study text file parsing
Refresh your existing knowledge of working with text files, as this is a core component of the course.
  • Review basic syntax for opening and reading text files
  • Practice reading text files line by line
  • Experiment with different methods of parsing text files
Practice data cleaning exercises
Refine your data cleaning skills through repeated exercises to solidify your understanding and prepare for the techniques taught in this course.
  • Identify and correct common data errors such as missing values and data type inconsistencies
  • Practice transforming data into a tidy format using techniques like reshaping and joining
  • Analyze the impact of data cleaning on downstream analysis, such as visualization and modeling
Explore data visualization techniques
Expand your knowledge of data visualization by exploring different techniques and best practices, which will help you present and communicate your findings effectively.
  • Review the principles of effective data visualization, including choosing appropriate chart types
  • Experiment with different visualization tools and libraries
  • Practice creating visualizations that are clear, informative, and visually appealing
Collaborate on data analysis projects
Enhance your learning by collaborating with peers on data analysis projects, gaining diverse perspectives and refining your analytical skills.
  • Form study groups or online communities with fellow students
  • Select a dataset or problem for analysis
  • Work together to clean, explore, and analyze the data
  • Share insights and present findings to the group
Build a data cleaning pipeline
Solidify your understanding of data cleaning by building a reusable pipeline that automates common data cleaning tasks, providing a valuable tool for future data analysis projects.
  • Design the pipeline architecture, including data sources, cleaning operations, and output formats
  • Implement the pipeline using appropriate programming tools and techniques
  • Test and refine the pipeline to ensure accuracy and efficiency
  • Document the pipeline for future use and collaboration
Develop a data visualization portfolio
Showcase your data visualization skills and enhance your learning by creating a portfolio that demonstrates your ability to communicate insights effectively through compelling visuals.
  • Select a variety of datasets and analysis scenarios
  • Create visualizations that illustrate key patterns and relationships
  • Write accompanying narratives to explain the data and highlight insights
  • Share your portfolio with potential employers or collaborators

Career center

Learners who complete Getting and Cleaning Data will develop knowledge and skills that may be useful to these careers:
Data Scientist
A Data Scientist analyzes data to extract meaningful insights and help businesses make informed decisions. This course provides a solid foundation in data acquisition, cleaning, and management, which are critical skills for aspiring Data Scientists. By learning how to handle and process data effectively, graduates of this course will be better prepared to enter this field.
Data Analyst
A Data Analyst specializes in collecting, analyzing, and interpreting data to identify trends, patterns, and insights. This course covers essential techniques for data acquisition and cleaning, which are fundamental to the success of Data Analysts. Graduates of this course will gain a competitive edge in this field by mastering the skills needed to handle data effectively.
Database Manager
A Database Manager is responsible for designing, implementing, and maintaining databases. This course provides a solid foundation in data acquisition and management, which are critical skills for Database Managers. By learning how to handle and process data effectively, graduates of this course will be better prepared for this role.
Data Engineer
A Data Engineer designs, builds, and maintains data pipelines and infrastructure. This course provides essential knowledge in data acquisition, cleaning, and management, which are crucial for Data Engineers. Graduates of this course will be well-equipped to handle the challenges of data engineering and contribute to the success of data-driven organizations.
Machine Learning Engineer
A Machine Learning Engineer builds and maintains machine learning models. This course provides a foundation in data acquisition and cleaning, which are essential for training and deploying machine learning models. Graduates of this course will gain an advantage in this field by understanding the intricacies of data handling and its impact on model performance.
Software Engineer
A Software Engineer designs, develops, and maintains software applications. This course provides essential knowledge in data acquisition and management, which are often required in software development projects. Graduates of this course can enhance their skillset and become more valuable to potential employers by gaining proficiency in these areas.
Statistician
A Statistician collects, analyzes, and interprets data to draw meaningful conclusions. This course provides a foundation in data acquisition and cleaning, which are essential for statisticians to ensure the accuracy and reliability of their findings. Graduates of this course will gain a competitive edge in this field by mastering the skills needed to handle data effectively.
Business Analyst
A Business Analyst uses data to identify and solve business problems. This course provides essential training in data acquisition and cleaning, which are crucial for Business Analysts to make informed recommendations. Graduates of this course will gain a strong foundation in data handling and be well-positioned to succeed in this role.
Market Researcher
A Market Researcher collects and analyzes data to understand market trends and consumer behavior. This course provides a foundation in data acquisition and management, which are essential for Market Researchers to gather and interpret data effectively. Graduates of this course can enhance their skills and become more valuable to potential employers by gaining proficiency in these areas.
Financial Analyst
A Financial Analyst uses data to evaluate and make recommendations on investments. This course provides essential knowledge in data acquisition and cleaning, which are often required in financial analysis. Graduates of this course can enhance their skillset and become more valuable to potential employers by gaining proficiency in these areas.
Actuary
An Actuary uses mathematical and statistical techniques to assess risk and uncertainty. This course provides a foundation in data acquisition and management, which are essential for Actuaries to analyze data and make informed decisions. Graduates of this course will gain a competitive edge in this field by mastering the skills needed to handle data effectively.
Data Warehouse Manager
A Data Warehouse Manager designs, implements, and maintains data warehouses. This course provides essential knowledge in data acquisition and management, which are critical for Data Warehouse Managers to ensure the accuracy and reliability of data storage. Graduates of this course will gain a strong foundation in data handling and be well-positioned to succeed in this role.
Database Administrator
A Database Administrator manages and maintains databases. This course provides essential knowledge in data acquisition and management, which are crucial for Database Administrators to ensure the smooth functioning of databases. Graduates of this course will gain a strong foundation in data handling and be well-positioned to succeed in this role.
Information Architect
An Information Architect designs and organizes information systems. This course provides a foundation in data acquisition and management, which are essential for Information Architects to understand and structure data effectively. Graduates of this course can enhance their skills and become more valuable to potential employers by gaining proficiency in these areas.
Data Visualization Specialist
A Data Visualization Specialist creates visual representations of data to communicate insights and trends. While this course does not directly cover data visualization, it provides essential training in data acquisition and cleaning, which are crucial for Data Visualization Specialists to ensure the accuracy and reliability of their visualizations. Graduates of this course can complement their skills and become well-rounded professionals by gaining proficiency in these areas.

Reading list

We've selected 13 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Getting and Cleaning Data.
A must-read for anyone who wants to learn to work with data in R. It provides a clear and concise introduction to the tidyverse, a collection of packages that make data cleaning and analysis easier.
A comprehensive guide to using R for data science, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about using R for data science.
A comprehensive guide to advanced R programming, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about R.
A comprehensive guide to using SQL for data science, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about using SQL for data science.
A comprehensive guide to using NoSQL databases for data science, covering a wide range of topics, including data modeling, querying, and scaling. It is a valuable resource for anyone who wants to learn more about using NoSQL databases for data science.
A comprehensive guide to using Hadoop and Spark for big data processing, covering a wide range of topics, including data ingestion, processing, and analysis. It is a valuable resource for anyone who wants to learn more about using Hadoop and Spark for big data processing.
A comprehensive guide to using R for deep learning, covering a wide range of topics, including neural networks, convolutional neural networks, and recurrent neural networks. It is a valuable resource for anyone who wants to learn more about using R for deep learning.
A comprehensive guide to using R for computer vision, covering a wide range of topics, including image processing, object detection, and facial recognition. It is a valuable resource for anyone who wants to learn more about using R for computer vision.
Provides a comprehensive overview of data science concepts and techniques, making it a valuable resource for those who are new to the field. It covers a wide range of topics, including data collection, cleaning, analysis, and visualization.
A comprehensive guide to using R for time series analysis, covering a wide range of topics, including data preprocessing, model building, and forecasting. It is a valuable resource for anyone who wants to learn more about using R for time series analysis.
A comprehensive guide to data manipulation in R, covering a wide range of techniques for working with data. It is a valuable resource for anyone who works with data in R.
A comprehensive guide to learning R, covering a wide range of topics, including data manipulation, analysis, and visualization. It is a valuable resource for anyone who wants to learn more about R.

Similar courses

Here are nine courses similar to Getting and Cleaning Data.
Introduction to the Tidyverse (most relevant)
Cleaning and Working with Dataframes in Python (most relevant)
Prepare, Clean, Transform, and Load Data using Power BI (most relevant)
Tidy Messy Data using tidyr in R
Data Cleaning in Excel: Techniques to Clean Messy Data
Exploring and Analyzing Fifa's Datasets Using Python
The R Programming Environment
Wrangling Data in the Tidyverse
Cleaning Data: Python Data Playbook

Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser