We may earn an affiliate commission when you visit our partners.
Rafael Irizarry

In this course, part of our Professional Certificate Program in Data Science, we cover several standard steps of the data wrangling process, such as importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.

Data is rarely easy to access in a data science project. More often it sits in a file or a database, or must be extracted from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse package. The process of converting data from its raw form to a tidy form is called data wrangling.
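
As a rough illustration of what moving from a raw form to a tidy form can look like, the sketch below reshapes a small wide table (one column per year) into a long one (one row per observation). The course itself works in R with the tidyverse; this version uses plain Python only to show the idea. The data, the column names, and the `wide_to_tidy` helper are all hypothetical.

```python
import csv
import io

# Hypothetical raw data in "wide" form: one column per year.
RAW_CSV = """country,1960,1961
A,2.41,2.44
B,6.16,5.99
"""

def wide_to_tidy(text, id_column, key_name, value_name):
    """Turn each (row, non-id column) pair into its own observation."""
    reader = csv.DictReader(io.StringIO(text))
    tidy = []
    for row in reader:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append({
                id_column: row[id_column],
                key_name: int(column),      # the column header becomes a value
                value_name: float(value),
            })
    return tidy

tidy_rows = wide_to_tidy(RAW_CSV, "country", "year", "fertility")
for r in tidy_rows:
    print(r)
```

Each output row now holds one country, one year, and one measurement, which is the shape most analysis and plotting tools expect.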

This process is a critical step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover insights that would otherwise stay hidden.

What's inside

Learning objectives

  • Importing data into R from different file formats
  • Web scraping
  • How to tidy data using the tidyverse to better facilitate analysis
  • String processing with regular expressions (regex)
  • Wrangling data using dplyr
  • How to work with dates and times as file formats
  • Text mining
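
To make two of these objectives concrete, here is a small sketch of string processing with a regular expression, followed by date parsing. The course teaches these topics in R; this Python version is only an illustration. The sample strings, the pattern, and the `parse_height` helper are assumptions, not course material.

```python
import re
from datetime import datetime

# Hypothetical free-text heights such as a survey might collect.
raw_heights = ["5'10\"", "6' 1", "5 ft 7 in", "seventy"]

# One digit of feet, a separator (' or ft), then inches; the inch unit is optional.
HEIGHT_PATTERN = re.compile(r"^\s*(\d)\s*(?:'|ft)\s*(\d{1,2})\s*(?:\"|in)?\s*$")

def parse_height(text):
    """Return the height in inches, or None when the text does not match."""
    match = HEIGHT_PATTERN.match(text)
    if match is None:
        return None
    feet, inches = int(match.group(1)), int(match.group(2))
    return feet * 12 + inches

heights = [parse_height(h) for h in raw_heights]
print(heights)

# Dates often arrive as strings; parsing gives a value you can compute with.
poll_date = datetime.strptime("2016-11-08", "%Y-%m-%d").date()
print(poll_date.year, poll_date.month, poll_date.weekday())
```

Entries that do not fit the pattern come back as `None` rather than raising an error, so they can be inspected and handled separately.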

Good to know

Know what's good, what to watch for, and possible dealbreakers
Helps learners clean data, which is a critical step for data scientists
Led by Rafael Irizarry, who teaches at Harvard, a top university
Covers common data wrangling tasks, including importing, cleaning, manipulating, and mining
Provides hands-on experience with data through the use of the tidyverse package
Part of a professional certificate program, indicating a more comprehensive and structured learning experience
Requires learners to have some prior knowledge of R

Reviews summary

Good intro to data science

According to students, Data Science: Wrangling is a good introductory course for those new to the subject. They say it's easy to understand, provides clear explanations, and offers practical examples and exercises.
While the course content can be somewhat dry, the instruction helps make the subject more engaging.
"The topic is quite dry."
"this Course does well in teaching concisely the main concepts"
The course does a great job of explaining the main concepts in a clear manner.
"very clear to understand"
This course is a great entry point for those who are new to data science.
"very good *intro* to the subject"
"this is an entry-level to the subject"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Science: Wrangling with these activities:
Form a study group
Enhance your understanding of the material through peer-to-peer learning and collaboration with other students in the course.
  • Find other participants to form a study group
  • Establish a schedule for meetings
  • Prepare materials for the study sessions
Review basic R syntax
A quick review of basic R syntax will help you refresh your memory and ensure that you are ready for the course.
  • Review the R syntax guide
  • Complete a few practice exercises
  • Optional: Install RStudio and practice writing some simple R code
Review basic statistics
Revisit basic statistical concepts and techniques to strengthen your foundation for data analysis.
  • Go over basic statistical concepts
  • Solve statistical problems
Go over import statements
Reinforces core knowledge for the course, which will prove vital later when you work with data in RStudio.
  • Review file import methods in R
Form a study group with your classmates
Working with peers can provide support and motivation, and can help improve your understanding of the material.
  • Find a group of classmates who are interested in forming a study group
  • Meet regularly to discuss the course material
  • Work together on assignments and projects
Create a compilation of data wrangling resources
Having a collection of useful resources can save time and improve your productivity.
  • Find resources on data wrangling, such as articles, tutorials, and videos
  • Organize the resources into a central location
  • Share the resources with your classmates
Read 'Data Wrangling with R'
This book serves as an excellent foundation for the wrangling and tidying of data in R, which will be essential for your success in this course.
  • Read the introduction and first chapter
  • Work through the examples in the book
  • Complete the exercises at the end of each chapter
Practice data wrangling with the tidyverse
The tidyverse is a comprehensive collection of R packages that provide a consistent syntax for data wrangling, transformation, and visualization. This exercise will enable you to master these techniques.
  • Install the tidyverse package
  • Use the dplyr package to filter and transform data
  • Use the tidyr package to reshape and clean data
Learn HTML parsing techniques
You'll become familiar with HTML parsing techniques and how they are used to extract data from web pages.
  • Read tutorials on HTML parsing
  • Practice HTML parsing techniques
Attend a workshop on data wrangling
Attending a workshop will provide you with an opportunity to learn from experts and connect with others who are interested in the same topic.
  • Find a workshop that aligns with your learning objectives
  • Register for the workshop
  • Attend the workshop
Practice wrangling data
Increase your efficiency and reinforce learning by practicing data wrangling techniques in the R programming language.
  • Work through data wrangling exercises
  • Solve data wrangling problems
Create a data visualization project
Data visualization is an effective way to communicate insights from your data. By creating your own visualization project, you will gain valuable experience in this essential skill.
  • Choose a dataset you are interested in
  • Explore the data and identify the key insights
  • Create a data visualization that effectively communicates these insights
  • Write a brief report explaining your visualization and insights
Create a data visualization
Reinforce data wrangling and data analysis skills by creating a data visualization using a tool such as ggplot2 or Tableau.
  • Choose a dataset to visualize
  • Clean and prepare the data
  • Create a data visualization
Contribute to an open-source data wrangling project
Contributing to an open-source project is a great way to gain experience in data wrangling and collaborate with others.
  • Find an open-source data wrangling project that you are interested in
  • Read the project documentation
  • Start contributing to the project
Start a project that involves data wrangling
Applying your data wrangling skills to a real-world project will help you solidify your understanding and gain valuable experience.
  • Identify a problem that can be solved using data wrangling
  • Collect data that is relevant to the problem
  • Wrangle the data to prepare it for analysis
  • Analyze the data to find insights
  • Create a report or presentation to communicate your findings

Career center

Learners who complete Data Science: Wrangling will develop knowledge and skills that may be useful to these careers:
Machine Learning Engineer
Machine Learning Engineers design, build, and deploy machine learning models. They use their skills in data wrangling to prepare data for training, evaluate models, and troubleshoot errors. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Machine Learning Engineers who want to succeed in their roles.
Data Engineer
Data Engineers design, build, and maintain the infrastructure that supports data analysis. They use their skills in data wrangling to ensure that data is clean, consistent, and accessible to analysts. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Data Engineers who want to succeed in their roles.
Statistician
Statisticians use data to make inferences about the world around us. They use their skills in data wrangling to clean and analyze data, and to draw conclusions. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Statisticians who want to succeed in their roles.
Data Scientist
Data Scientists use data to solve business problems. They use their skills in data wrangling to prepare data for analysis, build models, and communicate insights. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Data Scientists who want to succeed in their roles.
Data Analyst
Data Analysts play a central role in the collection, cleaning, and interpretation of data. They use their skills in data wrangling to transform raw data into a usable format that allows businesses to make informed decisions. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Data Analysts who want to succeed in their roles.
Data Visualization Engineer
Data Visualization Engineers create visual representations of data. They use their skills in data wrangling to clean and prepare data for visualization, and to create visualizations that are clear and informative. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Data Visualization Engineers who want to succeed in their roles.
Business Analyst
Business Analysts use data to understand business problems and to make recommendations for improvement. They use their skills in data wrangling to clean and analyze data, and to communicate insights. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Business Analysts who want to succeed in their roles.
Software Engineer
Software Engineers design, build, and maintain software applications. They use their skills in data wrangling to manage data within software applications, and to ensure that data is stored efficiently and securely. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Software Engineers who want to succeed in their roles.
Database Administrator
Database Administrators manage and maintain databases. They use their skills in data wrangling to ensure that data is stored efficiently and securely, and that it is accessible to users who need it. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Database Administrators who want to succeed in their roles.
Web Developer
Web Developers design and develop websites. They use their skills in data wrangling to manage data on websites, and to ensure that data is stored efficiently and securely. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Web Developers who want to succeed in their roles.
Marketing Manager
Marketing Managers develop and execute marketing campaigns. They use their skills in data wrangling to understand customer behavior, and to target marketing campaigns effectively. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Marketing Managers who want to succeed in their roles.
Financial Analyst
Financial Analysts use data to make investment decisions. They use their skills in data wrangling to analyze financial data, and to identify investment opportunities. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Financial Analysts who want to succeed in their roles.
Product Manager
Product Managers manage the development and launch of new products. They use their skills in data wrangling to understand customer needs, and to make decisions about product features and functionality. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Product Managers who want to succeed in their roles.
Management Consultant
Management Consultants advise businesses on how to improve their operations. They use their skills in data wrangling to analyze business data, and to make recommendations for improvement. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Management Consultants who want to succeed in their roles.
Operations Research Analyst
Operations Research Analysts use data to solve business problems. They use their skills in data wrangling to clean and analyze data, and to develop solutions to business problems. This course provides a solid foundation in data wrangling techniques, including importing data from different file formats, tidying data, and working with dates and times. These skills are essential for Operations Research Analysts who want to succeed in their roles.

Reading list

We've selected 14 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Data Science: Wrangling.
Provides a comprehensive introduction to the R programming language, with a focus on data science applications. It covers all the essential topics for data wrangling, analysis, and visualization, and it is written in a clear and concise style.
Provides a comprehensive introduction to text mining in R, with a focus on the tm package. It covers all the essential topics for text preprocessing, analysis, and visualization, and it is written in a clear and concise style.
Provides a comprehensive reference to regular expressions, with a focus on practical applications. It covers all the essential topics for writing and using regular expressions, and it is written in a clear and concise style.
Provides a comprehensive introduction to data science for business applications. It covers all the essential topics for data wrangling, analysis, and modeling, and it is written in a clear and concise style.
Provides a comprehensive introduction to data science, with a focus on the R programming language. It covers all the essential topics for data wrangling, analysis, and visualization, and it is written in a clear and concise style.
Provides a comprehensive introduction to data analysis in R, with a focus on statistical methods. It covers all the essential topics for data wrangling, analysis, and visualization, and it is written in a clear and concise style.
Provides a comprehensive introduction to the R programming language, with a focus on data science applications. It covers all the essential topics for data wrangling, analysis, and visualization, and it is written in a clear and concise style.
Provides a comprehensive introduction to data science, with a focus on practical applications. It covers all the essential topics for data wrangling, analysis, and modeling, and it is written in a clear and concise style.
Provides a comprehensive introduction to big data analytics, with a focus on practical applications. It covers all the essential topics for data wrangling, analysis, and visualization, and it is written in a clear and concise style.
Provides a comprehensive introduction to machine learning for data science applications. It covers all the essential topics for data wrangling, analysis, and modeling, and it is written in a clear and concise style.
Provides a comprehensive introduction to reinforcement learning for data science applications. It covers all the essential topics for data wrangling, analysis, and modeling, and it is written in a clear and concise style.
Provides a comprehensive introduction to unsupervised learning for data science applications. It covers all the essential topics for data wrangling, analysis, and modeling, and it is written in a clear and concise style.
Provides a comprehensive introduction to data visualization for data science applications. It covers all the essential topics for data wrangling, analysis, and visualization, and it is written in a clear and concise style.
Provides a comprehensive introduction to statistical learning, with a focus on data science applications. It covers all the essential topics for data wrangling, analysis, and modeling, and it is written in a clear and concise style.

© 2016 - 2024 OpenCourser