Carrie Wright, PhD, Shannon Ellis, PhD, Stephanie Hicks, PhD, and Roger D. Peng, PhD

Getting data into your statistical analysis system can be one of the most challenging parts of any data science project. Data must be imported and harmonized into a coherent format before any insights can be obtained. You will learn how to get data into R from commonly used formats and how to harmonize different kinds of datasets from different sources. If you work in an organization where different departments collect data using different systems and different storage formats, then this course will provide essential tools for bringing those datasets together and making sense of the wealth of information in your organization.

This course introduces the Tidyverse tools for importing data into R so that it can be prepared for analysis, visualization, and modeling. Common data formats are introduced, including delimited files, spreadsheets and relational databases, and techniques for obtaining data from the web are demonstrated, such as web scraping and web APIs.

In this specialization we assume familiarity with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.

What's inside

Syllabus

Importing (and Exporting) Data in R
A basic data type in the tidyverse is the tibble. Tibbles store tabular data and are a modern take on the standard R data frame. They have many user-friendly features that are an improvement over standard data frames when doing interactive data analysis. The remainder of this module covers tabular data in spreadsheet formats like Excel, CSV, TSV, and other delimited files.
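As a minimal sketch of what this module describes, the following builds a tibble and round-trips it through delimited files. It assumes the tibble and readr packages are installed; the `scores` data and the temporary file paths are hypothetical illustrations, not course materials.

```r
library(tibble)
library(readr)

# Hypothetical example data (not taken from the course).
scores <- tibble(
  name  = c("Ana", "Ben", "Chen"),
  score = c(91.5, 78, 84.25)
)

# Write the tibble to a temporary CSV file, then read it back.
csv_path <- tempfile(fileext = ".csv")
write_csv(scores, csv_path)

# Declaring column types up front avoids surprises from type guessing.
scores_in <- read_csv(
  csv_path,
  col_types = cols(name = col_character(), score = col_double())
)

# Tab-separated files work the same way; "cd" is the compact form of
# "character column, then double column".
tsv_path <- tempfile(fileext = ".tsv")
write_tsv(scores, tsv_path)
scores_tsv <- read_tsv(tsv_path, col_types = "cd")

print(scores_in)
```

Excel workbooks are usually read with a separate package such as readxl (`read_excel()`), which is omitted here because it needs an existing workbook file.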
JSON, XML, and Databases
Data can come in non-tabular formats, especially unstructured data or data that otherwise would not fit into a table. JSON and XML are common formats for storing arbitrarily structured data, and this module covers the packages used to read in those data formats. In addition, relational databases are commonly used to store very large collections of tables when you do not need to read in the entire dataset at once. There are many relational database formats; we will cover SQLite, which is compact and simple to use.
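The listing does not name the packages used, so the sketch below assumes jsonlite for JSON and DBI with RSQLite for SQLite; both the JSON payload and the `measurements` table are hypothetical. It illustrates the two ideas in this module: nested JSON does not flatten cleanly into a table, and a database query pulls back only the rows you ask for.

```r
library(jsonlite)
library(DBI)

# Hypothetical JSON payload: each record carries a variable-length array.
json_text <- '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": ["c"]}]'

# fromJSON() simplifies an array of objects into a data frame; the ragged
# "tags" field is kept as a list column rather than forced into a table.
records <- fromJSON(json_text)

# An in-memory SQLite database, so nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(
  con, "measurements",
  data.frame(site = c("A", "A", "B"), value = c(1.5, 2.5, 4))
)

# Only the rows matching the query are read into R.
site_a <- dbGetQuery(
  con,
  "SELECT site, value FROM measurements WHERE site = 'A'"
)
dbDisconnect(con)

print(records)
print(site_a)
```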
Web Scraping and APIs
Reading in data from various Internet sources can be a useful way to build analyses that need to be regularly updated. The rvest and httr packages are useful for connecting to web sites, web APIs and other online sources of data.
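A rough sketch of the scraping side with rvest follows. To stay runnable offline, it parses a hypothetical HTML string instead of fetching a live page; the page content, the `#readings` selector, and the column names are all assumptions made for illustration.

```r
library(rvest)

# Hypothetical page content; in practice read_html() would be given a URL.
page <- read_html('
  <html><body>
    <h1>Station readings</h1>
    <table id="readings">
      <tr><th>station</th><th>temp_c</th></tr>
      <tr><td>North</td><td>11.5</td></tr>
      <tr><td>South</td><td>14.0</td></tr>
    </table>
  </body></html>
')

# Select elements with CSS selectors, then extract text or a whole table.
title <- page %>% html_element("h1") %>% html_text2()
readings <- page %>% html_element("#readings") %>% html_table()

print(title)
print(readings)
```

For web APIs, the usual httr pattern is to send a request with `GET()` and extract the body with `content()`. That step is left out here because it depends on a specific endpoint and network access.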
Foreign Formats, Images, and googledrive
Working with others in a data science project often involves reading output or data produced using other statistical analysis packages or other software. This module covers packages for reading in these foreign formats, as well as images and data from Google Drive.
Case Studies
Now we will demonstrate how to import data using our case study examples. When working through the steps of the case studies, you can use either RStudio on your own computer or Coursera lab spaces provided for each case study.
Project: Importing Data into R
This project will give you the opportunity to read in data from multiple sources and conduct some simple operations on those data.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches students how to import data into R, a fundamental skill for data analysis
Develops proficiency in using common data formats and sources, such as spreadsheets, databases, and web scraping
Assumes familiarity with R programming, making it suitable for intermediate learners who wish to enhance their data handling skills

Reviews summary

Highly rated importing course

Learners say that this is a well-reviewed course that is great for beginners but also provides value for expert learners. The clear explanations and engaging assignments help learners develop a logical approach to data analysis.
Learners find this course easy to follow
"Clearly explained, and easy to follow"
Many learners find this course to be excellent
"Excellent tutorial for importing data into the tidyverse environment"
"Very useful and informative, especially for web related data"
The quizzes and assignments are helpful for learning
"having the quizzes makes the difference between just reading and actually learning"
Some learners would like more practice exercises
"A little bit more practice exercises would made this course perfect"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Importing Data in the Tidyverse with these activities:
Review data science concepts
Reviewing data science concepts will help you with the course material.
  • Go over the data science concepts covered so far.
  • Complete a few practice exercises.
Review R programming basics
Reviewing R programming will help you prepare for the course.
  • Go over the R programming basics.
  • Complete a few practice exercises.
Join a study group or discuss with other students
Talking to other students will help reinforce the concepts you've learned.
  • Join a study group for this course.
  • Discuss the course material with other students.
Read Data Science for Business
This book provides a comprehensive overview of the data science process, including data import techniques.
  • Read the first three chapters of the book.
  • Complete the exercises in the first three chapters.
  • Summarize the key concepts from the first three chapters.
Complete the RStudio Data Import Tutorial
This tutorial will provide hands-on experience with importing data into R using RStudio.
  • Follow the steps in the RStudio Data Import Tutorial.
  • Import a dataset of your own.
Practice importing data into R using different formats
Practice will help you become proficient in importing data into R.
  • Import a CSV file into R.
  • Import a JSON file into R.
  • Import an XML file into R.

Career center

Learners who complete Importing Data in the Tidyverse will develop knowledge and skills that may be useful to these careers:
Data Engineer
Data Engineers build and maintain the infrastructure that supports data analysis. They work with data scientists and other stakeholders to design and implement data pipelines, ensuring that data is clean, consistent, and accessible. This course provides a strong foundation in data import and harmonization, essential skills for Data Engineers. By learning how to import data from various sources and formats, you will be better equipped to build and maintain data pipelines that meet the needs of your organization.
Data Analyst
Data Analysts use data to solve business problems. They collect, clean, and analyze data to identify trends and patterns. This course provides a strong foundation in data import and harmonization, essential skills for Data Analysts. By learning how to import data from various sources and formats, you will be better equipped to collect and clean data for analysis.
Data Scientist
Data Scientists use data to build predictive models and make recommendations. They work with data engineers and other stakeholders to design and implement data pipelines, and they develop and deploy models to solve business problems. This course provides a strong foundation in data import and harmonization, essential skills for Data Scientists. By learning how to import data from various sources and formats, you will be better equipped to build and maintain data pipelines and develop models that meet the needs of your organization.
Statistician
Statisticians use data to design and conduct studies, analyze data, and draw conclusions. They work in a variety of settings, including academia, government, and industry. This course provides a strong foundation in data import and harmonization, essential skills for Statisticians. By learning how to import data from various sources and formats, you will be better equipped to collect and clean data for analysis.
Quantitative Researcher
Quantitative Researchers use data to make investment decisions. They work with data scientists and other stakeholders to develop and implement investment models. This course provides a strong foundation in data import and harmonization, essential skills for Quantitative Researchers. By learning how to import data from various sources and formats, you will be better equipped to build and maintain data pipelines and develop models that meet the needs of your organization.
Business Analyst
Business Analysts use data to improve business processes. They work with stakeholders to identify and analyze problems, and they develop and implement solutions. This course provides a strong foundation in data import and harmonization, essential skills for Business Analysts. By learning how to import data from various sources and formats, you will be better equipped to collect and clean data for analysis.
Market Researcher
Market Researchers use data to understand consumer behavior. They work with marketing teams to develop and implement marketing campaigns. This course provides a strong foundation in data import and harmonization, essential skills for Market Researchers. By learning how to import data from various sources and formats, you will be better equipped to collect and clean data for analysis.
Web Developer
Web Developers design and build websites. They work with web designers and other stakeholders to create websites that are user-friendly and meet the needs of the organization. This course may be useful for Web Developers who need to import data from various sources and formats into their websites.
Database Administrator
Database Administrators manage and maintain databases. They work with database designers and other stakeholders to ensure that databases are performant and secure. This course may be useful for Database Administrators who need to import data from various sources and formats into their databases.
Data Journalist
Data Journalists use data to tell stories. They work with journalists and other stakeholders to create data-driven articles and visualizations. This course may be useful for Data Journalists who need to import data from various sources and formats into their articles and visualizations.
Information Security Analyst
Information Security Analysts protect organizations from cyberattacks. They work with security teams and other stakeholders to identify and mitigate security risks. This course may be useful for Information Security Analysts who need to import data from various sources and formats into their security systems.
Product Manager
Product Managers develop and manage products. They work with engineers, designers, and other stakeholders to create products that meet the needs of users. This course may be useful for Product Managers who need to import data from various sources and formats into their product development process.
Project Manager
Project Managers plan and execute projects. They work with project teams and other stakeholders to ensure that projects are completed on time and within budget. This course may be useful for Project Managers who need to import data from various sources and formats into their project management tools.
Marketing Manager
Marketing Managers develop and implement marketing campaigns. They work with marketing teams and other stakeholders to create marketing campaigns that reach the target audience and achieve the desired results. This course may be useful for Marketing Managers who need to import data from various sources and formats into their marketing campaigns.
Sales Manager
Sales Managers lead and manage sales teams. They work with sales teams and other stakeholders to develop and implement sales strategies. This course may be useful for Sales Managers who need to import data from various sources and formats into their sales management tools.

Reading list

We've selected 12 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Importing Data in the Tidyverse.
Reference guide to the ggplot2 package for data visualization in R, and would be a useful reference for students in this course.
Is not a general introduction to the R programming language, but it is one of the leading references for Hadley Wickham's tidyverse tools and as such would be a valuable reference text for students taking this course.
Reference guide to the R Markdown language for creating dynamic reports, and would be a useful reference for students in this course.
Practical guide to data manipulation in R, and would be a useful reference for students in this course.
Collection of recipes for solving common problems in R, and is written to be useful as a reference for R users at all levels.
Provides an introduction to the use of R in machine learning and statistical modeling, which would complement this course nicely.
Is primarily intended as an introduction to the R programming language and thus will serve as a good prerequisite or additional reference for this course.
Is an introduction to the use of R in data science, and is another good reference for the tidyverse tools used in this course.
Reference guide to the RStudio integrated development environment, and would be useful as a reference for students using RStudio for this course.
This textbook is a good introduction to linear algebra, and would provide a good foundation for understanding some of the concepts used in data science and machine learning.
Although this textbook uses Python rather than R, it provides a nice introduction to exploratory data analysis which would be useful for students in this course.

© 2016 - 2024 OpenCourser