We may earn an affiliate commission when you visit our partners.

Bogdan Anastasiei

As a data analyst, you will spend a vast amount of your time preparing or processing your data. The goal of data preparation is to convert your raw data into a high quality data source, suitable for analysis. More often than not, this process involves a lot of work. The dplyr package contains the tools that can make this work much easier.

dplyr has a few important advantages over other data manipulation tools or functions:

it’s much faster (25-30 times faster)

its code is easier to write and understand

it can use chaining to build sequences of commands, thus making the code even cleaner and faster to execute

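For these reasons, dplyr quickly became the most popular data manipulation tool among R data scientists. When you finish this course, you will be able to filter, select, sort, transform and summarise data frames, chain dplyr commands, join data frames, and combine dplyr with ggplot2.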

It is a short course, but it is focused on the most essential commands and functions of the dplyr package, those commands that you will likely use most often.

So let’s see what you are going to learn in this course.

The first section covers the five core dplyr commands: filter, select, mutate, arrange and summarise. You will need these commands practically every time you work with dplyr. They are used to subset data frames, compute new variables, sort data frames, compute statistical indicators and so on. Here are a few real-life scenarios where you would use them:

you need to extract from your respondents data set the male subjects with an income greater than $30,000

you need to compute each respondent’s income per family member, knowing the total income and the number of family members

you have a data set with 27 variables, but you only need 6 for your analysis (so you want to remove the extra variables)

you have to sort your employees data set by salary

you need to compute the average satisfaction with a product, knowing each individual customer's satisfaction score, etc.
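The scenarios above can be sketched roughly as follows. This is a minimal illustration rather than the course's own code: the `respondents` data frame, its column names and its values are made up for the example.

```r
library(dplyr)

# Hypothetical respondents data set (names and values are invented)
respondents <- data.frame(
  id = 1:5,
  gender = c("male", "female", "male", "male", "female"),
  income = c(45000, 52000, 28000, 61000, 33000),
  family_members = c(3, 2, 4, 1, 3),
  satisfaction = c(4, 5, 3, 2, 4)
)

# filter(): male subjects with an income greater than $30,000
high_income_males <- filter(respondents, gender == "male", income > 30000)

# mutate(): income per family member
with_ratio <- mutate(respondents, income_per_member = income / family_members)

# select(): keep only the variables needed for the analysis
slim <- select(respondents, id, gender, income)

# arrange(): sort by income, highest first
sorted <- arrange(respondents, desc(income))

# summarise(): average satisfaction across all respondents
avg <- summarise(respondents, mean_satisfaction = mean(satisfaction))
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what makes the chaining described later possible.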

The second section approaches other important dplyr commands and functions. In this section you’ll learn:

how to count the observations in a certain group

how to extract a random sample from your data frame

how to extract the top entries from your data frame, based on a given variable

how to visualize the structure of your data set

how to use the set operations in dplyr (if you have used these operations in base R, you’ll see that dplyr takes them to a whole new level).
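As a rough sketch of the first four items, assuming dplyr 1.0.0 or later: the course itself may use different (older) function names such as `sample_n()` or `top_n()`, and the `employees` data frame below is invented for illustration.

```r
library(dplyr)

# Hypothetical employees data set (names and values are invented)
employees <- data.frame(
  name = c("Ana", "Ben", "Cara", "Dan", "Eva", "Finn"),
  dept = c("sales", "sales", "it", "it", "it", "hr"),
  salary = c(3200, 2900, 4100, 3900, 4500, 3000)
)

# Count the observations in each group
per_dept <- count(employees, dept)

# Extract a random sample of three rows (seed fixed for reproducibility)
set.seed(1)
random_rows <- slice_sample(employees, n = 3)

# Extract the top two entries by salary
top_two <- slice_max(employees, salary, n = 2)

# Print a compact view of the structure of the data set
glimpse(employees)
```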

In the third section you’ll start to take advantage of the true power of dplyr. Here we’ll talk about chaining – creating sequences of dplyr commands that accomplish multiple tasks with one click only.
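A minimal sketch of chaining with the pipe operator `%>%`, which dplyr re-exports from magrittr. The `respondents` data frame and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical respondents data set (names and values are invented)
respondents <- data.frame(
  id = 1:5,
  income = c(45000, 52000, 28000, 61000, 33000),
  family_members = c(3, 2, 4, 1, 3)
)

# Each step receives the result of the previous one as its first argument
result <- respondents %>%
  filter(income > 30000) %>%
  mutate(income_per_member = income / family_members) %>%
  select(id, income_per_member) %>%
  arrange(desc(income_per_member))
```

Without chaining, the same sequence would need either nested function calls or a series of intermediate variables.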

The fourth section is about joining data frames with dplyr. This is a very important topic, because your data will often be spread across several data frames, and you will need to join them into a single data frame suitable for your analyses. We are going to look at five join types available in dplyr: inner_join, semi_join, left_join, anti_join and full_join. We are going to examine the output of each join type using a simple example.
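To make the difference between the five join types concrete, here is a small sketch. The `customers` and `orders` data frames are invented; the course uses its own example.

```r
library(dplyr)

# Two hypothetical data frames sharing the key column `id`
customers <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cara"))
orders <- data.frame(id = c(2, 3, 3, 4), amount = c(50, 20, 35, 80))

# inner_join(): only rows whose id appears in both data frames
inner <- inner_join(customers, orders, by = "id")

# semi_join(): customers that have at least one order (no columns from orders)
semi <- semi_join(customers, orders, by = "id")

# left_join(): all customers, with NA where there is no matching order
left <- left_join(customers, orders, by = "id")

# anti_join(): customers without any order
anti <- anti_join(customers, orders, by = "id")

# full_join(): all rows from both data frames
full <- full_join(customers, orders, by = "id")
```

In this example the results have 3, 2, 4, 1 and 5 rows respectively, which is a quick way to check that each join behaves as expected.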

In the fifth section we’ll learn how to combine dplyr and ggplot2 commands (using chaining) to build expressive charts and graphs. For example, if you want to represent the income distribution for the subjects with a higher education only, or the relationship between income and education level for the female subjects only, in this section you will learn exactly how to do it.
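One possible way to plot the relationship between income and education for the female subjects only is sketched below. The `subjects` data frame and its column names are assumptions made for the example.

```r
library(dplyr)
library(ggplot2)

# Hypothetical subjects data set (names and values are invented)
subjects <- data.frame(
  gender = c("female", "male", "female", "female", "male", "female"),
  education_years = c(12, 16, 16, 18, 12, 14),
  income = c(30000, 52000, 48000, 61000, 28000, 39000)
)

# Filter with dplyr, then pipe the subset straight into ggplot()
p <- subjects %>%
  filter(gender == "female") %>%
  ggplot(aes(x = education_years, y = income)) +
  geom_point()

# print(p) draws the scatterplot
```

Note that dplyr steps are connected with `%>%`, while ggplot2 layers are added with `+`.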

Every command is illustrated with video, with both the syntax and the output explained in detail. At the end of the course, a large number of practical exercises are provided. By doing these exercises you’ll put into practice what you have learned.

Join this course right now and acquire a critical data analysis ability – data manipulation.

Register for this course and see more details by visiting: OpenCourser.com/course/y6a0fa/data

- Filter data frames using various conditions
- Select and remove data frame columns (variables)
- Sort data frames by column values
- Create new variables from the existing ones
- Compute summary statistics for our data frame

- Other useful operations (count data frame rows, select top rows, select rows at random etc.)
- Chaining dplyr commands to write powerful data manipulation code
- Joining data frames (five joining types)
- Combining dplyr with ggplot2 to create meaningful charts

Let's see what we are going to cover in this course.

First of all, I'm presenting the structure of this course.

This section is about the five essential dplyr functions - the most used in practice.

What the main dplyr commands (or verbs) are and what they are used for.

How to select entries (observations) in your data frame using various filtering conditions.

Select or remove columns (variables) in your data frame.

Add new variables to your data frame - either from scratch or using existing variables.

Sort your data frame by variable values.

Compute various statistical indicators for the numeric variables in your data frame.

You can apply the dplyr commands to your data frame by groups or segments. In this lecture you will learn how.
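A minimal sketch of a grouped operation, assuming the grouping is done with `group_by()`. The `employees` data frame is hypothetical.

```r
library(dplyr)

# Hypothetical employees data set (names and values are invented)
employees <- data.frame(
  name = c("Ana", "Ben", "Cara", "Dan", "Eva", "Finn"),
  dept = c("sales", "sales", "it", "it", "it", "hr"),
  salary = c(3200, 2900, 4100, 3900, 4500, 3000)
)

# group_by() marks the groups; summarise() then returns one row per group
by_dept <- employees %>%
  group_by(dept) %>%
  summarise(mean_salary = mean(salary), headcount = n())
```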

This section covers a few other helpful dplyr commands and functions.

Let's see how you can count entries in your data frame, by group.

Another useful way to count observations in your data frame.

A quick way to extract the unique values from a variable.

More often than not, you may need to extract a random sample from your data. Let's see how to do that using the sample() command.

How to select the top entries in your data frame, based on variable values.

How to easily merge two data frames that have the same number of rows or columns.
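The lecture does not name the functions here; one plausible reading is dplyr's `bind_rows()` and `bind_cols()`, sketched below on invented data frames.

```r
library(dplyr)

# Hypothetical data frames (names and values are invented)
q1 <- data.frame(id = 1:2, sales = c(100, 150))
q2 <- data.frame(id = 3:4, sales = c(120, 90))
regions <- data.frame(region = c("north", "south"))

# bind_rows(): stack data frames with the same columns on top of each other
stacked <- bind_rows(q1, q2)

# bind_cols(): place data frames with the same number of rows side by side
side_by_side <- bind_cols(q1, regions)
```

Unlike a join, `bind_cols()` matches rows purely by position, so it is only safe when both data frames are in the same order.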

Do you need to take a rapid look at your data frame? This command is exactly what you need.

If you already know the set operations in base R, let me tell you that dplyr contains powerful extensions of these operations, that allow their use for data frames (not only for vectors). In this lecture we learn two set operations, union() and intersect(). We will exemplify them on data frames, of course.

Two other useful set operations for data frames: setdiff() and setequal().
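The four set operations can be sketched on two small data frames as follows. The data frames `a` and `b` are invented; when dplyr is loaded, these functions work row-wise on data frames that have the same columns.

```r
library(dplyr)

# Two hypothetical data frames with identical columns
a <- data.frame(id = 1:4, grp = c("x", "x", "y", "y"))
b <- data.frame(id = 3:6, grp = c("y", "y", "z", "z"))

# union(): all distinct rows found in either data frame
u <- union(a, b)

# intersect(): rows found in both data frames
i <- intersect(a, b)

# setdiff(): rows of a that do not appear in b
d <- setdiff(a, b)

# setequal(): TRUE when both contain the same rows, regardless of order
same <- setequal(a, a[4:1, ])
```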

A great functionality that allows us to connect dplyr commands and write powerful code.

The concept of chaining (or piping) in a nutshell.

Let's start with a few easy to understand examples of chaining, using the main dplyr verbs, to get a better view of the procedure.

Getting to more challenging examples, using other dplyr commands as well.

An example is worth 1000 words, so let's examine a few more complicated chaining examples.

The main dplyr commands used to join (merge) data sets.

A short presentation of five joining functions available in the dplyr package.

Joining two data frames that present a common variable using the inner_join() command.

Joining two data frames that present a common variable using the semi_join() command.

Joining two data frames that present a common variable using the left_join() command.

Joining two data frames that present a common variable using the anti_join() command.

Joining two data frames that present a common variable using the full_join() command.

Combining these two packages to draw meaningful charts.

A few explanations on how (and why) to bind dplyr commands with ggplot2 commands through chaining.

How to create a column chart on a subset of your data using both dplyr and ggplot2 commands.

How to create a mean plot chart on a subset of your data using both dplyr and ggplot2 commands.

How to create a scatterplot chart on a subset of your data using both dplyr and ggplot2 commands.

How to create a histogram chart on a subset of your data using both dplyr and ggplot2 commands.

How to create a boxplot chart on a subset of your data using both dplyr and ggplot2 commands.

The practical exercises for this course.

Download the attached PDF file to see them.

The download links for the data sets and source code.

See the download links for the data sets and source code.

Good to know

Know what's good, what to watch for, and possible dealbreakers

Explores data manipulation in R, which is standard in the data analysis industry

Teaches the dplyr package, which helps learners manipulate data more efficiently

Develops skills in data filtering, selection, sorting, and transformation, which are core skills for data analysis

Taught by Bogdan Anastasiei, who is recognized for their work in data analysis

Covers chaining, a powerful technique for combining dplyr commands and writing efficient code

Learners say this course titled Data Manipulation With Dplyr in R is geared towards beginners. Sal, the course teacher, is described as an engaging and knowledgeable teacher who has put together a well-structured course that is a great value for the price. Students largely agree that Dplyr in R is easy to learn and the course benefits from easy-to-understand lectures and examples.

Sal is an excellent educator.

"I think she has experience as a teacher, she can teach well."

"Sal's background as a teacher really shows in the way she structures her courses and in the way she produces the content."

"it really feels like she is with you giving you undivided teaching"

Good for getting started with R.

"It was great course for beginners."

"Easy to understand for beginners."

"I am a total beginner and the explanations are great to understand the tarot clearly!"

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Manipulation With Dplyr in R with these activities:

Read R for Data Science

Helps you understand the fundamental concepts of data manipulation in R, including the use of dplyr.

- Obtain a copy of the book.
- Read the chapters on dplyr.
- Complete the exercises in the book.

Create a cheat sheet of dplyr commands

Lets you create a resource that will assist you in completing activities that require dplyr.

- Gather information about the dplyr commands you want to include on your cheat sheet.
- Organize the information in a logical way.
- Create a document or file that contains the cheat sheet.

Follow online tutorials on dplyr

Provides you with step-by-step instructions on how to use dplyr to perform common data manipulation tasks.

- Search for online tutorials on dplyr.
- Choose a tutorial that covers a topic you are interested in learning more about.
- Follow the instructions in the tutorial.

Practice dplyr commands

Helps you solidify your understanding of the dplyr syntax and the behavior of its commands.

- Find a data set to work with.
- Write a series of dplyr commands to manipulate the data set in various ways.
- Execute the commands and examine the output to verify that they are working as expected.

Help other students with dplyr

Gives you a chance to practice your dplyr skills, gain a deeper understanding of the concepts, and enhance your teaching abilities.

- Join a study group or online forum where students can ask questions about dplyr.
- Answer questions and provide help to other students.

Attend a dplyr workshop

Gives you the opportunity to interact with experienced dplyr users, ask questions, and learn from their expertise.

- Find a dplyr workshop.
- Register for the workshop.
- Attend the workshop.

Build a data analysis project using dplyr

Helps you apply your dplyr skills to a real-world problem and solidify your understanding of the concepts.

- Choose a data set to work with.
- Define the problem you want to solve.
- Write a series of dplyr commands to manipulate the data set and solve the problem.
- Evaluate the results of your analysis.
- Write a report or presentation summarizing your findings.

Create a presentation on dplyr

Assists you in organizing and presenting information about dplyr in an engaging manner.

- Gather information about dplyr.
- Organize the information into a logical flow.
- Create a presentation using slides or other visual aids.
- Practice your presentation.
- Deliver the presentation to an audience.

Contribute to the dplyr package

Allows you to make direct contributions to the dplyr package, gaining valuable experience and expanding your knowledge of dplyr's internal workings.

- Find an issue on the dplyr GitHub repository.
- Fork the dplyr repository.
- Clone your forked repository to your local machine.
- Fix the issue.
- Create a pull request.

Learners who complete Data Manipulation With Dplyr in R will develop knowledge and skills that may be useful to these careers:

Data Analyst

Data Analysts apply domain knowledge and analytical skills to synthesize complex data for various business problems. These problems may include identifying trends, making forecasts, or assessing the effectiveness of marketing campaigns. This course may be useful in building a foundation for success in this role, as it provides hands-on experience with data manipulation, aggregation, and visualization.

Market Research Analyst

Market Research Analysts research, collect, and analyze data about consumers, competitors, and market trends to inform marketing strategies. This course provides a strong foundation for this role by teaching techniques for data manipulation, aggregation, and visualization.

Information Analyst

Information Analysts collect, organize, and analyze data to provide insights that support decision-making within organizations. This course provides a foundation in data manipulation and analysis, which are crucial skills for Information Analysts.

Statistician

Statisticians collect, analyze, interpret, and present data and use their expertise in statistical methods to solve problems in various fields. The course's focus on data manipulation and statistical analysis makes it a valuable resource for aspiring Statisticians.

Data Scientist

Data Scientists leverage their expertise in data science techniques and tools to extract insights from complex data. This course provides a valuable introduction to data manipulation and analysis, which are essential skills for Data Scientists.

Financial Analyst

Financial Analysts use data analysis and modeling to assess the financial performance of companies and make investment recommendations. This course provides foundational knowledge in data manipulation and analysis, which are crucial skills for Financial Analysts.

Operations Research Analyst

Operations Research Analysts use analytical methods to improve efficiency and decision-making in organizations. This course provides a solid foundation in data manipulation and analysis, which are crucial for Operations Research Analysts.

Data Engineer

Data Engineers design, build, and maintain data pipelines and infrastructure that support data-driven applications and analytics. This course provides a foundational understanding of data manipulation and analysis, which are essential skills for Data Engineers.

Data Visualization Specialist

Data Visualization Specialists design and create visual representations of data to communicate insights and trends. This course provides hands-on experience with data manipulation and visualization, which are essential skills for Data Visualization Specialists.

Actuary

Actuaries use mathematical and statistical techniques to assess risk and uncertainty in various fields, such as insurance and finance. This course provides a strong foundation in data manipulation and analysis, which are essential for Actuaries.

Business Intelligence Analyst

Business Intelligence Analysts use data analysis and visualization to uncover insights that help businesses make informed decisions. This course provides a solid foundation in data manipulation and analysis, making it beneficial for those aspiring to enter this field.

Quantitative Researcher

Quantitative Researchers use mathematical and statistical models to analyze financial data and make investment decisions. This course provides a solid foundation in data manipulation and analysis, which are essential skills for Quantitative Researchers.

Product Manager

Product Managers are responsible for the development and success of products. This course may be beneficial for Product Managers who want to gain a better understanding of data analysis and how it can be used to improve product development and decision-making.

Software Engineer

Software Engineers design, develop, and maintain software applications. While not directly related to data analysis, this course may be beneficial for Software Engineers who work with data-intensive applications, as it provides a foundation in data manipulation and analysis.

Marketing Manager

Marketing Managers plan and execute marketing campaigns to promote products or services. This course may be beneficial for Marketing Managers who want to gain a better understanding of data analysis and how it can be used to measure the effectiveness of marketing campaigns.

For more career information including salaries, visit: OpenCourser.com/course/y6a0fa/data


Here are nine courses similar to Data Manipulation With Dplyr in R.

- Joining Data in R using dplyr (most relevant): OpenCourser.com/course/kfe8uv/joining
- Data Visualization using dplyr and ggplot2 in R (most relevant): OpenCourser.com/course/ho4mj4/data
- Merging Data Sources with R 3 (most relevant): OpenCourser.com/course/ktxkfi/merging
- Google Trends Analysis using R (most relevant): OpenCourser.com/course/mzd0fu/google
- Data Manipulation with dplyr in R: OpenCourser.com/course/sd7x2z/data
- Programming for Everyone : Working with Data: OpenCourser.com/course/p012ln/programming
- Manipulating Dataframes in R: OpenCourser.com/course/5jleci/manipulating
- Fundamentals of Data Analytics in the Public Sector with R: OpenCourser.com/course/5ssw9x/fundamentals
- Tidy Messy Data using tidyr in R: OpenCourser.com/course/0az7w9/tidy

© 2016 - 2024 OpenCourser