In this course, I'm going to teach you how to use the ggplot2 package of R to draw amazing charts that are able to communicate what your data has to say in the most polished, professional way.
My name is Clara and I am a Complex Systems researcher and Data Visualization professor at the University.
Currently, ggplot2 is the best, most powerful tool for building professional graphics. First, because this is a package available in R, which is one of the most used programming languages for Data Science and related fields. So, it is very convenient to be able to produce graphics in the same environment where you already do all of your calculations. Second, because it is the most flexible tool to build graphs. So, even if you don't use R for your analyses, it is worth it to use ggplot2 to draw your plots, because there's no other tool that will give you the results that you can achieve using ggplot2.
Right now, the top agencies for Data Visualization are using ggplot2 to present their data. But still, ggplot2 is a tool that not everybody knows how to use, because of its steep learning curve. Because of this, some people turn to other tools, such as Microsoft Excel, which was not really designed for Data Visualization. So, why don't you take the amazing opportunity of learning ggplot2 now and standing out from the crowd?
When I started learning ggplot2 some years ago, I was a bit overwhelmed by the number of functions and parameters, and I had never used R at the time. But I knew there wasn't any other comparable tool out there. So, I decided to learn it from the ground up, which took me a while. Of course you can watch some tutorials and learn how to do certain kinds of plots, but I soon realized this wasn't enough to use ggplot2 in an independent, confident manner. This course that I'm offering to you today is the course that I wish had existed when I was learning ggplot2. It would have saved me hours and hours of reading books, manuals, the documentation, and endless trial and error.
In this course I've followed a methodology that has proven to work over the years with my students at the University, which is: to truly master ggplot2 you need to learn its core: the grammar of graphics. But, learning this alone might be a little tough, so I've created a series of lessons that first cover a certain part of the grammar, and then we move on to learning how to draw particular plots. Both of these types of lessons are fully hands-on, so you won't be bored one second. Using this method, I've had great success with students that had never used R before, R beginners, and advanced R users.
I really encourage you to make the decision of starting to learn ggplot2, it'll be a skill that you'll use for years to come and which will make a great difference in your career. I promise you won't regret it. Actually, this is what my students say: "This is the best course on R that I have done (M.G)", "This is a perfect course, might be the best I've seen in a long time (A.M)", "[I thought] that it was impossible for me to learn ggplot2 but this course showed me otherwise (M.N)". Take their advice and join us.
*This course covers the ggplot2 3.3 version (the latest release).
This course follows a particular methodology: learn why this is the best way to achieve your ggplot2 goals.
In this lecture we review what we'll cover in this chapter.
Before we start, I want to tell you what R is, in case you've never heard of it before! In this lesson we will briefly introduce the R programming language: what it is, where it comes from and why it is so widely used nowadays.
Okay, so now that we know that we're using R for this course, how do we get started? In this lesson I will guide you on how to install the R programming environment and RStudio.
Just a quick update: the company behind RStudio is now called Posit (the IDE itself is still called RStudio). In this video I will guide you through the new website.
Okay, so you have R and RStudio up and ready, what next? In this lesson we will follow the necessary steps to get our R environment working and we will install some packages that we'll need later in the course.
Just a short interruption. It will be quick, I promise.
Okay, so let's start with the basics of R: the data types. In this lesson we will talk about the atomic types in R and we will learn how to use vectors (which you're going to use all the time!)
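To give a flavor of what this lesson covers, here is a minimal sketch of atomic vectors in base R. The variable names and values are made up for illustration and are not taken from the course materials.

```r
# Atomic vectors hold values of a single type
heights   <- c(1.72, 1.65, 1.80)        # double
names_chr <- c("Ana", "Ben", "Cleo")    # character
is_tall   <- heights > 1.70             # logical, computed element by element

# Subsetting by position and by a logical condition
first_two  <- heights[1:2]
tall_names <- names_chr[is_tall]

# Mixing types coerces every element to the most general type (here, character)
mixed <- c(1, "a", TRUE)
class(mixed)
```

Because a vector can only hold one type, coercion like the one in `mixed` is a common source of surprises when reading data.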
If you liked vectors, you'll love lists! In this lesson we will review how to create, subset and use lists in R.
In this lesson we will learn a 2-dimensional structure in R: Matrices.
In this lesson we will discuss the use of arrays in R. They are a multidimensional structure that you must know.
Okay, so this is the most important data structure of R. The data.frame! It's so flexible, so powerful, so convenient! Pay close attention to this lesson because you'll need to use dataframes all the time with ggplot!
Let's do a little wrap up on the data structures we've seen so far. In this lesson, we will compare each of the data structures regarding what data they are able to store.
Ah, factors! They are essential in R and particularly when working with ggplot2. Factors are a special type of data structure that only allows certain values (think of it as an enumeration in other programming languages). In this lesson we'll introduce factors, and you'll learn how they are structured and how to convert something to a factor.
As said, factors are important in R, but especially in ggplot2. Why? Because most of the time the order in which the data is displayed in a plot follows the order of a factor's levels. So in this lesson you'll learn the native way to arrange a data.frame according to the factor levels that you specify.
There will come a time when you'll want to plot your own data. Chances are, it will be in text format, but ggplot2 needs a data.frame to plot. So, in this lesson we will learn how to read data from a file and transform it into a data.frame that ggplot2 understands.
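As a rough illustration of the idea, the sketch below sets the level order of a factor explicitly and then sorts a data.frame by it using only base R. The data.frame and its column names are hypothetical.

```r
# A small, made-up data.frame with a categorical column
df <- data.frame(
  size  = c("medium", "large", "small", "large"),
  count = c(10, 4, 7, 2)
)

# By default, levels are sorted alphabetically: large, medium, small
default_levels <- levels(factor(df$size))

# Specify the level order explicitly
df$size <- factor(df$size, levels = c("small", "medium", "large"))

# order() on a factor follows the level order, not the alphabet
df_sorted <- df[order(df$size), ]
```

A plot that puts `size` on a discrete axis would follow the same small, medium, large order.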
Remember to download the file "readSample.csv" because we'll use it in this lesson!
Sometimes we'll need to preprocess our data before plotting so that we can choose what we want to plot. In this lesson we will learn how to filter, select, arrange and mutate data using the package dplyr. Remember to download the file transformSample.csv because you'll need it for this lesson!
Oh no! My data has some strange <NA> values that are messing up my plot! What are those? Don't worry, these are missing values and in this lesson we will learn how to handle them.
This lesson is a collection of small functions that most people overlook. However, they come in very handy when we want to generate sequences of elements. And when do we want to do that? Well, in a plot, when you specify the axis breaks and labels, for instance. So stay tuned.
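For instance, a few base R tools are usually enough to detect and drop missing values. This is only a sketch with invented numbers, not the dataset used in the lesson.

```r
# A numeric vector with one missing value
temps <- c(21.5, NA, 19.0, 23.5)

# Detect missing values
missing_flags <- is.na(temps)

# Most summary functions return NA unless told to ignore missing values
mean_with_na <- mean(temps)
mean_temp    <- mean(temps, na.rm = TRUE)

# Keep only the rows of a data.frame that have no missing values
df <- data.frame(day = 1:4, temp = temps)
df_complete <- df[complete.cases(df), ]
```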
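The kind of helper functions meant here probably includes base R's `seq()` and `rep()`; that is an assumption, since the description does not name them. A minimal sketch:

```r
# Evenly spaced values by step size or by desired length
breaks_by  <- seq(0, 100, by = 25)
breaks_len <- seq(0, 1, length.out = 5)

# Integer indices 1..n
idx <- seq_len(3)

# Repeating elements
groups_times <- rep(c("a", "b"), times = 2)
groups_each  <- rep(c("a", "b"), each = 2)

# Turning numeric breaks into axis labels
axis_labels <- paste0(breaks_by, "%")
```

Vectors like `breaks_by` and `axis_labels` are the sort of thing you later pass to the `breaks` and `labels` arguments of a scale.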
Ah, dates. Many times, in plotting, we'll need to deal with dates. Dates are not difficult, but they require special handling. In this lesson we'll learn how to convert something to Date format, how to operate with them and how to create sequences out of them.
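As a small taste, the sketch below converts strings to Date, does arithmetic on them, and builds a monthly sequence with base R. The dates are arbitrary examples.

```r
# Convert text to Date (ISO format needs no format string)
start <- as.Date("2020-01-15")
end   <- as.Date("31/03/2020", format = "%d/%m/%Y")

# Arithmetic on dates: difference in days
days_between <- as.numeric(difftime(end, start, units = "days"))

# A sequence of dates, one per month
month_starts <- seq(as.Date("2020-01-01"), by = "month", length.out = 4)

# Extract a component as text
start_year <- format(start, "%Y")
```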
Okay, imagine this situation: you need to plot something (for example, a map) that is in data.frame A. Then you want to color certain regions according to data that is in data.frame B. How can you pass that to ggplot2? Most of the time, the best way is to merge the A and B data frames into a single one. In this lesson we'll learn how to do that.
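One way to do this in base R is `merge()`; whether the lesson uses it or a dplyr join is not stated, so treat this as a sketch under that assumption. The two data frames are invented.

```r
# data.frame A: the things we want to draw
regions <- data.frame(
  region = c("north", "south", "east"),
  area   = c(120, 95, 60)
)

# data.frame B: values we want to color by (no row for "east")
values <- data.frame(
  region     = c("north", "south"),
  population = c(3.1, 5.4)
)

# Left join: keep every row of A, fill unmatched values with NA
merged <- merge(regions, values, by = "region", all.x = TRUE)
```

With `all.x = TRUE`, regions without a match are kept, so they still get drawn (typically in the scale's NA color).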
Ok, so now that we all know the basics of R we can start learning ggplot2. In this video I'll introduce the contents of this chapter.
Yes, indeed, why ggplot2? Why can't we use Microsoft Excel, or Tableau, or Flourish, or any other tool? What makes ggplot2 so special? In this lesson we'll find out.
In this lesson we follow the basic steps that you (unconsciously) follow every time that you draw a plot (using ANY tool, even if you draw it by hand). Later we will see that these basic steps are the core of ggplot2.
Ok, so this is where the beef is. In this lesson we will learn the components of the Grammar of Graphics, which define the structure of a plot in ggplot2. In the following lessons we will learn how to use all of these components one by one.
Ok, so let me explain how the chapters of this course work. In each chapter, we will see some parts of the grammar, and then we'll learn how to apply that knowledge into building certain kinds of plots. In this chapter, we'll review Geometries and Line Plots.
Okay, so where do we start? The first and elemental step is learning how to use geometries, because, without geometries, there's nothing to see when you want to print your plot! Geometries are the way to tell ggplot which kind of plot you want to draw: a line plot, a scatter plot, a density plot... Let's do it!
Before we get into more "grammar" territory, let me explain how we can save our plots so that no work is lost!
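A minimal sketch of saving a plot with ggplot2's `ggsave()` follows. The file name, size, and the use of the built-in `mtcars` dataset are illustrative choices, not necessarily what the lesson uses.

```r
library(ggplot2)

# A simple plot to save
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_line()

# Write it to a temporary folder; the extension decides the output format
out_file <- file.path(tempdir(), "line_plot.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Passing `plot = p` explicitly avoids relying on "the last plot displayed", which is what `ggsave()` falls back to otherwise.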
Okay, the first type of plots we'll learn how to draw is one of the basics: the line plot and its variants. In this lesson we'll start with a single line plot and we'll change the appearance options of the plot.
Okay, yes, drawing ONE line plot is fine, but what if I need to draw MULTIPLE line plots? Well, in that case, we need to take something else into consideration. Let's find out how to do that in this lesson.
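The "something else" is most likely the grouping of the data: ggplot2 needs to know which rows belong to which line. Here is a sketch using `economics_long`, a dataset that ships with ggplot2 (the lesson may use different data).

```r
library(ggplot2)

# Mapping colour to a categorical variable creates one line per category
p_multi <- ggplot(economics_long, aes(x = date, y = value01, colour = variable)) +
  geom_line()

# The group aesthetic splits the lines without changing their appearance
p_group <- ggplot(economics_long, aes(x = date, y = value01, group = variable)) +
  geom_line()
```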
Okay, so now it's your turn! Will you be able to reproduce a line plot all by yourself? If you want to have the model you want to recreate side by side, please download the *.pdf available here!
How did it go? Were you able to reproduce the previous plot? In this video I'm giving you the solution.
Okay, so let's dig a little bit deeper into ggplot. In this chapter we'll learn about something essential: the dataset that we want to plot (and where to include it) and the mappings from data to aesthetics. To practice, we'll learn how to draw scatter plots.
Okay, I'm confused, in the plot diagram there are two places where I can specify a dataset and its mappings to aesthetics. Where should I include my data? In the general call or in every particular layer? In this lesson we'll discuss what's the best option.
The aesthetic mappings are not written in stone. We can override them in each different layer to achieve what we want. In this lesson we'll learn how to do that.
Oh no! I've mapped the color aesthetic to "blue" and my scatter plot is pink! How did that happen? Well, the most likely problem is that you have mapped your aesthetic instead of setting it. Do you know the difference between setting and mapping? This is essential! Let's learn this in this lesson.
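The difference can be sketched in a few lines. The built-in `mtcars` dataset is used here purely for illustration.

```r
library(ggplot2)

# Mapping: inside aes(), "blue" is treated as data (a variable with a single
# value), so ggplot2 assigns it a colour from its default discrete palette
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = "blue"))

# Setting: outside aes(), "blue" is taken literally as the colour to draw with
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "blue")

# Inspect the colours ggplot2 actually computed for each layer
mapped_colours <- unique(layer_data(p_mapped)$colour)
set_colours    <- unique(layer_data(p_set)$colour)
```

The mapped version also produces a legend with a single entry labelled "blue", which is a quick visual hint that something was mapped rather than set.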
Do you know the difference between having your data.frame in "long" format versus in "wide" format? Well, it's important, because ggplot likes to receive "long" datasets. In this lesson we'll discuss these two formats and we will learn a tool, the function melt, that allows us to transform from one format to the other.
Okay, time to move on to learning how to draw plots! In this lesson we'll learn another staple: the scatter plot. And we will practice it by drawing a scatter plot on the starwars dataset!
In this lesson we'll learn how to make our previous scatter plot more fancy. Even though we'll see color in more depth in a future lesson (in scales), I want you to be able to handle color "beginner level" now. So, we'll add some colors and annotations, and also we'll learn how to transform our simple scatter plot into a bubble plot. Stay tuned!
Okay, so it's your turn. In this lesson I give you a scatter plot model that you'll need to recreate by applying all of the tools we've learnt so far. Remember to download the *.pdf to be able to have the model by your side. Good luck!
How did you do? In this lesson I'm giving you the solution of the scatter plot exercise.
In this lesson we'll talk about a serious part of the grammar: statistical transformations. They are important because, as you'll see, some geometries have an implicit statistical transformation. So in this lesson you'll learn the different transformations and also how to change the stats of the geometries. Closely related to this, we'll talk about displaying distributions.
Some simple plots like line plots or scatter plots, by default, don't transform your data before plotting. Some other geoms, like geom_bar(), do. It's important that you know how statistical transformations work and how to use them to your advantage. In this lesson we'll learn all of this.
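To make this concrete, the sketch below shows that `geom_bar()` counts rows for you, while `geom_col()` expects the heights to be in the data already. It uses the `mpg` dataset bundled with ggplot2 as an example.

```r
library(ggplot2)

# geom_bar() uses stat = "count" by default: one bar per class, height = number of rows
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()
bar_counts <- layer_data(p_bar)$count

# The equivalent plot with pre-computed counts and no transformation
class_counts <- as.data.frame(table(class = mpg$class))
p_col <- ggplot(class_counts, aes(x = class, y = Freq)) +
  geom_col()
```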
In this lesson we'll discuss some very interesting statistical transformations (that probably are not the most well-known) but that will help you a lot achieving your goals when plotting.
Have you heard about computed aesthetics? Have you ever seen ggplot2 source code where a variable is surrounded by dots (like ..count..) and wondered what it means? In this lesson we will talk about computed aesthetics and how you can use them in your plots.
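As a brief sketch: ggplot2 3.3.0 introduced `after_stat()` as the newer spelling of the dot notation, so `after_stat(density)` refers to the same computed variable as `..density..`. The example below rescales a histogram to show densities instead of counts, using the bundled `mpg` dataset.

```r
library(ggplot2)

# Map y to the "density" variable computed by the histogram's stat,
# instead of the default "count"
p_density <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2)

# The bar areas of a density histogram add up to 1
d <- layer_data(p_density)
total_area <- sum(d$density * (d$xmax - d$xmin))
```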
Okay, so it's time to move on to a more complicated type of plots. In the next lessons we'll learn specific plots, but all of them display distributions. Why are distributions important? What are they exactly? Let's find out now.
Let's start with the most famous plot for displaying distributions: the histogram. In this lesson we will learn how to draw histograms and how to change their parameters. We'll also learn about a less well-known sibling of the histogram: the frequency polygon.
Once we know how to plot histograms, let's move to its "continuous" sibling: the density plot. In this lesson we will learn how to draw density plots, how to tune them to make all distributions visible, and I will show you my favorite way to display density plots without overlap: the ridgeline plot.
Okay, up to now we've learnt a lot: statistical transformations, computed aesthetics, histograms, density plots... Let's take a bit of time to digest all of this by doing a little exercise.
Remember to download the *.pdf file if you prefer to have the model side-by-side with your code.
How did you do? Did you find the way to get the information to put in the labels of the previous histogram? Yes? No? In this lesson you'll see the solution.
Okay, so after pausing a bit, let's move on to a famous type of plot that is often a bit complicated to draw: the boxplot. Don't worry if you've never heard about it; we'll talk about its anatomy and what boxplots are good for. The problem is that, more often than not, boxplots are ugly. So in this lesson we'll also learn how to make boxplots look good.
Now that you're familiar with boxplots and density plots, we can move on to violin plots, which are a sort of "hybrid" between these two. In this lesson we'll learn how to draw violin plots and how to tune them to make them look amazing.
Okay, time to practice boxplots and violin plots. In this lesson I give you two very short exercises for you to practice. Good luck!
How did you do in the previous exercises? In this lesson you'll find out the solution!
In this chapter we're going to review several important aspects. First, we'll talk about position adjustments, which are important if you want to be able to place your data where you want it. Then, we'll move on to scales. Scales are responsible for how data elements are transformed into graphical elements, and therefore they are essential. Every time you need to change the limits of an axis, the color palette of your graph, etc., you need to change the appropriate scale. We'll learn that. In relation to position adjustments and scales, we'll learn how to draw Bar plots! Let's start!
Let's start with position adjustments, and as usual, we'll have to mention the "identity" position adjustment, which is, most of the time, the default one. Then, we'll talk about the position adjustment jitter (which we have used previously, but here we will be able to modify its behavior) and nudge.
Especially relevant to bar plots (which we will see in a moment) are the position adjustments stack, fill and dodge. Have you ever wondered how to draw a stacked bar plot in ggplot2? Or a grouped bar plot? In this lesson we will learn how to change the position adjustment to achieve the kind of plot we want.
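A minimal sketch of the three adjustments on the same data follows, again using the bundled `mpg` dataset as a stand-in for whatever the lesson uses.

```r
library(ggplot2)

# Same data and mappings, three different position adjustments
base <- ggplot(mpg, aes(x = class, fill = drv))

p_stack <- base + geom_bar(position = "stack")  # bars piled on top of each other
p_dodge <- base + geom_bar(position = "dodge")  # bars side by side (grouped)
p_fill  <- base + geom_bar(position = "fill")   # stacked and rescaled to proportions

# With "fill", every stack reaches exactly 1 on the y axis
fill_top <- max(layer_data(p_fill)$ymax)
```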
Okay, fasten your seat belts, because we are going to start with scales! Scales are an essential part of the grammar, without knowing how to handle scales, you'll always have to look out for specific solutions to your problems. As you'll see, when you understand how scales work, everything gets easier.
There are so many scales in ggplot! Do I have to learn how to use every single one of them?? Luckily, no. I took the time to classify them and to find the pattern behind them, so that we can have a notion on how each one of them is used without having to learn all of them by heart. Interesting, huh? Let's do it.
Now, let's get more concrete. Let's start with scales that refer to aesthetics of "position". In this lesson we'll learn how to change the parameters of continuous, discrete and binned scales that refer to the x and y axis.
Well, technically, "date scales" are also position scales, because they also refer to the x and y axis. However, because dealing with objects of the type Date is a bit different, let's review them separately. In this lesson we will learn some functions that will make our life much easier when we are dealing with axes that represent dates. I promise you it is totally worth it.
Ah, colors! We all want to include colors in our plots, right? And, up to now, we already know how to create certain gradients. In this lesson, however, we will review how all color scales work and we will learn how to use the viridis and brewer built-in palettes.
Sometimes we'll need to use scales in manual mode. So let's learn how to do that. We'll also learn how to use identity scales and what they are good for.
By now you already know how to change the labels and limits of a plot using scales. However, this is something we do so often that having shortcut functions comes in very handy. So in this lesson we'll learn how to change the limits and labels of a plot without having to call the scale function explicitly.
Now that you master position adjustment and scales it's time to talk about Bar plots and to put everything we've learnt into practice. In this lesson you'll learn the difference between a histogram and a bar plot, and also the difference between geom_col and geom_bar. We'll also discuss what's the best way to display a bar chart of two categorical variables.
Bar plots are very useful, but it is true that sometimes people find them boring. That's why some people try to fix that by adding fancy decorations to bar plots, or by using weird shapes instead of bars. From a theoretical point of view, that shouldn't be done: it only adds clutter and distortion. In this lesson I will show you four steps to make your bar plots look modern.
Time to practice! Given that you've already practiced a lot to do simple bar plots, in this lesson I propose something different. Let's try to recreate a plot that was published in The Economist. Do you think we'll be able to do that using ggplot?
Remember to download the corresponding *.pdf file if you want to have the model side-by-side.
Did you make it? In this lesson I'll show you the solution!
In this chapter we'll discuss how to use the coordinate system, and we will learn [what for me is] the most fun part of the grammar, which actually, is not even a part of the grammar: the theme system! We'll learn how to draw maps as well!
To start with coordinate systems, let me tell you there are basically two types of systems: the cartesian one and the polar one. In this lesson we'll review the cartesian coordinate system and also one of its auxiliary functions: the coord_flip function. Remember that to flip the x and y axes we used to change the mappings? Now, finally, we'll be able to do it properly.
Regarding the cartesian coordinate system, there are two functions that come in very handy: coord_fixed, which will allow us to define a fixed aspect ratio for our plot (so that our plot will be kept in that proportion no matter the exporting size), and coord_quickmap, a fast way to give our maps the correct shape.
The polar coordinate system is something that you won't use that often. However, it is interesting because it will allow us to draw something that we cannot draw explicitly in ggplot: a pie chart.
I must confess, this is the most fun part of ggplot2 for me. The theme system controls the appearance of every single object of our plot. In this lesson we'll start by learning what the complete built-in themes in ggplot2 are and how we can apply them.
Complete themes are handy and convenient, but most of the time you'll want to change particular things yourself. To do so, you need to learn the names of the elements of the plot and you need to know how to use the corresponding element functions. We'll learn how to do that now.
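As a rough sketch of the pattern, each theme element is addressed by name and modified with the matching element function (`element_text()`, `element_line()`, `element_rect()`, or `element_blank()` to remove it). The specific tweaks below are arbitrary examples.

```r
library(ggplot2)

p_themed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Weight vs. fuel efficiency") +
  theme_minimal() +                                        # start from a complete theme
  theme(
    plot.title       = element_text(face = "bold", size = 14),  # text element
    axis.line        = element_line(colour = "grey40"),         # line element
    panel.grid.minor = element_blank(),                         # remove an element
    legend.position  = "bottom"                                 # non-element setting
  )
```

Calling `theme()` after a complete theme such as `theme_minimal()` matters: a complete theme added later would overwrite the individual tweaks.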
Bonus time! A convenient ggplot2 Theme System cheatsheet! :)
Once we know how to modify every single element of the plot, we'll be able to build our own custom theme. This is a lot of fun, and it is also worth it, because after creating your own theme, you'll be able to apply it to every single plot you produce. Cool, huh?
To draw a map, the first step is always to draw a background map. In this lesson we will learn how to draw a map using the map_data function.
Now that we know how to draw a background map, we'll learn how to convert it into a choropleth map by merging some data and coloring different map regions according to values of that variable.
Once we have a background or choropleth map, we can easily convert it into a bubble map by adding a geom_point on top, or to annotate it using geom_text, geom_label, or any other information on top. Let's do this!
More often than not, the map you'll want to plot is not one of the previously available maps on map_data. So you'll need to find your own data, and it is likely that it will be in some strange format. In this lesson we'll learn how to deal with GIS data and we'll learn a very interesting new geometry that makes our life very easy in plotting these kinds of maps.
We are reaching the end! By now, you already know a lot about how ggplot2 works. In this chapter, we'll talk about the last part of the grammar, which is the faceting system. And, once we are done with that, we'll be ready to start designing and drawing our own custom plots!
The faceting system is a part of the grammar that allows us to draw something called small multiples. In this lesson we will discuss what exactly small multiples are and what they are good for.
There are two faceting functions that are very similar: facet_grid and facet_wrap. What exactly is the difference between them? When should you use one or the other? Let's see it in this lesson.
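In short, `facet_wrap()` lays out the panels for one variable in a ribbon that wraps onto new rows, while `facet_grid()` builds a strict rows-by-columns matrix from two variables. A minimal sketch with the bundled `mpg` dataset:

```r
library(ggplot2)

p_base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One variable, panels wrapped into 3 columns
p_wrap <- p_base + facet_wrap(~ class, ncol = 3)

# Two variables: drv defines the rows, cyl defines the columns
p_grid <- p_base + facet_grid(drv ~ cyl)
```

`facet_grid()` draws a panel for every combination of the two variables, including combinations with no data, whereas `facet_wrap()` only draws panels that exist in the data.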
In this lesson I give you an example (or a follow-along exercise) on how to draw a beautiful plot that contains facets. For this example we use the Iris dataset and we will learn how to highlight different samples of data in the different facets. Unmissable.
In this exercise we will learn how to draw a series of maps using the faceting system. It's a pretty fancy plot!
How did you do in the previous exercise? In this lesson I'm showing you the step-by-step solution to drawing small multiples and maps!
The final moment is here! Not only are you now able to draw the traditional plots (line, scatter, bar, density, maps...), but you'll also be able to create your own customized plots. Have you ever heard of something called the Lollipop plot? In this lesson we'll learn how to draw one using ggplot2.
An interesting variation of the Lollipop plot, more suited to depict ranges of data, is the Dumbbell plot. It gets its name because its shape resembles that of a dumbbell. In this lesson we'll learn how to draw a dumbbell plot using the "Presidential" dataset.
In this lesson I want to show you how powerful ggplot2 is. We'll try to recreate an amazing plot (that actually won a prize in a Data Visualization contest!) using ggplot2 only.
Congratulations! I'm so proud of you! This course has been a lot of hard work but it has finally paid off! How do you feel? Let's talk about what happens now.