
Exploratory Data Analysis

Data Science

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

OpenCourser is an affiliate partner of Coursera and may earn a commission when you buy through our links.

Rating 4.5 based on 745 ratings
Length 5 weeks
Starts Jun 19 (50 weeks ago)
Cost $49
From Johns Hopkins University via Coursera
Instructors Roger D. Peng, PhD, Jeff Leek, PhD, Brian Caffo, PhD
Download Videos On all desktop and mobile devices
Language English
Subjects Data Science Mathematics
Tags Data Science Data Analysis Probability And Statistics


What people are saying

exploratory data analysis

Very clear and easy to review and learn ggplot and other useful things.

This is the worst of the Data Science courses so far (they've all been pretty good up to this point). It's called Exploratory Data Analysis, but is actually all about the graphics systems in R. And it does a botched job on those as well. All quizzes and assignments are about the graphics systems.

Didn't know anything about exploratory data analysis in R before, but now it seems easy. The best course ever for understanding ggplot.

It's a very good course.

This is one of the best courses; it gives a great idea about plotting and exploratory data analysis!

The swirl packages and course projects in the "Exploratory Data Analysis" course have really helped me to understand the power of R in performing introductory graphical analyses towards initial inferences.

Learned a lot of practical things.

Great course to learn how to build nice graphics and do exploratory data analysis.

Very good course!

This was an important class because, in future classes in the certification, peer-reviewed projects require some sort of exploratory data analysis.

Very informative.

I use exploratory data analysis almost every time I load raw data.

The course on Exploratory Data Analysis was highly enjoyable.

Excellent.

Great course for fundamental exploratory data analysis, and a good starting point for using R to do basic analysis.

This was the course that I've enjoyed the most.

I would merge this class into the R Programming section and call it Part 2 rather than categorizing it as "Exploratory Data Analysis".

Phenomenal.

An interesting course.

Fun and good approach to exploratory data analysis.

Awesome and must-take course!!

Exploratory Data Analysis is one of the first steps in every project.

Super useful tutorial of the different plotting systems, and basic exploratory data analysis.

Very accessible yet an extensive overview of techniques in exploratory data analysis.

data science

A painful, dull offline course on plotting & clustering in R, slapped online with minimal conversion like the rest of JHU's execrable Data Science specialisation.

This is the fourth course in the Data Science specialization.

Also great stuff to practice previous training on Data Science.

A very useful course for data science beginners.

Learning how to make plots and play with the data is where data science finally started to get fun!

For crediting the course.

Very good for anyone wanting to get into the field of Data Science using R.

This course is very important for beginners, and it is also simple to understand.

Nice course.

Great first step towards data science.

Very good material and structure of course!

Good learning experience.

This course is highly recommended for data science practitioners.

Graphs and plotting are at the heart of data analysis and data science. Without them you would have difficulty conveying ideas, and having graphs to explain numerical/statistical data is always handy.

The material taught in this course is a must-have for everybody who wants to use R for Data Science.

I am looking for a new Data Scientist career. Please take a look at my LinkedIn profile. I did this course to gain new knowledge about Data Science and to better understand the technology and its practical applications.

It's one of the most important steps in learning data science.

Exploratory Data Analysis is the 4th course in Johns Hopkins's data science specialization track.

The plotting lectures that make up the bulk of the course are well done and this course provides more instructor face time and live examples in R than any of the 3 courses in the first wave of the data science track.

base plotting system

I learned to make plots with the base plotting system and with the lattice and ggplot2 packages.

Great course.

Interesting to learn more about ggplot and the base plotting system, as well as clustering techniques.

Prof. Peng teaches you not only how to use the R base plotting system but also how to make wonderful graphs using the lattice and ggplot2 packages.

It is better to go through a good tutorial on the R base plotting system and ggplot2.

The assignments are very easy if you have basic familiarity with R's base plotting system and the "ggplot2" package.

Provides a solid overview of the base plotting system and a discussion (better elsewhere) of others.

svd and pca

What's worse, the SVD and PCA sections require a fairly high level of linear algebra knowledge to understand, which is not a prerequisite for this course.

I really enjoyed the sections on SVD and PCA as these really require mathematical maturity.

Helps in better understanding R programming graphics.

Quite repetitious in covering basic graphing, and very shallow with regard to clustering, SVD and PCA.

The SVD and PCA part of the course could have been explained in more depth, and a pilot project on it would have clarified the basic concepts.

Great course, but I wish there were more materials or explanation on the SVD and PCA part.

It would have been better if they left the SVD and PCA functions as black boxes in R and simply explained in general terms what they do and how to interpret their output.

I'm glad I completed this course; it added value for me. I wish the videos about SVD and PCA in week 3 were clearer; they were difficult to understand and I felt lost. I think you need to update these videos with more satisfactory material. Thanks for your effort and for what I have learned from this course.

Learned a great deal.

plotting systems in r.

Good stuff, just as I have come to expect from this University and the courses that are part of this Signature Track. A great deal of the lectures and work on assignments/quizzes/projects was learning and using the various plotting systems in R. Certainly this is important, but to put it into perspective, I spent four hours creating six plots for the final project, when I was able to use Tableau Desktop to create all six plots in under five minutes. So formally learning the data exploration techniques was good, but expect much of this course to be about learning the R plotting systems. That said, there is a point in this course (and the first time for all the courses to this point) where the topic suddenly got very, very technical.

A nice introduction to the three plotting systems in R. The second part is devoted to clustering, but it is not detailed enough to be really useful.

Very useful course.

Great course providing a good overview of the various plotting systems in R. I enjoyed the introduction to principal components analysis and singular value decomposition, but could have used more material to practise these methods.

Awesome!

This course teaches how to use three different plotting systems in R. Given the dominance of the tidyverse/ggplot2 paradigm, I really appreciate the opportunity to learn the base plotting system and the lattice plot system.

dimension reduction

This week should first start with a practical example/use of clustering and then move on to technical

Very informative course, enhance dat

I would have liked an assignment to focus on the clustering methods, and I think dimension reduction was reviewed way too quickly.

The only portion of the course that deviates from that is Week 3 (for which there is no quiz or project) where we "learn" about clustering and dimension reduction.

Great overview, especially the parts on dimension reduction.

A lot of things about Dimension Reduction and K-means method.

It was a wonderful experience to read the structure of data before delving into the advanced statistical levels of data analysis. The need for inclusion or exclusion of dependent variables, or for dimension reduction in regression analysis, can be intuitively understood and visualized using exploratory data techniques, and then we have a clue as to what to do at the next level. It is like putting all the characteristics of the data under full control.

Hopefully it could be clearer on dimension reduction.

I am eagerly awaiting the opportunity to apply clustering and dimension reduction on real data in future courses.

Thank you.

This course covers plotting (base, lattice, ggplot), then takes a confusing tour into heavy topics of clustering and dimension reduction, then flips back to coloring in charts.

principal components analysis

Week 3 takes a sudden detour into data clustering and the fairly advanced topics of principal components analysis and singular value decomposition, only to jump back to plotting with a section on color.

Only 'singular value decomposition' and 'principal components analysis' were somewhat hard to grasp and needed a lot of extra research and study.

I found the principal components analysis technique very useful.

Unfortunately, there are no interactive exercises or in-lecture quizzes, and the principal components analysis and singular value decomposition sections are too advanced for this course.

value decomposition

However, the videos on PCA (principal component analysis) and SVD (singular value decomposition) were difficult to understand, and I had to view several videos on YouTube (e.g., StatQuest or Stanford U) that do a far better job of explaining.

If no linear algebra background is required, then why do you assume that I know what a singular value decomposition is?

One of the best parts is the introduction of Singular Value Decomposition and Principal Component Analysis.

However, I wish there was more hands-on or peer-graded practice with K-means, heatmaps, dendrograms, and dimension reduction techniques like Singular Value Decomposition (SVD).

two weeks

I still let myself take an extra two weeks to complete the final project because I was still learning and playing around with the plots and selection of data, but that was because I wanted to, not because I had to.

First two weeks are too repetitive with other courses.

Well crafted, carefully designed learning materials!

The course is quite good and informative in the first two weeks, covering a lot of information and a lot of exercises. Week 3 is very unrelated and hard; the videos and exercises are bad, and I had to do this part by myself again. Also, the final course project doesn't cover any of these techniques. In my opinion, week 3 should be replaced with something more related to plotting systems and distributions; also, one project would be enough.

The first two weeks deepen part of what Mr. Peng used in his "Computing for Data Analysis" course.

The first two weeks are a chaotic overview of plotting in R. Three plotting systems are described but none of them is covered in enough detail.

The last two weeks seem to be just filler material; clustering and dimensionality reduction appear out of the blue, but there are no quizzes or assignments about those topics, so you could just as well skip them too.

case studies

Week 4 consists of 2 case studies where the professor shows you how to perform an exploratory analysis on a couple of different data sets.

The case studies are extremely helpful, as are the swirl exercises.

Love the case studies, really interesting.

An excellent introduction to R's basic processing and graphing functionalities.

rather than

What are they useful for, and why is it better to plot with bars rather than lines?

The course covers a very limited subset of plots and is mostly oriented toward R-specific technical routines rather than overall approaches.

It should focus more on concepts and techniques for delivering richer and more meaningful graphics using ggplot rather than talking so much about technicalities of the base plot and lattice systems.

felt like

This is a very good course, but at times it felt like the instruction was to do things mechanically without understanding the motivation.

I just wish the assessments had been a little more rigorous, as it felt like I could have done better but still passed the projects anyway.

A more in-depth study of ggplot would probably be more beneficial, as I felt like we were only scratching the surface with it.

Excellent course to develop and understand the technique of data visualization.

I felt like I learnt a lot, especially with all of the projects.

Fantastic way to learn graphing systems in R and exploratory analysis.

This is a great course on plotting data as well as finding underlying patterns in it.

The swirl exercises kind of reproduce the lectures, though. It felt like it might not have been the most efficient use of time to go over the exact same example again.


An overview of related careers and their average salaries in the US. Bars indicate income percentile.

Techniques (General) Engineer $70k

Adjunct Professor of Digital Media Techniques and Computer Graphic Design $70k

Senior Clinical Project Manager Portfolio; Translational Medicines/ Sciences, Exploratory Development $114k
