Data Science Capstone

The capstone project class will allow students to create a usable, public data product that they can use to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners.

Rating 4.3 based on 246 ratings
Length 8 weeks
Effort 4-9 hours/week
Starts Jul 3 (48 weeks ago)
Cost $49
From Johns Hopkins University via Coursera
Instructors Roger D. Peng, PhD, Jeff Leek, PhD, Brian Caffo, PhD
Download Videos On all desktop and mobile devices
Language English
Subjects Data Science Programming
Tags Data Science Data Analysis Machine Learning

What people are saying

natural language processing

Well, most of the class was just about learning natural language processing (NLP), which wasn't covered.

natural language processing, markov models, etc.

A very tough and challenging project, but a great way to learn a lot about Natural Language Processing and algorithm coding in R, and in the end to have a cool Shiny app to add to your portfolio.

It also gives the student a glimpse of what Data Science in real life is and touches on Natural Language Processing as part of AI.

Fantastic exposure to Natural Language Processing!

The Capstone Project makes you summarize what you have learnt so far and take it to the next level: natural language processing.

Your final product will be displayed for everyone via ShinyApps and a presentation using R Presentation (also published via RPubs). On a(nother) negative note, the topic of Natural Language Processing is not an easy one to just walk into and feel confident in providing a working next-word prediction algorithm in about eight (8) weeks.

I had no experience in natural language processing before I took this course, and now I'm kind of in love with it!

The Discussion Forum provides some level of help, but you are basically on your own. It is very challenging to come up to speed with Natural Language Processing techniques if you have never taken any class about it. My recommendation to JHU and Coursera is to add a separate course for NLP where you cover all the basics and then have the Capstone.

Instead, it threw us into a completely new area, Natural Language Processing.

Also, the topic (Natural Language Processing) is just too unconnected to anything seen in the other courses.

data science specialization

So disappointing, it feels somewhat unrelated to the material covered in the 9 courses in the Data Science Specialization, so I didn't feel adequately prepared for tackling the Capstone even though I carefully completed all pre-req courses.

While I did my own write-ups and wrote my own code, I benefited in a big way from lessons learned by others who've already tackled similar problems. I would recommend the Data Science Specialization by JHU, which (as it should be) is a package deal with the capstone project.

The course learner community was supportive, which is fortunately typical for Coursera. All in all, this project was *not* an effective capstone for the Data Science specialization.

Well-paced, highly structured Capstone that allowed me to put to the test the skills I honed during the 9 previous courses in the JHU Data Science specialization.

Feeling proud after completing all the courses under Data Science Specialization.

NLP module should definitely be included into JHU Data Science specialization.

Cool. Brilliant course, the final chapter for the data science specialization.

In my opinion this last course is a great way to conclude the Data Science specialization, not only because it "forces" you to apply a lot of lessons learned during the other 10 courses, but also because it gives you the opportunity to understand how important it is to set up the problem well before trying to solve it.

learn a lot

Lastly, I really hate RPresenter and that the instructors force us to use it, but maybe that's just me. On the positive side, I did learn a lot: the basics of text prediction, how to do parallel programming in R, and how to set up an RStudio instance on AWS (the latter two are not very hard; I recommend them to anyone struggling with gigantic runtimes, as long as you're willing to invest like $40 or so for the computing power).

Had to learn a lot on our own but very valuable content once acquired.

Thanks for all the support. Like diving in without learning to swim first - but man did I learn a lot.

With this project, you learn a lot!

One can learn a lot doing this project. It is a lot of independent work, with guiding questions but no real help otherwise.

data scientist

The assigned project is quite unexpected, but it really tests the skills of an aspiring data scientist!

I get that you will always see new data formats as a data scientist, but having the whole course cover numeric data and then having the final project be on text data where you can't apply what you learned seems sub-optimal.

Great course for becoming a data scientist. Interesting assignment!

Really challenging but satisfying enough! Thank you to the Coursera team, who patiently developed such a beautiful program for upskilling us, the so-called data scientists!

I am looking for a new career as a Data Scientist (I did this specialization to gain new knowledge about Data Science and to better understand the technology and its practical applications).

Learnt a ton about various NLP algorithms. For anyone who aspires to be a Data Scientist!

In that sense it was probably good prep for unexpected challenges in the workplace and therefore good training to make us real data scientists.

You are a data scientist now, be ready to deal with new analyses and new topics.

previous courses

The negative is that there is no explanation whatsoever about NLP, which was never mentioned in the previous courses, so there's not much teaching or guidance.

LOL. I have a software development background (and completed the previous courses in the specialization), so translating approaches I found described in various sources into code wasn't "easy"; but it wasn't a barrier, either.

Also, in my opinion the materials / resources given to this course are scarce compared with previous courses of the specialization.

Self-paced, with a lot of examples and discussion forum support. This class was a huge challenge for me, but it pushed me to learn a whole lot and practice many of the skills that I had learned in previous courses!

On a positive note, you will use all of the skills from the previous courses: writing R functions, performing exploratory analysis and publishing it via RPubs.

Looking at it in perspective, I think the previous courses are absolutely necessary for the final project.

Very informative and beneficial course after going through all the previous courses.

machine learning

I think a machine learning project that tied together everything that we'd worked on up until this point would have been a lot more fun and rewarding.

This course significantly challenged my skills in programming, probability, machine learning and applied mathematics (e.g., Katz's backoff equations).
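The review above mentions Katz's backoff in the context of next-word prediction. As a rough illustration of the backoff idea only, the sketch below tries the longest available context first and falls back to shorter ones. It is a simplified, undiscounted backoff, not Katz's method, which additionally discounts observed counts and redistributes the freed probability mass to unseen n-grams. The function names and the toy sentences are assumptions made for this example and do not come from the course.

```python
from collections import Counter, defaultdict


def train(sentences, max_n=3):
    """Count which words follow every context of length 0 .. max_n - 1."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k < 0:
                    break  # not enough preceding words for this context length
                context = tuple(tokens[i - k:i])  # k == 0 gives the empty context
                model[context][word] += 1
    return model


def predict_next(model, text, max_n=3):
    """Return the most frequent continuation, backing off to shorter contexts."""
    tokens = text.lower().split()
    for k in range(max_n - 1, -1, -1):
        if k > len(tokens):
            continue  # the input is shorter than this context length
        context = tuple(tokens[len(tokens) - k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # only reached when the model is empty


# Toy training data, invented for illustration.
sentences = ["the cat sat on the mat", "the cat sat down", "a dog ran"]
model = train(sentences)

two_word_hit = predict_next(model, "the cat")      # found with the two-word context
backed_off = predict_next(model, "my dog")         # backs off to the one-word context
fallback = predict_next(model, "hello zebra")      # backs off to overall word frequency
```

With this toy data, "the cat" is answered from the two-word context, "my dog" from the one-word context, and an unseen phrase falls back to the most frequent word overall.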

The instructors (especially Peng) spent way too much time detailing R syntax that could have been picked up by the students on their own from other resources available on the web... The regression models and statistical inference courses are exceptions though: together with the machine learning course, these are probably the most useful from the whole specialization. The materials in this capstone project are way sloppier than materials in other courses, by the way.

Also, most of what we learned in the first 9 courses about statistics and machine learning turned out to be irrelevant to the capstone project.

I would prefer a large-scale machine learning capstone where we could build models; it would fit real-life situations better!

The capstone project doesn't fully utilise the knowledge from earlier modules such as machine learning, statistical analysis, regression models, etc.

Statistical inference was necessary and closely linked to exploratory analysis, especially to select samples well and review distributions, since some machine learning methods may be affected by distributions.

This capstone did not require material from key courses, specifically the machine learning, regression models, and statistical inference courses.

It would have been nice to have had a machine learning component as well, but that would have likely made the course even longer and more difficult to grade.

real world

Nice course, and the project task is quite an interesting one. Very instructive, since it presents you with a real-world problem that you need to solve by yourself, in all of its complexity.

It resembled a real world task, where an idea is presented and it is up to the user to research methods and processes for the best outcome.

Very Dirty Data Sources made it a very Real World problem to solve.

Having completed this project, I feel more confident about my skills as a data scientist in solving real world problems.

Some of my fellow learners complained about the new data type and little information provided, but I feel this is a good simulation of real world experience as a data scientist!

learn something

The positive is that it introduces you to a new topic (NLP) and the goal is reasonable, it takes a lot of effort but it's not impossible and it forces you to learn something meaningful (something easier would have not made me learn something valuable).

Whenever I learn something I believe to be useful, I always wonder how it applies in other contexts.

It's a bit tough since the topic is NLP and we haven't discussed it much in previous courses, but you will learn something new and apply the knowledge you gained in the specialization.

Thank you so much for the guidance and the opportunity to learn something new!!!

my model

I ended up building my model in Python, exporting it as JSON, and then importing that into my Shiny app.
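The workflow this reviewer describes (train in Python, hand the result to a Shiny app as JSON) might look roughly like the sketch below. The reviewer's actual model and file layout are not given, so the bigram lookup table, the function names and the toy corpus are all assumptions made for illustration.

```python
import json
from collections import Counter, defaultdict


def build_bigram_table(sentences, top_k=3):
    """Map each word to its top_k most frequent following words."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    # Keep only the suggestions, so the exported file stays small.
    return {
        word: [candidate for candidate, _ in counts.most_common(top_k)]
        for word, counts in followers.items()
    }


def to_json(table):
    """Serialize the lookup table as a JSON object of word -> suggestions."""
    return json.dumps(table, sort_keys=True)


# Toy corpus, invented for illustration.
corpus = ["I love data science", "I love R", "I like data"]
table = build_bigram_table(corpus)
exported = to_json(table)
```

The resulting text can be written to a file and read on the R side with a JSON parser such as `jsonlite::fromJSON`, which would give the Shiny app a plain lookup structure with no need to retrain anything at startup.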

Not because my model was wrong (I was able to implement it and to check it against some hand-written and proved examples - which I should probably thank again), but because I was not able to make it run efficiently enough for the given constraints. Being stuck in this stage for longer than I wanted, I had to sacrifice other important steps of the data analysis pipeline so as not to jeopardize my final delivery by missing the final due date.

It turns out that my model trained on a 25% sample performed just as well as a model trained on 100%.

Both helped me learn how I can improve my model and demo application.

The pre-processing of the data was quite extensive (9 steps before generating the ngram tables I used in my model) and was the key to getting decent results IMHO, but one had to step on quite a few landmines to figure this out.
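The reviewer does not list the nine pre-processing steps, so the sketch below only shows the general shape of such a pipeline: clean each line, tokenize it, then count n-grams. The three cleaning steps (lowercasing, dropping URLs, dropping everything except letters and apostrophes) are common choices assumed for illustration, and the function names and sample lines are hypothetical.

```python
import re
from collections import Counter


def preprocess(text):
    """Clean one line of raw text and return its tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z' ]+", " ", text)     # keep letters, apostrophes, spaces
    return text.split()


def ngram_table(token_lists, n):
    """Count every run of n consecutive tokens across all lines."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Sample lines, invented for illustration.
raw = ["Check THIS out: http://example.com !!!", "check this again, 2 times"]
cleaned = [preprocess(line) for line in raw]
bigrams = ngram_table(cleaned, 2)
```

Note that removing a token (here the digit "2") makes its neighbours adjacent, so "again times" is counted as a bigram. Side effects like this are one reason the order and choice of cleaning steps can matter for the final tables.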

An overview of related careers and their average salaries in the US.

Online Faculty - Multimedia Capstone $27k

Capstone Advisor, Masters in Real Estate $57k

Assistant Capstone Project Manager $58k

Postdoctoral Capstone Researcher $59k

Capstone Experience Researcher $64k

Capstone Project Manager $73k

Planner/Project Manager- Capstone Experience $86k

Project Architect, Project Manager $94k

Senior Capstone Researcher $99k

Industrial Engineer Capstone $103k

Capstone Project Mechanical Engineer $131k

Project Manager, Project Engineering $143k
