
Fundamentals of Scalable Data Science

Advanced Data Science with IBM

Apache Spark is the de facto standard for large-scale data processing. This is the first course in a series of courses towards the IBM Advanced Data Science Specialization. We strongly believe that it is crucial for success to start learning a scalable data science platform, since memory and CPU constraints are the most limiting factors when it comes to building advanced machine learning models. In this course we teach you the fundamentals of Apache Spark using Python and PySpark. We'll introduce Apache Spark in the first two weeks and learn how to apply it to basic exploratory data analysis and data pre-processing tasks in the last two weeks. Through these exercises you'll also be introduced to the most fundamental statistical measures and data visualization technologies. This gives you enough knowledge to take over the role of a data engineer in any modern environment, and it also gives you the basis for advancing your career towards data science.

Please have a look at the full specialization curriculum: https://www.coursera.org/specializations/advanced-data-science-ibm

If you choose to take this course and earn the Coursera course certificate, you will also earn an IBM digital badge. To find out more about IBM digital badges, follow the link ibm.biz/badging.

After completing this course, you will be able to:
• Describe how basic statistical measures are used to reveal patterns within the data
• Recognize data characteristics, patterns, trends, deviations or inconsistencies, and potential outliers
• Identify useful techniques for working with big data, such as dimension reduction and feature selection methods
• Use advanced tools and charting libraries to:
  o improve the efficiency of big-data analysis with partitioning and parallel analysis
  o visualize the data in a number of 2D and 3D formats (Box Plot, Run Chart, Scatter Plot, Pareto Chart, and Multidimensional Scaling)

For successful completion of the course, the following prerequisites are recommended:
• Basic programming skills in Python
• Basic math
• Basic SQL (you can get it easily from https://www.coursera.org/learn/sql-data-science if needed)

In order to complete this course, the following technologies will be used (they are introduced in the course as necessary, so no previous knowledge is required):
• Jupyter notebooks (brought to you by IBM Watson Studio for free)
• Apache Spark (brought to you by IBM Watson Studio for free)
• Python

Some learners have told us that parts of the material in this course are too advanced. If you feel the same, please have a look at the following materials before starting this course; learners report that they really help. Of course, you can also give this course a try first and then, if needed, take the following courses and materials. They are free:
https://cognitiveclass.ai/learn/spark
https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/f8982db1-5e55-46d6-a272-fd11b670be38/view?access_token=533a1925cd1c4c362aabe7b3336b3eae2a99e0dc923ec0775d891c31c5bbbc68

This course takes four weeks, at 4-6 hours per week.
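The "partitioning and parallel analysis" outcome listed in the course description can be illustrated with a small sketch. It is written in plain Python so it runs without a Spark cluster, and it assumes the data has already been split into partitions; the function names and the sample data are hypothetical. Each partition is reduced independently to a summary (count, mean, sum of squared deviations), and the summaries are then merged, which is the same split-then-combine pattern behind PySpark's `RDD.aggregate(zeroValue, seqOp, combOp)`. The merge step uses the pairwise variance update of Chan et al.; this is not necessarily how the course itself computes these measures.

```python
import math
from functools import reduce

# A partition summary is (count, mean, M2), where M2 is the sum of squared
# deviations from the partition mean. ZERO is the identity summary.
ZERO = (0, 0.0, 0.0)


def seq_op(acc, x):
    """Fold one value into a partition-local summary (Welford's update)."""
    n, mean, m2 = acc
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return (n, mean, m2)


def comb_op(a, b):
    """Merge two partition summaries (pairwise update of Chan et al.)."""
    na, ma, m2a = a
    nb, mb, m2b = b
    if na == 0:
        return b
    if nb == 0:
        return a
    n = na + nb
    delta = mb - ma
    mean = ma + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return (n, mean, m2)


def mean_and_stddev(partitions):
    """Return the mean and population standard deviation of all values."""
    # Each partition is summarized independently; on a cluster these
    # summaries would be computed in parallel, one task per partition.
    summaries = [reduce(seq_op, part, ZERO) for part in partitions]
    n, mean, m2 = reduce(comb_op, summaries, ZERO)
    if n == 0:
        raise ValueError("no data")
    return mean, math.sqrt(m2 / n)


# Hypothetical data split into four partitions (one of them empty).
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [], [6.0]]
print(mean_and_stddev(partitions))
```

Because only the small summaries travel between partitions, the merge cost does not depend on how many values each partition holds. With PySpark, the same two functions could be passed to `rdd.aggregate(ZERO, seq_op, comb_op)`; for everyday use, the built-in `rdd.mean()` and `rdd.stdev()` already compute these measures.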


OpenCourser is an affiliate partner of Coursera and may earn a commission when you buy through our links.

Rating 4.0 based on 328 ratings
Length 5 weeks
Starts Jul 3 (43 weeks ago)
Cost $79
From IBM, IBM Skills Network via Coursera
Instructor Romeo Kienzler
Download Videos On all desktop and mobile devices
Language English
Subjects Data Science
Tags Data Science Data Analysis



What people are saying

data science

I'm really disappointed with the "Fundamentals of Scalable Data Science" course from IBM.

Great introduction to Data Science, IoT and scalable computing!

Awesome course, got a good understanding of statistics in an intuitive manner. The main strength of this course is that it helps you develop an intuition for how data science concepts apply to real-world scenarios.

Wonderful course.

This course is highly recommended if you want to bring your Data Science skills to the next level.

The assignment is really challenging for me as a newcomer to this Data Science world, but yes, I finally finished this course.

This course and certificate look like they should come after the first Data Science Certificate (9 courses).

This particular course could provide a lot more information and education on scalable data science.

Looking forward to the next courses of the IBM Degree for advanced data science.

Was really good, I loved it ^_^

Some errors: lambdas no longer work with Python, there are some typos (as in Assignment 4.1), and some steps are missing.

Good: Introduces scalable data science and setting up the proper environments. Areas for improvement: Not fully compatible with Python 3.5 (frustrating). Could use some prerequisite course material, such as basic SQL and a walk-through of PySpark SQL DataFrames. Structure and presentation could be improved and reorganized. Instructions need proofreading.
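Several reviewers report that lambdas from the course no longer work and that the material is not fully compatible with Python 3. One likely cause, offered here as an assumption rather than a confirmed diagnosis of the course notebooks, is that Python 2 allowed tuple parameters such as `lambda (k, v): ...`, a syntax that Python 3 removed (PEP 3113). The sketch below shows two Python 3-compatible equivalents on hypothetical (key, value) pairs; the names `pairs` and `scale` are illustrative only.

```python
# Hypothetical (key, value) pairs, e.g. sensor readings.
pairs = [("temp", 21.5), ("hum", 40.0), ("temp", 22.0)]

# Python 2 only; this is a SyntaxError in Python 3 (PEP 3113):
#     scaled = map(lambda (k, v): (k, v * 2), pairs)

# Python 3: take the tuple as a single parameter and index into it.
# map() returns an iterator in Python 3, so wrap it in list().
scaled = list(map(lambda kv: (kv[0], kv[1] * 2), pairs))


# Alternatively, unpack the tuple inside a named function.
def scale(kv):
    key, value = kv
    return key, value * 2


scaled_too = [scale(kv) for kv in pairs]
print(scaled, scaled_too)
```

The same one-parameter lambda or named function can be passed to PySpark's `rdd.map`, which hands each (key, value) element to the function as a single tuple.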

The IBM Cloud environment is buggy and inconsistent with the lectures. When deploying services, it sometimes fails and you are unable to remove them, rendering the account inoperable (as you have limits on the free tier). The setup process is tedious. Currently, it is not advisable to take this course. I have finished the excellent IBM Data Science Professional Certificate series on Coursera and wanted to improve my knowledge of scalable Data Science with this series.

Good for beginners in Data Science who have basic knowledge of python and SQL.

Excellent course with very interesting assignments and informative videos.

Very good data science specialization covering many interesting advanced technologies!

Great introduction to Data Science on IBM Cloud.

Great way to understand and learn open source tools and latest IBM data science offerings.


apache spark

A perfect course to kick off exploration of sensor-data analytics using Apache Spark and Python libraries. Kudos, man.

I believe intensive programming practice is more efficient.

Good introductory course to Apache Spark and its internals.

Very nice introduction, I loved it!

I would appreciate some more in-depth practical/technical information about IoT, and also about Apache Spark and how it works in the real world.

I am happy that this course gave me my first practical experience with Apache Spark.

This course gives you nice experience with Apache Spark.

Great introduction to Apache Spark and IBM Cloud.

Amazing course, and especially the instructor!!!

A good one to start with.

A really good introduction to Apache Spark.

Need more exercises related to wrangling data and manipulating SQL queries with Apache Spark.

Simply not on the same level as other ML courses on Coursera.

It gives great introduction into Apache Spark and its applications in real problems.

General intro to how to deal with large data using Apache Spark.

Nice introduction, not too difficult without being so easy that you learn nothing. Sometimes the content is outdated, but I always find solutions quickly to make everything work.

A very nice introduction to Apache Spark and its environment.

This class made me confident in using Apache Spark for any data projects I may have.

A pretty good starter course for Apache Spark, although the software version used in this course is outdated.

The course content was helpful for starting with Apache Spark and Python using Apache Spark.


ibm cloud

Be careful when signing up for your IBM Cloud Instance and remember to shut it down when you're not using it.

The material along with the IBM Cloud platform is a total bonus. The assignments are challenging for a reason.

This course is a very basic introduction to IBM cloud and general stats.

Overall, the course is a nice introduction to IBM Cloud if one is interested.

In most cases, the version of IBM Cloud now being deployed varies from the course, so students have to figure it out themselves.

While getting used to IBM Cloud takes time, it is definitely a friendlier environment for data scientists and it removes the burden of setting up the infrastructure.

Good demonstration of basic stats using IBM Cloud & Spark.

Extremely unusable tutorials, extremely bad organization of the materials, extremely bad accent, extremely unusable IBM Cloud service, extremely outdated tutorial for environment setup, and you name it.

The main concern was to promote IBM Cloud rather than to teach.


big data

Good overall, the instructor was very good, but I feel more examples could be used, especially when explaining multidimensional vector space and such basics of graphs.

First time I got the chance to work on cloud data (big data). Thanks to IBM.

Strong introduction to parallel computing and big data processing.

This is an excellent course. I had no previous experience with Big Data or Hadoop, but this course helped me learn lots of new technologies and also helped me learn about big data.

Very good enhancement in data science.

Good course, but assignments are a bit easy.

This course can be a bit tough at the start, especially if you (like me) are unfamiliar with big data, Hadoop and/or Spark.

I eventually took these free courses, as they really help strengthen your big data fundamentals, including RDD, HDFS and Spark. Assignments are OK, definitely doable and easier than they could have been.

I learned satisfactory tools and ideas about how to handle big data.

Great Introductory course for Big Data Analytics.

pretty good course for a beginner new to big data analysis.

Big data materials are discussed less, especially the coding sections.

Really a good course for Data Science programming.

I now understand how to deal with big data using Spark which is exactly what I wanted.

A detailed explanation of the trade-offs of different approaches that can be used in Big Data, but there are not enough examples of manipulating big datasets.

As an advanced course, the concepts here are pretty basic.


for beginners

I like it for beginners.

It's a really basic course and good for beginners, though you need to dive into Python and Spark on your own to follow the course and the assignments.

Thanks for the course.

I would not recommend it for beginners.

Good guidance and a great start for beginners, as well as beneficial during this Covid-19 period.

Excellently organized course.

Best course.

Amazing intro to Apache Spark.

It would be easier to just make it labs and some reading, as all the videos are just watching the instructor type code.

The course is perfect for beginners, but some videos are old.

Very well organised material... Really liked the concept of dimension reduction and PCA. Suggestion: It is not for beginners, so you may want to reclassify it as an intermediate course. Actually, I find it an advanced-level course; I had gone through Spark Programming Fundamentals as you suggested, and thus I was able to complete it.


set up

Getting everything set up correctly is not very user friendly at this stage.

Python 2 is used throughout this course, and the instructions for setting up Node-RED and Cloudant do not work.

My suggestion would be to give a more detailed explanation of the cloud/parallel computing, how it's structured, how to set up servers, etc.

At first, I wasn't sure what to do, and it was hard for me to set up the environment.


watson studio

The course touches a number of components of the IBM Cloud platform, including IBM Watson Studio (an online software development platform) and Node-RED (a flow-based programming tool for defining data flows).

It's an excellent course for anybody who wants to learn the basics of Spark, Watson Studio, and data analysis.

The best part is the programming assignment and tutorials: great hands-on introduction to IBM Watson Studio with manageable examples.

It is surprising that topics have not been updated after many comments in the discussion forum. Overall, for me it was a great learning experience.

I learned how to use Spark with RDDs on IBM Watson Studio. Great!

Really liked this course.


Careers

An overview of related careers and their average salaries in the US.

AH-64 Apache Electrician / Armament $54k

Core Java/Spring with Apache Camel $57k

AH-64D Apache Mechanic $60k

Apache Spark Developer $64k

IBM Kenexa Recruiter $70k

Assistant Oracle/Apache Admin $70k

IBM API Connect $71k

IBM Business Data Analyst $74k

Contributor, Apache Traffic Server $81k

IBM FileNet Architect $90k

IBM Systems Programmer 3 $102k

Apache CloudStack PMC (Project Management Committee) $157k
