We may earn an affiliate commission when you visit our partners.
Romeo Kienzler

Apache Spark is the de facto standard for large-scale data processing. This is the first course in a series leading to the IBM Advanced Data Science Specialization. We strongly believe that it is crucial for success to start by learning a scalable data science platform, since memory and CPU constraints are the most limiting factors when it comes to building advanced machine learning models.

In this course we teach you the fundamentals of Apache Spark using Python and PySpark. We introduce Apache Spark in the first two weeks and, in the last two weeks, show how to apply it to basic exploratory and data pre-processing tasks. Along the way you'll also be introduced to the most fundamental statistical measures and data visualization technologies.

This gives you enough knowledge to take on the role of a data engineer in any modern environment. It also gives you the basis for advancing your career towards data science.

Please have a look at the full specialization curriculum:

https://www.coursera.org/specializations/advanced-data-science-ibm

If you choose to take this course and earn the Coursera course certificate, you will also earn an IBM digital badge. To find out more about IBM digital badges, follow the link ibm.biz/badging.

After completing this course, you will be able to:

• Describe how basic statistical measures are used to reveal patterns within the data

• Recognize data characteristics, patterns, trends, deviations or inconsistencies, and potential outliers.

• Identify useful techniques for working with big data such as dimension reduction and feature selection methods

• Use advanced tools and charting libraries to:

o Improve the efficiency of big data analysis with partitioning and parallel analysis

o Visualize the data in a number of 2D and 3D formats (Box Plot, Run Chart, Scatter Plot, Pareto Chart, and Multidimensional Scaling)
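
As a rough illustration of the outcomes listed above, the sketch below uses only the Python standard library, not PySpark, and is not taken from the course material. The sample values and the function names (five_number_summary, iqr_outliers, partitioned_mean) are hypothetical. It shows the five values a box plot is drawn from, a common 1.5 × IQR rule for flagging potential outliers, and how a mean can be assembled from per-partition partial results, which is the general idea behind partitioned, parallel analysis.

```python
import statistics

# Hypothetical sample readings, invented for this illustration.
readings = [12.0, 15.0, 11.0, 14.0, 13.0, 95.0, 12.0, 16.0]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot is drawn from."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def iqr_outliers(values):
    """Flag points lying more than 1.5 * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def partitioned_mean(values, num_partitions):
    """Compute the mean from per-partition (sum, count) pairs.

    Each partition could be processed independently (for example on a
    different worker); only the small partial results are combined.
    """
    partitions = [values[i::num_partitions] for i in range(num_partitions)]
    partials = [(sum(p), len(p)) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print("mean:", statistics.mean(readings))
    print("median:", statistics.median(readings))
    print("sample standard deviation:", statistics.stdev(readings))
    print("five-number summary:", five_number_summary(readings))
    print("potential outliers:", iqr_outliers(readings))
    print("mean from 3 partitions:", partitioned_mean(readings, 3))
```

In this sample the single large reading pulls the mean well above the median, which is the kind of deviation the summary statistics and a box plot are meant to reveal.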

For successful completion of the course, the following prerequisites are recommended:

• Basic programming skills in Python

• Basic math

• Basic SQL (you can get it easily from https://www.coursera.org/learn/sql-data-science if needed)

In order to complete this course, the following technologies will be used:

(These technologies are introduced in the course as necessary so no previous knowledge is required.)

• Jupyter notebooks (brought to you by IBM Watson Studio for free)

• Apache Spark (brought to you by IBM Watson Studio for free)

• Python

Some learners have told us that some of the material in this course is too advanced. If you feel the same, please have a look at the following materials before starting this course; learners report that this really helps.

Of course, you can also give this course a try first and then, if needed, work through the following courses and materials. They are free.

https://cognitiveclass.ai/learn/spark

https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/f8982db1-5e55-46d6-a272-fd11b670be38/view?access_token=533a1925cd1c4c362aabe7b3336b3eae2a99e0dc923ec0775d891c31c5bbbc68

This course takes four weeks, at 4-6 hours per week.

What's inside

Syllabus

Introduction to the course and grading environment
Tools that support BigData solutions
Scaling Math for Statistics on Apache Spark
Data Visualization of Big Data

Good to know

Know what's good, what to watch for, and possible dealbreakers
Strengthens the foundation of data processing and analytics with Apache Spark for a successful transition to data science
Develops expertise in statistical measures and data visualization techniques for insightful analysis
Provides hands-on experience with Jupyter notebooks and Apache Spark, prevalent tools in industry
Taught by Romeo Kienzler, an instructor recognized in the field of data science
Emphasizes practical applications, preparing learners for real-world data analysis scenarios
Serves as the foundation for the IBM Advanced Data Science Specialization, providing a comprehensive learning path

Reviews summary

Advanced big data analytics with Spark

Learner reviews of the Fundamentals of Scalable Data Science course are largely positive, citing engaging assignments and a strong focus on the fundamentals of data science. The course provides a solid foundation in Apache Spark, statistics, and data visualization. However, some reviewers note that the materials and videos are outdated and could benefit from updates.
Features engaging assignments that help learners apply their knowledge.
"I really enjoyed your videos and your enthusiasm - I have been inspired."
"The Activities were very well structured although there was some confusion initially when tasks needed SQL but the videos showed RDD solutions."
"This course introduced me to working of Spark and different data science principles, it was a great discovery to find this course."
Provides a solid foundation in Apache Spark, statistics, and data visualization.
"Great course that covers some of the fundamentals of advanced data analytics."
"I learned a good amount about Apache Spark, IBM Watson, and integrating both with Python."
"This course introduced me to working of Spark and different data science principles, it was a great discovery to find this course."
Instructor is knowledgeable and presents the material in a clear and engaging way.
"Romeo, your an amazing human being!!"
"Excellent Instructions by Mr. Romeo."
"I also have to say that I really enjoyed the Rhodes Scholar program at Oxford and the 2012 conference in Rio."
Materials and videos are outdated and could benefit from updates.
"The videos are fuzzy, extremely outdated, and don’t match up with the actual projects."
"The course is outdated, there is no explanation about watsonx and cloud pak, many urls are broken."
"I'm really disapointed with the "Fundamentals of Scalable Data Science" course from IBM. The videos are referring to an outdated software releases."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Fundamentals of Scalable Data Science with these activities:
Organize Your Course Materials
Establish a strong foundation for learning by organizing your course materials.
  • Create folders for different topics and assignments
  • File lecture notes, readings, and assignments in the appropriate folders
  • Digitize and store important materials
  • Establish a consistent naming convention
Review Elementary Statistics Concepts
Refresh your knowledge of basic statistics to enhance comprehension of course material.
  • Review descriptive statistics (mean, median, mode, standard deviation)
  • Refamiliarize yourself with inferential statistics (hypothesis testing, confidence intervals)
  • Practice solving basic statistics problems
Practice Python Programming
Sharpen your Python skills to maximize comprehension of code examples and exercises.
  • Review Python syntax and data structures
  • Solve coding challenges and exercises
  • Build a small Python project
Follow Apache Spark Tutorials
Expand your knowledge of Apache Spark by exploring online tutorials.
  • Search for Apache Spark tutorials
  • Choose tutorials that align with your learning goals
  • Follow the tutorials step-by-step
  • Practice the concepts learned in the tutorials
Build a Simple Apache Spark Application
Develop a basic understanding of Apache Spark by creating a simple application.
  • Set up your development environment
  • Create a Spark application
  • Load data into Spark
  • Transform and analyze the data
  • Save the results
Solve Apache Spark Exercises
Test and strengthen your Apache Spark skills by solving exercises.
  • Search for Apache Spark exercises
  • Select exercises that cover different concepts and scenarios
  • Solve the exercises independently
  • Review your solutions and identify areas for improvement
Join a Study Group for Apache Spark
Deepen your understanding of Apache Spark through collaboration and knowledge sharing.
  • Find a study group or create your own
  • Meet regularly to discuss course material, share ideas, and solve problems
  • Review lecture notes, readings, and exercises together
  • Provide constructive feedback and support to group members
Create a Data Visualization Dashboard
Enhance your understanding of data visualization techniques by creating a dashboard.
  • Gather and clean the data
  • Choose the appropriate visualization tools
  • Design and create the dashboard
  • Present and share your dashboard
Contribute to Apache Spark Projects
Deepen your understanding of Apache Spark and contribute to the community by participating in open source projects.
  • Identify Apache Spark projects that align with your interests
  • Read the project documentation and familiarize yourself with the codebase
  • Make small contributions, such as bug fixes or documentation updates
  • Collaborate with other contributors on larger features or enhancements

Career center

Learners who complete Fundamentals of Scalable Data Science will develop knowledge and skills that may be useful to these careers:
Data Analyst
A Data Analyst collects, analyzes, interprets, and presents data to help organizations make informed decisions. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Data Engineer
A Data Engineer designs, builds, and maintains data pipelines and databases. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data ingestion, data transformation, and data analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Data Scientist
A Data Scientist uses data to solve business problems. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Machine Learning Engineer
A Machine Learning Engineer builds and deploys machine learning models. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data preparation, feature engineering, and model training. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Statistician
A Statistician collects, analyzes, interprets, and presents data to help organizations make informed decisions. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Business Analyst
A Business Analyst uses data to identify and solve business problems. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Financial Analyst
A Financial Analyst uses data to make investment decisions. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Market Research Analyst
A Market Research Analyst collects, analyzes, and interprets data to help businesses understand their customers. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Operations Research Analyst
An Operations Research Analyst uses data to improve the efficiency of business operations. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis. This course can also help you build a portfolio of projects that you can use to showcase your skills to potential employers.
Product Manager
A Product Manager is responsible for the development and launch of new products. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis to better understand your customers and their needs.
Software Engineer
A Software Engineer designs, develops, and maintains software systems. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data ingestion, data transformation, and data analysis.
Web Developer
A Web Developer designs and develops websites. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis to better understand your website's visitors.
Data Visualization Engineer
A Data Visualization Engineer designs and develops data visualizations. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis to create beautiful and informative data visualizations.
Data Architect
A Data Architect designs and manages data systems. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data ingestion, data transformation, and data analysis.
Data Governance Analyst
A Data Governance Analyst develops and implements data governance policies and procedures. This course can help you develop the skills needed to succeed in this role by providing a foundation in Apache Spark, a popular tool for working with big data. You will learn how to use Apache Spark to perform data exploration, data visualization, and statistical analysis to identify and address data governance issues.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Fundamentals of Scalable Data Science.
Provides a comprehensive overview of Apache Spark, covering its architecture, programming model, and use cases. It is a valuable resource for both beginners and experienced Spark users.
Provides a deep dive into the internals of Apache Spark. It covers topics such as memory management, scheduling, and performance optimization.
Provides a comprehensive overview of data visualization. It covers a wide range of topics, from data visualization principles to data visualization techniques.
Provides a comprehensive overview of Python for data analysis. It covers a wide range of topics, from data manipulation to data visualization.
Provides a comprehensive overview of SQL for data analysis. It covers a wide range of topics, from SQL basics to SQL advanced concepts.
Provides a comprehensive overview of big data analytics. It covers a wide range of topics, from data engineering to data science.

Similar courses

Here are nine courses similar to Fundamentals of Scalable Data Science.
AI Workflow: AI in Production
Scalable Machine Learning on Big Data using Apache Spark
Python for Data Engineering Project
Advanced Machine Learning and Signal Processing
Advanced Linear Models for Data Science 1: Least Squares
Advanced Linear Models for Data Science 2: Statistical...
Python Project for Data Engineering
AI Workflow: Feature Engineering and Bias Detection
R Data Science Capstone Project