We may earn an affiliate commission when you visit our partners.
Romeo Kienzler, Rav Ahuja, Steve Ryan, Aije Egwaikhide, Ramesh Sannareddy, and Karthik Muthuraman

Data engineers and Big Data professionals are in overwhelming demand. NoSQL and Big Data technology skills such as Apache Spark are a must-have for modern data-driven decision-making. This three-course Professional Certificate from IBM opens the door to data engineering and big data careers.

The first course, NoSQL Database Basics, introduces you to NoSQL fundamentals, including the four key non-relational database categories. By the end of the course, you will have hands-on skills working with MongoDB, Cassandra, and IBM Cloudant NoSQL databases.
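As a rough illustration of the four categories (commonly listed as key-value, document, wide-column, and graph), the sketch below models one user record in each style using plain Python structures. No database driver is involved, and every name in it (`key_value_store`, `followed_names`, and so on) is hypothetical rather than taken from the course.

```python
import json

# Plain-Python stand-ins for four common NoSQL data models.
# All store names and records here are hypothetical examples.

# 1. Key-value: an opaque value looked up by a single key.
key_value_store = {"user:42": '{"name": "Ada", "city": "London"}'}

# 2. Document: a nested, self-describing record (the model used by MongoDB and Cloudant).
document_store = {
    "users": [
        {"_id": 42, "name": "Ada", "address": {"city": "London"}},
    ]
}

# 3. Wide-column: rows grouped under a partition key, each row holding
#    named columns (the model used by Cassandra).
wide_column_store = {
    "users_by_city": {
        "London": {42: {"name": "Ada"}},
    }
}

# 4. Graph: nodes plus explicit, typed edges between them.
graph_nodes = {42: {"name": "Ada"}, 7: {"name": "Grace"}}
graph_edges = [(42, "FOLLOWS", 7)]


def city_from_key_value(key):
    """Key-value stores return the whole value; the application parses it."""
    return json.loads(key_value_store[key])["city"]


def followed_names(user_id):
    """Walk the FOLLOWS edges that start at user_id and return the target names."""
    return [
        graph_nodes[dst]["name"]
        for src, relation, dst in graph_edges
        if src == user_id and relation == "FOLLOWS"
    ]
```

The same fact ("Ada lives in London") is reachable in each model, but the access path differs: a single key lookup, a nested field, a partition scan, or an edge traversal.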

A crucial aspect of data engineering is the acquisition and management of Big Data, along with the scalability and performance of Big Data analytics. When you enroll in Big Data, Hadoop, and Spark Basics, you'll discover the characteristics, features, benefits, limitations, and applications of some of the more popular Big Data processing tools. You explore the open-source ecosystem of Apache tools, such as Apache Hadoop, Apache Hive, and Apache Spark, including Spark on Kubernetes. Discover how to leverage Spark to deliver reliable insights. You'll gain hands-on data analysis skills using PySpark and Spark SQL, create a streaming analytics application using Spark Streaming, and more.

Then enroll in Apache Spark for Data Engineering and Machine Learning to discover how data and machine learning engineers use Spark Structured Streaming, GraphFrames, regression, classification, and clustering. Learn how to apply the k-means clustering algorithm using Spark MLlib. Extract, transform, and load (ETL) is at the heart of data and machine learning engineering, and you'll gain skills using Spark to perform ETL tasks. This course culminates with a hands-on Spark project.
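For readers unfamiliar with k-means, here is a minimal sketch of the underlying idea (Lloyd's algorithm) in plain Python. It is not Spark MLlib code and not the course's implementation; the function names `assign` and `kmeans` and the sample points are assumptions made for illustration.

```python
import math


def assign(points, centroids):
    """Assignment step: group each point under its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(point, centroids[i]),
        )
        clusters[nearest].append(point)
    return clusters


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical sample data: two well-separated groups of 2-D points.
sample_points = [(0, 0), (0, 2), (10, 10), (10, 12)]
final_centroids, final_clusters = kmeans(sample_points, [(0, 0), (10, 10)])
```

A distributed library such as Spark MLlib applies the same two steps across partitions of a much larger dataset; this sketch only shows the logic on a handful of in-memory points.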

This Professional Certificate does not require any prior programming or data science skills; however, prior basic data literacy and SQL skills will prove valuable in completing this program.

What you'll learn

  • Differentiate between the four main categories of NoSQL repositories and work hands-on with MongoDB, Cassandra and IBM Cloudant.
  • Apply your knowledge of the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools, including Hadoop, HDFS, Hive and HBase.
  • Describe parallel programming using Resilient Distributed Datasets (RDDs), DataFrames, and SparkSQL. Understand how Catalyst and Tungsten benefit Spark programmers and see how ETL works using DataFrames.
  • Acquire real-world data engineering and machine learning skills using Spark Structured Streaming, DataFrames, GraphFrames, Spark ML, Regression, Classification, and clustering, including the k-means algorithm and ETL using Spark.
  • Gain hands-on experience using SparkSQL and Apache Spark on IBM Cloud.
  • Learn about scaling out using the IBM Spark Environment in Watson Studio, running Spark on Kubernetes, setting Spark configurations, and performing monitoring and performance tuning.

What's inside

Three courses

NoSQL Database Basics

(12 hours)
This course provides technical hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained relevance in the database landscape. Their main advantage is effectively handling scalability and flexibility issues raised by modern applications.

Big Data, Hadoop, and Spark Basics

(15 hours)
Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data to identify behaviors and preferences. This course introduces you to Big Data concepts and practices, including Hadoop, Hive, and Spark.

Apache Spark for Data Engineering and Machine Learning

(7 hours)
Apache® Spark™ is a fast, flexible, and developer-friendly open-source platform for large-scale SQL, batch processing, stream processing, and machine learning. Users can take advantage of its open-source ecosystem, speed, ease of use, and analytic capabilities to work with Big Data in new ways.

© 2016 - 2024 OpenCourser