NoSQL, Big Data and Spark Fundamentals

Romeo Kienzler, Rav Ahuja, Steve Ryan, Aije Egwaikhide, Ramesh Sannareddy, and Karthik Muthuraman

Data engineers and Big Data professionals are in overwhelming demand. NoSQL and Big Data technology skills such as Apache Spark are a must-have for modern day data-driven decision-making. This three-course Professional Certificate from IBM opens the door for data engineering and big data careers.

Starting with

, this course introduces you to NoSQL fundamentals, including the four key non-relational database categories. By the end of the course, you will have hands-on skills working with MongoDB, Cassandra, and IBM Cloudant NoSQL databases.

Data engineers and Big Data professionals are in overwhelming demand. NoSQL and Big Data technology skills such as Apache Spark are a must-have for modern day data-driven decision-making. This three-course Professional Certificate from IBM opens the door for data engineering and big data careers.

Starting with

, this course introduces you to NoSQL fundamentals, including the four key non-relational database categories. By the end of the course, you will have hands-on skills working with MongoDB, Cassandra, and IBM Cloudant NoSQL databases.

A crucial aspect of data engineering is the acquisition and management of Big Data and Big Data Analytics scalability and performance. When you enroll in

, you'll discover the characteristics, features, benefits, limitations, and applications of some of the more popular Big Data processing tools. You explore the open-source ecosystem of Apache tools, including Apache Hadoop, Apache Hive, and Apache Spark, including Spark on Kubernetes. Discover how to leverage Spark to deliver reliable insights. You'll gain hands-on data analysis skills using PySpark and Spark SQL and create a streaming analytics application using Spark Streaming, and more.

Then enroll in

to discover how data and machine learning engineers use Spark Structured Streaming, GraphFrames, Regression, Classification, and clustering. Learn about clustering and how to apply the k-means clustering algorithm using Spark MLlib. Extraction Transformation and Loading, (ETL) is at the heart of data and machine learning engineering, and you'll gain skills using Spark to perform extract, transform and load (ETL) tasks. This course culminates with a hands-on Spark project.

This Professional Certificate does not require any prior programming or data science skills; however, prior basic data literacy and SQL skills will prove valuable in completing this program.

What you'll learn

Differentiate between the four main categories of NoSQL repositories and work hands-on with MongoDB, Cassandra and IBM Cloudant.
Apply your knowledge of the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools, including Hadoop, HDFS, Hive and HBase.
Describe parallel programming using Resilient Distributed Datasets (RDDs), DataFrames and SparkSQL. Understand how Catalyst and Tungsten benefit Spark programmer and see how ETL work using DataFrames.
Acquire real-world data engineering and machine learning skills using Spark Structured Streaming, DataFrames, GraphFrames, Spark ML, Regression, Classification, and clustering, including the k-means algorithm and ETL using Spark.
Gain hands-on experience using SparkSQL, Apache Spark on IBM Cloud.
Learn about scaling out using the IBM Spark Environment in Watson Studio, running Spark on Kubernetes, setting Spark configurations, and performing monitoring and performance tuning.

Enroll now

What's inside

Three courses

NoSQL Database Basics

(12 hours)

This course provides technical hands-on knowledge of NoSQL databases and Database-as-a-Service (DaaS) offerings. With the advent of Big Data and agile development methodologies, NoSQL databases have gained relevance in the database landscape. Their main advantage is effectively handling scalability and flexibility issues raised by modern applications.

View more details for this course

Big Data, Hadoop, and Spark Basics

(15 hours)

Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data to identify behaviors and preferences. This course introduces you to Big Data concepts and practices, including Hadoop, Hive, and Spark.

View more details for this course

Apache Spark for Data Engineering and Machine Learning

(7 hours)

Apache® Spark™ is a fast, flexible, and developer-friendly open-source platform for large-scale SQL, batch processing, stream processing, and machine learning. Users can take advantage of its open-source ecosystem, speed, ease of use, and analytic capabilities to work with Big Data in new ways.

View more details for this course

Save this collection

Create your own learning path. Save this collection to your list so you can find it easily later.

Save

NoSQL, Big Data and Spark Fundamentals

Share

What's inside

Three courses

NoSQL Database Basics

Big Data, Hadoop, and Spark Basics

Apache Spark for Data Engineering and Machine Learning

Save this collection