Durga Viswanatha Raju Gadiraju, Madhuri Gadiraju, Sathvika Dandu, Pratik Kumar, Sai Varma, Phani Bhushan Bozzam, and Siva Kalyan Geddada

As part of this course, you will learn all the key skills needed to build Data Engineering pipelines with Spark SQL and the Spark Data Frame APIs, using Scala as the programming language. This course was originally a CCA 175 Spark and Hadoop Developer course for Certification Exam preparation. The exam was sunset on 10/31/2021, so we have renamed the course to Spark SQL and Spark 3 using Scala, as it covers industry-relevant topics beyond the scope of certification.

About Data Engineering


Data Engineering is, at its core, processing data according to downstream needs. As part of Data Engineering, we build different pipelines such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally, they were known as ETL Development, Data Warehouse Development, and so on. Apache Spark has evolved into a leading technology for Data Engineering at scale.

I have prepared this course for anyone who would like to transition into a Data Engineer role using Spark (Scala). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark.

Let us go through the details of what you will be learning in this course. Keep in mind that the course includes a lot of hands-on tasks, which will give you enough practice using the right tools, as well as plenty of exercises to evaluate yourself.

Setup of Single Node Big Data Cluster

Many of you would like to transition to Big Data from conventional technologies such as Mainframes or Oracle PL/SQL, and you might not have access to Big Data clusters. It is very important that you set up the environment in the right manner. Don't worry if you do not have a cluster handy; we will guide you through support via Udemy Q&A.

  • Set up an Ubuntu-based AWS Cloud9 instance with the right configuration

  • Ensure Docker is set up

  • Set up Jupyter Lab and other key components

  • Set up and validate Hadoop, Hive, YARN, and Spark

Are you feeling a bit overwhelmed about setting up the environment? Don't worry. We will provide complimentary lab access for up to 2 months. Here are the details.

  • Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing a 5* rating and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to support@itversity.com to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for practice as part of a live session. On top of Q&A support, we also provide the required support via live sessions.

A quick recap of Scala

This course requires a decent knowledge of Scala. To make sure you understand Spark from a Data Engineering perspective, we have added a module to quickly warm up with Scala. If you are not familiar with Scala, we suggest you first go through relevant courses on the Scala programming language.
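
If you want a quick self-check before the warm-up module, the collection combinators below are the plain-Scala counterparts of the Data Frame operations used later in the course. This is only an illustrative sketch; the case class and field names are made up, not taken from the course material.

```scala
// Quick Scala warm-up: map, filter, and sum over a collection mirror the
// select/filter/aggregate transformations applied later to Data Frames.
// All names here are illustrative.
case class Order(id: Int, status: String, amount: Double)

val orders = List(
  Order(1, "COMPLETE", 100.0),
  Order(2, "PENDING", 250.0),
  Order(3, "COMPLETE", 75.5)
)

// filter + map, as you would with a Data Frame's filter/select
val completedAmounts = orders.filter(_.status == "COMPLETE").map(_.amount)

// aggregate, as you would with groupBy().agg()
val totalCompleted = completedAmounts.sum
```

If this snippet reads naturally to you, you are likely ready for the Spark modules; if not, a Scala refresher first will pay off.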

Data Engineering using Spark SQL

Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering pipelines. Spark SQL gives us the distributed computing capabilities of Spark coupled with an easy-to-use, developer-friendly SQL-style syntax.

  • Getting Started with Spark SQL

  • Basic Transformations using Spark SQL

  • Managing Spark Metastore Tables - Basic DDL and DML

  • Managing Spark Metastore Tables - DML and Partitioning

  • Overview of Spark SQL Functions

  • Windowing Functions using Spark SQL
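
As a taste of the aggregation and windowing topics listed above, the per-group logic of a GROUP BY and a RANK() OVER (PARTITION BY ... ORDER BY ...) can be previewed with plain Scala collections. Spark SQL runs the same logic over distributed Metastore tables; the table and column names below are hypothetical, not from the course.

```scala
// Sketch of what a Spark SQL GROUP BY and a RANK() window function compute,
// modeled on in-memory Scala collections with illustrative names.
case class Sale(region: String, amount: Double)

val sales = List(
  Sale("EU", 300.0), Sale("EU", 100.0),
  Sale("US", 500.0), Sale("US", 200.0)
)

// SELECT region, SUM(amount) FROM sales GROUP BY region
val totals: Map[String, Double] =
  sales.groupBy(_.region).view.mapValues(_.map(_.amount).sum).toMap

// RANK() OVER (PARTITION BY region ORDER BY amount DESC):
// rank rows within each region by descending amount
val ranked: Map[String, List[(Sale, Int)]] =
  sales.groupBy(_.region).view.mapValues { rows =>
    rows.sortBy(-_.amount).zipWithIndex.map { case (s, i) => (s, i + 1) }
  }.toMap
```

The course covers the real SQL syntax for these; the sketch only shows what the engine computes per partition.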

Data Engineering using Spark Data Frame APIs

Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL for building Data Engineering applications.

  • Data Processing Overview using Spark Data Frame APIs leveraging Scala as Programming Language

  • Processing Column Data using Spark Data Frame APIs leveraging Scala as Programming Language

  • Basic Transformations using Spark Data Frame APIs leveraging Scala as Programming Language - Filtering, Aggregations, and Sorting

  • Joining Data Sets using Spark Data Frame APIs leveraging Scala as Programming Language
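
To preview the joining module above, here is the relational logic of an inner join sketched on in-memory Scala collections. Spark's Data Frame join expresses the same idea over distributed data; the case classes and field names below are hypothetical.

```scala
// Inner-join sketch: only rows whose keys match on both sides survive,
// which is what a Data Frame inner join computes at scale.
case class Customer(id: Int, name: String)
case class Purchase(customerId: Int, item: String)

val customers = List(Customer(1, "Asha"), Customer(2, "Ravi"))
val purchases = List(Purchase(1, "laptop"), Purchase(1, "mouse"), Purchase(3, "desk"))

// join on customer id; Purchase(3, "desk") has no matching customer
// and Customer(2, "Ravi") has no purchases, so both drop out
val joined: List[(String, String)] = for {
  c <- customers
  p <- purchases if p.customerId == c.id
} yield (c.name, p.item)
// joined == List(("Asha", "laptop"), ("Asha", "mouse"))
```

The course then builds on this with outer joins and join strategies using the actual Data Frame APIs.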

All the demos are given on our state-of-the-art Big Data cluster. You can avail of one month of complimentary lab access by reaching out to support@itversity.com with a Udemy receipt.


What's inside

Syllabus

Introduction
CCA 175 Spark and Hadoop Developer - Curriculum
Set up self support lab to prepare for CCA 175 Certification on AWS using Cloud9
Getting Started with Cloud9

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers:
Covers building data engineering pipelines using Spark SQL and Spark Data Frame APIs, which are essential for processing data at scale in modern data architectures
Includes setting up a single-node big data cluster, which is helpful for those transitioning from conventional technologies and lacking access to existing big data environments
Requires a decent knowledge of Scala and includes a module to quickly warm up with Scala, but suggests relevant courses for those not familiar with the language
Provides complementary lab access for hands-on practice, enhancing the learning experience and skill development in a practical environment
Teaches Spark 2 and Spark 3, which may require learners to manage multiple versions of Spark and understand the differences between them
Includes content related to CCA 175 Spark and Hadoop Developer certification, which has been sunset, so some content may be less relevant to current industry practices


Reviews summary

Practical Spark SQL and DataFrames with Scala

According to learners, this course offers a solid foundation in Spark SQL and DataFrames using Scala, primarily designed for those aiming for Data Engineering roles. Many students highlighted the hands-on labs and exercises as a major strength, providing practical experience essential for application. The course provides detailed guidance for setting up a Big Data environment, a step some found complex but necessary. It effectively covers the essentials of Spark for data processing. Reviewers consistently noted the importance of having a decent prior understanding of Scala, as the introductory section is brief. While generally well-received for its practical focus, some reviews suggested certain parts might feel slightly outdated. Overall, it is viewed as a valuable course for getting started with Spark.
Strong Scala skills are recommended.
"You really need a decent understanding of Scala before taking this course."
"The Scala warm-up section is too brief if you're not already familiar."
"Wish I had stronger Scala skills going into this, it would have helped."
"Recommends a good grasp of Scala fundamentals, which I found necessary."
Environment setup is detailed but can be challenging.
"Setting up the environment was quite involved, taking significant time."
"The instructions for Cloud9 setup were helpful, although troubleshooting was needed at times."
"Environment setup felt like the hardest part, requiring patience."
"Setting up the lab environment is crucial and well-documented, but be prepared for potential issues."
Geared towards practical application.
"The course has a very practical, hands-on approach."
"Liked that it focuses on applying Spark to real-world tasks."
"Great for learning how to use Spark for actual data processing jobs."
"The course is highly practical and focused on implementation."
Good introduction to core Spark concepts.
"Provides a solid introduction to Spark SQL and DataFrames."
"I got a good understanding of the basic Spark operations needed for data engineering."
"Covers the essentials of Spark DataFrames effectively."
"The course material explains fundamental Spark concepts clearly."
Provides essential practical experience.
"The hands-on labs were the most valuable part of the course for me."
"I learned so much by actually doing the exercises in the labs."
"The emphasis on hands-on coding and labs made the concepts stick better."
"Practical application through labs is a major strength here."
Some parts feel a bit old.
"Some sections felt a bit outdated compared to current industry practices."
"While it mentions Spark 3, parts of the course seem based on older versions or approaches."
"Could use some updates to reflect the latest Spark features and best practices."
"Some material felt slightly behind the curve."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Spark SQL and Spark 3 using Scala Hands-On with Labs with these activities:
Review Scala Fundamentals
Reviewing Scala fundamentals will ensure a smoother learning experience when applying Spark Data Frame APIs.
Show steps
  • Review basic syntax and data types.
  • Practice writing simple Scala functions.
  • Familiarize yourself with Scala collections.
Practice Basic HDFS Commands
Practicing HDFS commands will help you manage data within the Hadoop environment used by Spark.
Show steps
  • Practice listing, creating, and deleting directories.
  • Practice copying files between local and HDFS.
  • Practice checking file metadata and storage usage.
Read "Learning Spark"
Reading "Learning Spark" will provide a deeper understanding of the underlying concepts and best practices for using Spark SQL and DataFrames.
Show steps
  • Read the chapters related to Spark SQL and DataFrames.
  • Work through the examples provided in the book.
  • Compare the book's examples with the course's labs.
Implement Spark SQL Queries
Practicing Spark SQL queries will reinforce your understanding of SQL syntax and its application within the Spark environment.
Show steps
  • Write SQL queries to filter, aggregate, and sort data.
  • Experiment with different Spark SQL functions.
  • Optimize query performance using techniques learned in the course.
Build a Simple Data Pipeline
Building a data pipeline will allow you to apply your knowledge of Spark SQL and DataFrames to a real-world problem.
Show steps
  • Define the data source and target for the pipeline.
  • Implement data extraction, transformation, and loading (ETL) using Spark SQL or DataFrames.
  • Test and validate the pipeline's functionality.
  • Document the pipeline's design and implementation.
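
Following the steps above, a minimal end-to-end extract/transform/load flow can be sketched in plain Scala. The course builds the same flow with Spark SQL or Data Frames over distributed storage; the CSV-style input and all names here are illustrative.

```scala
// Minimal ETL sketch mirroring the pipeline steps above, using in-memory
// data in place of a real source and target.

// Extract: raw CSV-like lines (the "source")
val raw = List("1,alice,120.0", "2,bob,80.0", "3,carol,200.0")

case class Row(id: Int, name: String, amount: Double)

// Transform: parse each line and filter out small amounts
val transformed = raw
  .map(_.split(","))
  .map { case Array(id, name, amt) => Row(id.toInt, name, amt.toDouble) }
  .filter(_.amount >= 100.0)

// Load: write to the "target" (here an in-memory map keyed by id)
val target: Map[Int, Row] = transformed.map(r => r.id -> r).toMap
```

Swapping the source for files on HDFS and the target for a Metastore table turns this sketch into the kind of pipeline the activity asks you to build and document.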
Write a Blog Post on Spark Optimization
Writing a blog post will help you consolidate your knowledge of Spark optimization techniques and share your insights with others.
Show steps
  • Research different Spark optimization techniques.
  • Choose a specific optimization technique to focus on.
  • Write a clear and concise explanation of the technique.
  • Provide examples of how to apply the technique in practice.
Read "Spark: The Definitive Guide"
Reading "Spark: The Definitive Guide" will provide a more in-depth understanding of Spark's capabilities and advanced features.
Show steps
  • Read the chapters related to advanced Spark SQL features.
  • Explore the book's examples of complex data transformations.
  • Compare the book's recommendations with your own experiences.

Career center

Learners who complete Spark SQL and Spark 3 using Scala Hands-On with Labs will develop knowledge and skills that may be useful to these careers:
Data Engineer
A Data Engineer designs, builds, and manages data pipelines, and this course is directly relevant to this role. Data Engineers transform data into a format that is more useful for analysis. This course helps build a foundation in using Spark SQL and Spark Data Frame APIs with Scala, which are core technologies for data engineering pipelines. You can learn how to process data at scale, set up a big data cluster, and use tools like Hadoop, Hive, and YARN. If you want to become a Data Engineer, this course will prepare you, especially if you want to use Spark and Scala in your daily work.
Analytics Engineer
An Analytics Engineer focuses on transforming raw data into usable datasets for analysis. This course is directly relevant because it covers data engineering pipelines using Spark SQL and Spark Data Frame APIs. This course helps build skills in data transformation, data modeling, and pipeline development, which are core to analytics engineering. Learning Scala within the Spark ecosystem provides the practical knowledge needed for a successful career as an Analytics Engineer working with big data technologies.
Big Data Developer
A Big Data Developer is involved in developing and maintaining scalable big data solutions. This course provides the relevant skills for developing these solutions using Spark and Scala. You will gain hands-on experience in setting up and configuring a big data cluster, using Spark SQL and Data Frame APIs for data processing, and working with related technologies like Hadoop and Hive. The course's focus on practical exercises and real-world tasks makes it valuable for anyone wanting to develop big data applications. If your goal is to become a Big Data Developer, this course can help you gain the necessary hands-on skills.
ETL Developer
An Extract, Transform, Load or ETL Developer builds data pipelines to extract data from various sources, transforms it, and loads it into a data warehouse. This course may be useful because it focuses on building data engineering pipelines using Spark SQL and Spark Data Frame APIs with Scala. The course content on data processing and transformations directly applies to building efficient ETL processes. Additionally, the experience of setting up and managing Big Data clusters as taught in the course is also relevant to an ETL Developer. You will gain practical experience with industry-standard tools.
Data Architect
A Data Architect designs and manages the data infrastructure for an organization. This course provides a strong foundation in the technologies used to build scalable data pipelines. You will gain hands-on experience with Spark SQL, Data Frame APIs, and big data cluster setup, which are all essential for designing efficient and reliable data architectures. If you want to become a Data Architect, this course will help you develop a practical understanding of the technologies needed to build modern data infrastructure including data lakes and data warehouses.
Data Warehouse Architect
A Data Warehouse Architect designs and oversees the implementation of data warehousing solutions. This course can help build a solid foundation in the technologies used in modern data warehousing, particularly Apache Spark. Knowing how to use Spark SQL and Data Frame APIs is essential for anyone architecting data solutions at scale. Furthermore, the course's coverage of setting up and managing big data clusters provides practical insights into the infrastructure aspects of data warehousing. If you aspire to be a Data Warehouse Architect, this course will help you understand the practical considerations of building scalable and efficient data warehouses.
Machine Learning Engineer
A Machine Learning Engineer develops and deploys machine learning models. This course may be useful for understanding how to process and prepare data for machine learning at scale. The course's focus on Spark SQL and Data Frame APIs allows you to become proficient in data manipulation and transformation, which is an important step in the machine learning pipeline. You will also learn how to work with big data technologies that are commonly used in machine learning workflows. For those who wish to become a Machine Learning Engineer, this course will help develop skills in data engineering for machine learning.
Data Scientist
A Data Scientist analyzes large datasets, develops statistical models, and derives insights to inform business decisions. While Data Scientists often focus on the analytical aspects, understanding data engineering is becoming increasingly important. This course can provide a strong foundation in data processing using Spark SQL and Data Frame APIs. You will gain the skills to manipulate and transform data at scale, which is valuable for preparing data for analysis and modeling. This course is relevant for Data Scientists who want to expand their skillset into data engineering aspects.
Solutions Architect
A Solutions Architect designs and implements IT solutions to address business problems. This course can be beneficial in understanding how to design data-centric solutions using Apache Spark. The knowledge of Spark SQL, Data Frame APIs, and big data cluster setup will help you design scalable and efficient data processing solutions. This course provides the practical skills needed to make informed decisions about data architecture and technology choices. For aspiring Solutions Architects, this course helps develop a practical understanding of big data technologies.
Cloud Engineer
A Cloud Engineer builds and maintains cloud infrastructure and services. The course can prove useful because it involves setting up a big data cluster on AWS Cloud9 or GCP, which helps build practical experience with cloud environments. Understanding how to deploy and manage big data technologies such as Spark, Hadoop, and related tools in the cloud is also relevant. For Cloud Engineers who want to specialize in big data deployments, this course offers a useful skillset.
Software Engineer
A Software Engineer designs, develops, and tests software applications. This course provides valuable experience in using Scala, a programming language often used in building scalable and high-performance applications. The course's coverage of Spark SQL and Data Frame APIs within the context of Scala can be valuable for Software Engineers working on data-intensive applications. If you are a Software Engineer looking to expand your skillset into big data processing, this course is relevant.
Database Administrator
A Database Administrator manages and maintains databases, ensuring their availability, performance, and security. This course can be relevant because it covers aspects of managing data within a big data environment using technologies like Hadoop and Hive. Setting up and configuring these systems, as taught in the course, provides valuable experience that can be applied to managing data in distributed systems. The course may be useful in expanding a Database Administrator's skill set into the realm of big data technologies.
Data Analyst
A Data Analyst interprets data and transforms it into insights that inform business decisions. While Data Analysts often use tools like SQL and Excel, understanding big data technologies can be increasingly valuable. This course may be useful by giving familiarity with Spark SQL and Data Frame APIs for data manipulation. You can learn how to process large datasets and extract meaningful information. This course is relevant for Data Analysts who wish to expand their skills in big data processing.
Application Developer
An Application Developer designs and codes applications. This course can be beneficial for Application Developers who want to work on data-intensive applications that require scalable data processing. Learning Spark SQL and Data Frame APIs with Scala will enable you to build applications that can efficiently handle large datasets. This course helps expand an Application Developer's skill set into the realm of big data and distributed computing.
Business Intelligence Analyst
A Business Intelligence Analyst analyzes data to identify trends and insights that help improve business performance. This course can be helpful in understanding how data is processed and transformed in a big data environment. The course's coverage of Spark SQL and data frame APIs allows you to learn how to efficiently query and analyze large datasets. For aspiring Business Intelligence Analysts, this course provides an understanding of data processing technologies used in modern business intelligence systems.

Reading list

We've selected two books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Spark SQL and Spark 3 using Scala Hands-On with Labs.
Provides a comprehensive overview of Apache Spark, covering Spark SQL and DataFrames in detail. It serves as an excellent reference for understanding the core concepts and APIs used in the course. The book offers practical examples and use cases that complement the hands-on labs. It is commonly used as a reference by data engineers and data scientists.
Offers a comprehensive guide to Apache Spark, covering a wide range of topics from basic concepts to advanced techniques. It provides in-depth explanations of Spark SQL, DataFrames, and other key components. This book is valuable as additional reading to expand on the course material. It is commonly used by industry professionals and academics.

Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Find this site helpful? Tell a friend about us.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2025 OpenCourser