We may earn an affiliate commission when you visit our partners.

Data Engineering using Databricks on AWS and Azure

Durga Viswanatha Raju Gadiraju, Naga Bhuwaneshwar, and Kavitha Penmetsa

As part of this course, you will learn Data Engineering using Databricks, a cloud platform-agnostic technology.

About Data Engineering

Data Engineering is the processing of data to meet downstream needs. As part of Data Engineering, we build different kinds of pipelines, such as Batch Pipelines and Streaming Pipelines. All roles related to Data Processing are consolidated under Data Engineering; conventionally, they were known as ETL Development, Data Warehouse Development, etc.

About Databricks

Databricks is one of the most popular cloud platform-agnostic data engineering tech stacks. The company was founded by the original creators of Apache Spark, and many of its engineers are committers on the project. The Databricks Runtime provides Spark while leveraging the elasticity of the cloud, and with Databricks you pay for what you use. Over time, Databricks introduced the idea of the Lakehouse, which provides the features required for traditional BI as well as AI & ML. Here are some of the core features of Databricks.

  • Spark - Distributed Computing

  • Delta Lake - Perform CRUD Operations. It is primarily used to build capabilities such as inserting, updating, and deleting data in the files of a Data Lake.

  • cloudFiles - Get the files in an incremental fashion in the most efficient way leveraging cloud features.

  • Databricks SQL - A Photon-based interface that is fine-tuned for running queries submitted for reporting and visualization by reporting tools. It is also used for Ad-hoc Analysis.
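To make the Delta Lake bullet more concrete, here is a minimal pure-Python sketch of what a "merge or upsert" does: rows from a source update the matching rows in a target, and rows with no match are inserted. This only illustrates the semantics and does not use the Delta Lake API; the `upsert` function, the `order_id` key, and the sample rows are hypothetical names chosen for the example.

```python
from typing import Dict, Iterable, List


def upsert(target: List[dict], source: Iterable[dict], key: str) -> List[dict]:
    """Return a new list in which source rows update matching target rows
    (same key value) and are inserted when no match exists."""
    # Copy each target row so the caller's data is left untouched.
    merged: Dict[object, dict] = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: update the existing row
        else:
            merged[row[key]] = dict(row)   # not matched: insert a new row
    return list(merged.values())


# Hypothetical sample data, loosely modeled on an orders table.
orders = [
    {"order_id": 1, "status": "PENDING"},
    {"order_id": 2, "status": "COMPLETE"},
]
changes = [
    {"order_id": 2, "status": "CLOSED"},   # existing key: update
    {"order_id": 3, "status": "PENDING"},  # new key: insert
]
result = upsert(orders, changes, key="order_id")
```

On Databricks, the same outcome would typically be achieved with a merge against a Delta table rather than in-memory lists; the sketch is only meant to show which rows end up updated and which inserted.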

Course Details

As part of this course, you will be learning Data Engineering using Databricks.

  • Getting Started with Databricks

  • Setup Local Development Environment to develop Data Engineering Applications using Databricks

  • Using Databricks CLI to manage files, jobs, clusters, etc related to Data Engineering Applications

  • Spark Application Development Cycle to build Data Engineering Applications

  • Databricks Jobs and Clusters

  • Deploy and Run Data Engineering Jobs on Databricks Job Clusters as Python Application

  • Deploy and Run Data Engineering Jobs on Databricks Job Clusters using Notebooks

  • Deep Dive into Delta Lake using Dataframes on Databricks Platform

  • Deep Dive into Delta Lake using Spark SQL on Databricks Platform

  • Building Data Engineering Pipelines using Spark Structured Streaming on Databricks Clusters

  • Incremental File Processing using Spark Structured Streaming leveraging Databricks Auto Loader cloudFiles

  • Overview of Auto Loader cloudFiles File Discovery Modes - Directory Listing and File Notifications

  • Differences between Auto Loader cloudFiles File Discovery Modes - Directory Listing and File Notifications

  • Differences between traditional Spark Structured Streaming and leveraging Databricks Auto Loader cloudFiles for incremental file processing.

  • Overview of Databricks SQL for Data Analysis and reporting.
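As a rough illustration of the Auto Loader topics listed above, the sketch below builds the reader options for the `cloudFiles` source and shows how the two file discovery modes differ only by one option. It assumes a Databricks cluster where a `spark` session is available; the helper names `autoloader_options` and `read_incrementally`, and the paths, are hypothetical and not taken from the course material.

```python
from typing import Dict


def autoloader_options(file_format: str, schema_location: str,
                       use_notifications: bool = False) -> Dict[str, str]:
    """Build reader options for the cloudFiles source.

    use_notifications=False -> directory listing mode
    use_notifications=True  -> file notification mode
    """
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }


def read_incrementally(spark, source_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame over newly arriving files.

    This only works on a Databricks cluster, where the cloudFiles
    source is available; `spark` is the active SparkSession.
    """
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


# Hypothetical schema location; directory listing mode by default.
opts = autoloader_options("json", "/tmp/hypothetical/schema")
```

In a notebook, one would pass `opts` and a source directory to `read_incrementally(spark, ...)` and then write the resulting stream to a Delta location with a checkpoint, so that each run picks up only files that have not been processed yet.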

We will be adding a few more modules related to PySpark, Spark with Scala, Spark SQL, and Streaming Pipelines in the coming weeks.

Desired Audience

Here is the desired audience for this advanced course.

  • Experienced application developers with prior knowledge and experience of Spark who want to gain expertise in Data Engineering.

  • Experienced Data Engineers to gain enough skills to add Databricks to their profile.

  • Testers to improve their testing capabilities related to Data Engineering applications using Databricks.

Prerequisites

  • Logistics

    • Computer with decent configuration (At least

    Associated Costs

    As part of the training, you will only get the material. You need to practice on your own or corporate cloud account and Databricks Account.

    • You need to take care of the associated AWS or Azure costs.

    • You need to take care of the associated Databricks costs.

    Training Approach

    Here are the details related to the training approach.

    • It is self-paced with reference material, code snippets, and videos provided as part of Udemy.

    • One needs to sign up for their own Databricks environment to practice all the core features of Databricks.

    • We would recommend completing 2 modules every week by spending 4 to 5 hours per week.

    • It is highly recommended to complete all the tasks so that one can get real experience of Databricks.

    • Support will be provided through Udemy Q&A.

    Here is the detailed course outline.

    Getting Started with Databricks on Azure

    As part of this section, we will go through the details of signing up for Azure and setting up the Databricks cluster on Azure.

    • Getting Started with Databricks on Azure

    • Signup for the Azure Account

    • Login and Increase Quotas for regional vCPUs in Azure

    • Create Azure Databricks Workspace

    • Launching Azure Databricks Workspace or Cluster

    • Quick Walkthrough of Azure Databricks UI

    • Create Azure Databricks Single Node Cluster

    • Upload Data using Azure Databricks UI

    • Overview of Creating Notebook and Validating Files using Azure Databricks

    • Develop Spark Application using Azure Databricks Notebook

    • Validate Spark Jobs using Azure Databricks Notebook

    • Export and Import of Azure Databricks Notebooks

    • Terminating Azure Databricks Cluster and Deleting Configuration

    • Delete Azure Databricks Workspace by deleting Resource Group

    Azure Essentials for Databricks - Azure CLI

    As part of this section, we will go through the details about setting up Azure CLI to manage Azure resources using relevant commands.

    • Azure Essentials for Databricks - Azure CLI

    • Azure CLI using Azure Portal Cloud Shell

    • Getting Started with Azure CLI on Mac

    • Getting Started with Azure CLI on Windows

    • Warming up with Azure CLI - Overview

    • Create Resource Group using Azure CLI

    • Create ADLS Storage Account within Resource Group

    • Add Container as part of Storage Account

    • Overview of Uploading the data into ADLS File System or Container

    • Setup Data Set locally to upload into ADLS File System or Container

    • Upload local directory into Azure ADLS File System or Container

    • Delete Azure ADLS Storage Account using Azure CLI

    • Delete Azure Resource Group using Azure CLI

    Mount ADLS on to Azure Databricks to access files from Azure Blob Storage

    As part of this section, we will go through the details related to mounting Azure Data Lake Storage (ADLS) on to Azure Databricks Clusters.

    • Mount ADLS on to Azure Databricks - Introduction

    • Ensure Azure Databricks Workspace

    • Setup Databricks CLI on Mac or Windows using Python Virtual Environment

    • Configure Databricks CLI for new Azure Databricks Workspace

    • Register an Azure Active Directory Application

    • Create Databricks Secret for AD Application Client Secret

    • Create ADLS Storage Account

    • Assign IAM Role on Storage Account to Azure AD Application

    • Setup Retail DB Dataset

    • Create ADLS Container or File System and Upload Data

    • Start Databricks Cluster to mount ADLS

    • Mount ADLS Storage Account on to Azure Databricks

    • Validate ADLS Mount Point on Azure Databricks Clusters

    • Unmount the mount point from Databricks

    • Delete Azure Resource Group used for Mounting ADLS on to Azure Databricks

    Setup Local Development Environment for Databricks

    As part of this section, we will go through the details of setting up a local development environment for Databricks using tools such as PyCharm, Databricks Connect, Databricks dbutils, etc.

    • Setup Single Node Databricks Cluster

    • Install Databricks Connect

    • Configure Databricks Connect

    • Integrating PyCharm with Databricks Connect

    • Integrate Databricks Cluster with Glue Catalog

    • Setup AWS s3 Bucket and Grant Permissions

    • Mounting s3 Buckets into Databricks Clusters

    • Using Databricks dbutils from IDEs such as PyCharm

    Using Databricks CLI

    As part of this section, we will get an overview of Databricks CLI to interact with Databricks File System or DBFS.

    • Introduction to Databricks CLI

    • Install and Configure Databricks CLI

    • Interacting with Databricks File System using Databricks CLI

    • Getting Databricks Cluster Details using Databricks CLI

    Databricks Jobs and Clusters

    As part of this section, we will go through the details related to Databricks Jobs and Clusters.

    • Introduction to Databricks Jobs and Clusters

    • Creating Pools in Databricks Platform

    • Create Cluster on Azure Databricks

    • Request to Increase CPU Quota on Azure

    • Creating Job on Databricks

    • Submitting Jobs using Databricks Job Cluster

    • Create Pool in Databricks

    • Running Job using Interactive Databricks Cluster Attached to Pool

    • Running Job Using Databricks Job Cluster Attached to Pool

    • Exercise - Submit the application as a job using Databricks interactive cluster

    Deploy and Run Spark Applications on Databricks

    As part of this section, we will go through the details related to deploying Spark Applications on Databricks Clusters and also running those applications.

    • Prepare PyCharm for Databricks

    • Prepare Data Sets

    • Move files to ghactivity

    • Refactor Code for Databricks

    • Validating Data using Databricks

    • Setup Data Set for Production Deployment

    • Access File Metadata using Databricks dbutils

    • Build Deployable bundle for Databricks

    • Running Jobs using Databricks Web UI

    • Get Job and Run Details using Databricks CLI

    • Submitting Databricks Jobs using CLI

    • Setup and Validate Databricks Client Library

    • Resetting the Job using Databricks Jobs API

    • Run Databricks Job programmatically using Python

    • Detailed Validation of Data using Databricks Notebooks

    Deploy and Run Spark Jobs using Notebooks

    As part of this section, we will go through the details related to deploying Spark Applications on Databricks Clusters and also running those applications using Databricks Notebooks.

    • Modularizing Databricks Notebooks

    • Running Job using Databricks Notebook

    • Refactor application as Databricks Notebooks

    • Run Notebook using Databricks Development Cluster

    Deep Dive into Delta Lake using Spark Data Frames on Databricks

    As part of this section, we will go through all the important details related to Databricks Delta Lake using Spark Data Frames.

    • Introduction to Delta Lake using Spark Data Frames on Databricks

    • Creating Spark Data Frames for Delta Lake on Databricks

    • Writing Spark Data Frame using Delta Format on Databricks

    • Updating Existing Data using Delta Format on Databricks

    • Delete Existing Data using Delta Format on Databricks

    • Merge or Upsert Data using Delta Format on Databricks

    • Deleting using Merge in Delta Lake on Databricks

    • Point in Snapshot Recovery using Delta Logs on Databricks

    • Deleting unnecessary Delta Files using Vacuum on Databricks

    • Compaction of Delta Lake Files on Databricks

    Deep Dive into Delta Lake using Spark SQL on Databricks

    As part of this section, we will go through all the important details related to Databricks Delta Lake using Spark SQL.

    • Introduction to Delta Lake using Spark SQL on Databricks

    • Create Delta Lake Table using Spark SQL on Databricks

    • Insert Data to Delta Lake Table using Spark SQL on Databricks

    • Update Data in Delta Lake Table using Spark SQL on Databricks

    • Delete Data from Delta Lake Table using Spark SQL on Databricks

    • Merge or Upsert Data into Delta Lake Table using Spark SQL on Databricks

    • Using Merge Function over Delta Lake Table using Spark SQL on Databricks

    • Point in Snapshot Recovery using Delta Lake Table using Spark SQL on Databricks

    • Vacuuming Delta Lake Tables using Spark SQL on Databricks

    • Compaction of Delta Lake Tables using Spark SQL on Databricks

    Accessing Databricks Cluster Terminal via Web as well as SSH

    As part of this section, we will see how to access the terminal of a Databricks Cluster via Web as well as SSH.

    • Enable Web Terminal in Databricks Admin Console

    • Launch Web Terminal for Databricks Cluster

    • Setup SSH for the Databricks Cluster Driver Node

    • Validate SSH Connectivity to the Databricks Driver Node on AWS

    • Limitations of SSH and comparison with Web Terminal related to Databricks Clusters

    Installing Software on Databricks Clusters using init scripts

    As part of this section, we will see how to bootstrap Databricks clusters by installing relevant 3rd party libraries for our applications.

    • Setup gen_logs on Databricks Cluster

    • Overview of Init Scripts for Databricks Clusters

    • Create Script to install software from git on Databricks Cluster

    • Copy init script to dbfs location

    • Create Databricks Standalone Cluster with init script

    Quick Recap of Spark Structured Streaming

    As part of this section, we will get a quick recap of Spark Structured streaming.

    • Validate Netcat on Databricks Driver Node

    • Push log messages to Netcat Webserver on Databricks Driver Node

    • Reading Web Server logs using Spark Structured Streaming

    • Writing Streaming Data to Files

    Incremental Loads using Spark Structured Streaming on Databricks

    As part of this section, we will understand how to perform incremental loads using Spark Structured Streaming on Databricks.

    • Overview of Spark Structured Streaming

    • Steps for Incremental Data Processing on Databricks

    • Configure Databricks Cluster with Instance Profile

    • Upload GHArchive Files to AWS s3 using Databricks Notebooks

    • Read JSON Data using Spark Structured Streaming on Databricks

    • Write using Delta file format using Trigger Once on Databricks

    • Analyze GHArchive Data in Delta files using Spark on Databricks

    • Add New GHActivity JSON files on Databricks

    • Load Data Incrementally to Target Table on Databricks

    • Validate Incremental Load on Databricks

    • Internals of Spark Structured Streaming File Processing on Databricks

    Incremental Loads using Auto Loader cloudFiles on Databricks

    As part of this section, we will see how to perform incremental loads using Auto Loader cloudFiles on Databricks Clusters.

    • Overview of AutoLoader cloudFiles on Databricks

    • Upload GHArchive Files to s3 on Databricks

    • Write Data using AutoLoader cloudFiles on Databricks

    • Add New GHActivity JSON files on Databricks

    • Load Data Incrementally to Target Table on Databricks

    • Add New GHActivity JSON files on Databricks

    • Overview of Handling S3 Events using AWS Services on Databricks

    • Configure IAM Role for cloudFiles file notifications on Databricks

    • Incremental Load using cloudFiles File Notifications on Databricks

    • Review AWS Services for cloudFiles Event Notifications on Databricks

    • Review Metadata Generated for cloudFiles Checkpointing on Databricks

    Overview of Databricks SQL Clusters

    As part of this section, we will get an overview of Databricks SQL Clusters.

    • Overview of Databricks SQL Platform - Introduction

    • Run First Query using SQL Editor of Databricks SQL

    • Overview of Dashboards using Databricks SQL

    • Overview of Databricks SQL Data Explorer to review Metastore Databases and Tables

    • Use Databricks SQL Editor to develop scripts or queries

    • Review Metadata of Tables using Databricks SQL Platform

    • Overview of loading data into retail_db tables

    • Configure Databricks CLI to push data into the Databricks Platform

    • Copy JSON Data into DBFS using Databricks CLI

    • Analyze JSON Data using Spark APIs

    • Analyze Delta Table Schemas using Spark APIs

    • Load Data from Spark Data Frames into Delta Tables

    • Run Adhoc Queries using Databricks SQL Editor to validate data

    • Overview of External Tables using Databricks SQL

    • Using COPY Command to Copy Data into Delta Tables

    • Manage Databricks SQL Endpoints

What's inside

Learning objectives

  • Data Engineering leveraging Databricks features
  • Databricks CLI to manage files, Data Engineering jobs and clusters for Data Engineering pipelines
  • Deploying Data Engineering applications developed using PySpark on job clusters
  • Deploying Data Engineering applications developed using PySpark using notebooks on job clusters
  • Perform CRUD operations leveraging Delta Lake using Spark SQL for Data Engineering applications or pipelines
  • Perform CRUD operations leveraging Delta Lake using PySpark for Data Engineering applications or pipelines
  • Setting up development environment to develop Data Engineering applications using Databricks
  • Building Data Engineering pipelines using Spark Structured Streaming on Databricks clusters
  • Incremental file processing using Spark Structured Streaming leveraging Databricks Auto Loader cloudFiles
  • Overview of Auto Loader cloudFiles file discovery modes - directory listing and file notifications
  • Differences between Auto Loader cloudFiles file discovery modes - directory listing and file notifications
  • Differences between traditional Spark Structured Streaming and leveraging Databricks Auto Loader cloudFiles for incremental file processing.

Syllabus

Introduction to Data Engineering using Databricks
  • Overview of the course - Data Engineering using Databricks
  • Where are the resources that are used for this course?

Getting Started with Databricks on Azure
  • Getting Started with Databricks on Azure - Introduction
  • Signup for the Azure Account
  • Login and Increase Quotas for regional vCPUs in Azure
  • Create Azure Databricks Workspace
  • Launching Azure Databricks Workspace or Cluster
  • Quick Walkthrough of Azure Databricks UI
  • Create Azure Databricks Single Node Cluster
  • Upload Data using Azure Databricks UI
  • Overview of Creating Notebook and Validating Files
  • Develop Spark Application using Azure Databricks Notebook
  • Validate Spark Jobs using Azure Databricks Notebook
  • Export and Import of Azure Databricks Notebooks
  • Terminating Azure Databricks Cluster and Deleting Configuration
  • Delete Azure Databricks Workspace by deleting Resource Group

Azure Essentials for Databricks - Azure CLI
  • Azure CLI using Azure Portal Cloud Shell
  • Getting Started with Azure CLI on Mac
  • Getting Started with Azure CLI on Windows
  • Warming up with Azure CLI - Overview
  • Create Resource Group using Azure CLI
  • Create ADLS Storage Account within Resource Group
  • Add Container as part of Storage Account
  • Overview of Uploading the data into ADLS File System or Container
  • Setup Data Set locally to upload into ADLS File System or Container
  • Upload local directory into Azure ADLS File System or Container
  • Delete Azure ADLS Storage Account using Azure CLI
  • Delete Azure Resource Group using Azure CLI

Mount ADLS on to Azure Databricks to access files from Azure Blob Storage
  • Mount ADLS on to Azure Databricks - Introduction
  • [Material] - Mount ADLS on to Azure Databricks
  • Ensure Azure Databricks Workspace
  • Setup Databricks CLI on Mac or Windows using Python Virtual Environment
  • Configure Databricks CLI for new Azure Databricks Workspace
  • Register an Azure Active Directory Application
  • Create Databricks Secret for AD Application Client Secret
  • Create ADLS Storage Account
  • Assign IAM Role on Storage Account to Azure AD Application
  • Setup Retail DB Dataset
  • Create ADLS Container or File System and Upload Data
  • Start Databricks Cluster to mount ADLS
  • Mount ADLS Storage Account on to Azure Databricks
  • Validate ADLS Mount Point on Azure Databricks Clusters
  • Unmount the mount point from Databricks
  • Delete Azure Resource Group used for Mounting ADLS on to Azure Databricks

Getting Started with Databricks on AWS
  • Introduction to Getting Started with Databricks on AWS
  • Signup for AWS Account
  • Login into AWS Management Console
  • Setup Databricks Workspace on AWS using QuickStart
  • Login into Databricks Workspace on AWS
  • Cleaning up the workspace
  • Quick Walkthrough of Databricks UI on AWS
  • Create Single Node Databricks Cluster on AWS
  • Upload Data using AWS Databricks UI
  • Overview of Creating Databricks Notebook on AWS and Validating Files
  • Develop Spark Application using AWS Databricks Notebook
  • Review the AWS Databricks Cluster state and restart
  • Write Data frame to DBFS and Validate using Databricks Notebook and Spark
  • Export and Import AWS Databricks Notebooks

AWS Essentials for Databricks - Setup Local Development Environment on Windows
  • Introduction to Setup Local Environment with AWS CLI and Boto3 on Windows
  • Overview of Powershell on Windows 10 or Windows 11
  • Setup Ubuntu VM on Windows 10 or 11 using wsl
  • Setup Python venv and pip on Ubuntu
  • Setup AWS CLI on Windows and Ubuntu using Pip
  • Create AWS IAM User and Download Credentials
  • Configure AWS CLI on Windows
  • Create Python Virtual Environment for AWS Projects
  • Setup Boto3 as part of Python Virtual Environment
  • Setup Jupyter Lab and Validate boto3

AWS Essentials for Databricks - Setup Local Development Environment on Mac
  • Introduction to Setup Local Development Environment for AWS on Mac
  • Setup AWS CLI on Mac
  • Setup AWS IAM User to configure AWS CLI
  • Configure AWS CLI using IAM User Credentials
  • Setup Python Virtual Environment on Mac using Python 3

AWS Essentials for Databricks - Overview of AWS Storage Solutions
  • Getting Started with AWS S3
  • [Instructions] Getting Started with AWS S3
  • Setup Data Set locally to upload to s3
  • [Instructions] Setup Data Set locally to upload to s3
  • Adding AWS S3 Buckets and Objects
  • [Instruction] Adding AWS s3 Buckets and Objects
  • Version Control in AWS S3
  • [Instructions] Version Control in AWS S3
  • AWS S3 Cross-Region Replication for fault tolerance
  • [Instructions] AWS S3 Cross-Region Replication for fault tolerance
  • Cross-Region Replication for Disaster Recovery of AWS S3
  • Overview of AWS S3 Storage Classes
  • [Instructions] Overview of AWS S3 Storage Classes or Storage Tiers
  • Overview of AWS S3 Glacier
  • [Instructions] Overview of Glacier in AWS s3

Good to know

Know what's good, what to watch for, and possible dealbreakers
Meets the needs of students and professionals who want to gain expertise in Data Engineering
Provides ample hands-on experience through Spark Structured Streaming, Delta Lake, and autoLoader cloudFiles
Empowers learners to develop and deploy Data Engineering applications using PySpark and Databricks Notebooks
Delivers a well-rounded understanding of Data Engineering principles and techniques using Databricks
Provides foundational knowledge for experienced application developers to transition into Data Engineering
Prerequisites include prior experience with Spark, which may limit accessibility for beginners

Career center

Learners who complete Data Engineering using Databricks on AWS and Azure will develop knowledge and skills that may be useful to these careers:
Data Engineer
A Data Engineer manages and constructs data pipelines. They build a strong foundation based on their understanding of data management, cloud computing, and software development. This course covers Delta Lake, Databricks SQL, and Databricks CLI. These tools and services are used by Data Engineers every day. It may also help someone in this role to learn about topics such as Apache Spark, Spark SQL, and Spark Structured Streaming.
Big Data Engineer
A Big Data Engineer designs and builds big data solutions. They need a skill set that covers topics like data management, cloud computing, and software development. This course can help build a solid foundation by teaching students about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Data Analyst
A Data Analyst collects, analyzes, and interprets data. They should have a strong foundation in data analysis, statistics, and cloud computing. This course can help build a strong foundation by teaching a Data Analyst about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Data Architect
A Data Architect designs and manages data architectures. They need a strong foundation in data management, cloud computing, and software development. This course can help build a solid foundation by teaching a Data Architect about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Solutions Architect
A Solutions Architect designs and implements technology solutions. They need a strong foundation in cloud computing, data management, and software development. This course can help by teaching a Solutions Architect about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Data Warehouse Architect
A Data Warehouse Architect designs and manages data warehouses. They need a strong foundation in data management, cloud computing, and software development. This course can help by teaching a Data Warehouse Architect about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Cloud Architect
A Cloud Architect designs and manages cloud computing solutions. They should have a firm grasp of topics like cloud computing, software development, and data management. This course can help by teaching a Cloud Architect about the Databricks platform, Apache Spark, and Spark Structured Streaming.
DevOps Engineer
A DevOps Engineer works to bridge the gap between development and operations. Some of the essential skills for this role include software development, cloud computing, and data management. This course can help by teaching a DevOps Engineer about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Machine Learning Engineer
A Machine Learning Engineer designs and builds machine learning models. They need a strong foundation in machine learning, cloud computing, and data management. This course can help build a solid foundation by teaching a Machine Learning Engineer about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Business Analyst
A Business Analyst analyzes business processes and makes recommendations for improvement. They should have a strong foundation in business analysis, data analysis, and cloud computing. This course can help build a strong foundation by teaching a Business Analyst about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Database Administrator
A Database Administrator manages and maintains databases. This course can help by teaching a Database Administrator about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Data Scientist
A Data Scientist extracts valuable insights from large amounts of data. Some of the knowledge that a Data Scientist needs in order to do this includes information about cloud computing, statistics, and machine learning. This course can help by teaching a Data Scientist about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Software Engineer
A Software Engineer designs, develops, and maintains software applications. They need a strong foundation in computer science and software development. This course can help by teaching a Software Engineer about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Project Manager
A Project Manager plans and manages projects. They need a strong foundation in project management, cloud computing, and data management. This course can help by teaching a Project Manager about the Databricks platform, Apache Spark, and Spark Structured Streaming.
Product Manager
A Product Manager manages the development and launch of products. They need a strong foundation in product management, cloud computing, and data management. This course can help by teaching a Product Manager about the Databricks platform, Apache Spark, and Spark Structured Streaming.

Similar courses

Here are nine courses similar to Data Engineering using Databricks on AWS and Azure.
Data Engineering with Databricks
Optimizing Apache Spark on Databricks
Getting Started with the Databricks Lakehouse Platform
Delta Lake with Azure Databricks: Deep Dive
Getting Started with Delta Lake on Databricks
Distributed Computing with Spark SQL
Building Your First ETL Pipeline Using Azure Databricks
Apache Spark 3 Fundamentals
Handling Streaming Data with Azure Databricks Using Spark...

© 2016 - 2024 OpenCourser