Microsoft

In this course, you will learn how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run large data engineering workloads in the cloud.

You will discover the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. You will come to understand the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. You will also be introduced to the architecture of an Azure Databricks Spark Cluster and Spark Jobs. You will work with large amounts of data from multiple sources in different raw formats, and you will learn how Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries.

This course is part of a Specialization intended for data engineers and developers who want to demonstrate their expertise in designing and implementing data solutions that use Microsoft Azure data services. It is also suited to anyone preparing for Exam DP-203: Data Engineering on Microsoft Azure (beta). You will take a practice exam that covers key skills measured by the certification exam.

This is the eighth course in a program of 10 courses that help you prepare for the exam and build expertise in designing and implementing data solutions that use Microsoft Azure data services. The Data Engineering on Microsoft Azure exam is an opportunity to prove your expertise in integrating, transforming, and consolidating data from various structured and unstructured data systems into structures that are suitable for building analytics solutions that use Microsoft Azure data services. Each course teaches you the concepts and skills that are measured by the exam.

By the end of this Specialization, you will be ready to sign up for and take Exam DP-203: Data Engineering on Microsoft Azure (beta).

What's inside

Syllabus

Introduction to Azure Databricks
Describe the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. Describe the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. Describe the architecture of an Azure Databricks Spark Cluster and Spark Jobs.
Read and write data in Azure Databricks
Describe how Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries.
Data processing in Azure Databricks
Process data in Azure Databricks by defining DataFrames to read and process the data. Perform data transformations in DataFrames and execute actions to display the transformed data. Explain the difference between a transformation and an action, lazy and eager evaluation, wide and narrow transformations, and other optimizations in Azure Databricks.
Work with DataFrames in Azure Databricks
Use the DataFrame Column class in Azure Databricks to apply column-level transformations, such as sorts, filters, and aggregations. Use advanced DataFrame functions to manipulate data, apply aggregates, and perform date and time operations in Azure Databricks.
Platform architecture, security, and data protection in Azure Databricks
Describe the Azure Databricks platform architecture and how it is secured. Use Azure Key Vault to store secrets used by Azure Databricks and other services. Access Azure Storage with Key Vault-based secrets.
Delta Lake
Describe how to use Delta Lake to create, append, and upsert data to Apache Spark tables, taking advantage of built-in reliability and optimizations. Describe the Azure Databricks Delta Lake architecture.
Analyze streaming data and create production workloads
Process streaming data with Azure Databricks structured streaming. Create production workloads on Azure Databricks with Azure Data Factory.
Create a data architecture
Describe how to put Azure Databricks notebooks under version control in an Azure DevOps repo and build deployment pipelines to manage your release process. Describe how to integrate Azure Databricks with Azure Synapse Analytics as part of your data architecture. Describe best practices for workspace administration, security, tools, integration, Databricks Runtime, HA/DR, and clusters in Azure Databricks.
Practice Exam on Data engineering with Azure Databricks
Prepare for the Microsoft Certified: Azure Data Engineer Associate exam
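
The difference between a transformation and an action, covered in the "Data processing in Azure Databricks" module, can be illustrated without a cluster. What follows is a minimal pure-Python analogy, not Spark code: the class LazyFrame and its methods map, filter, and collect are hypothetical names chosen for this sketch. Transformations only record a step, and nothing runs until the action collect is called.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Step = Tuple[str, Callable]


class LazyFrame:
    """Toy stand-in for a DataFrame (hypothetical, not a Spark class)."""

    def __init__(self, source: Iterable[int], steps: Optional[List[Step]] = None):
        self._source = source
        self._steps = list(steps or [])

    def map(self, fn: Callable[[int], int]) -> "LazyFrame":
        # Transformation: record the step and return a new plan; run nothing.
        return LazyFrame(self._source, self._steps + [("map", fn)])

    def filter(self, pred: Callable[[int], bool]) -> "LazyFrame":
        # Transformation: also only recorded.
        return LazyFrame(self._source, self._steps + [("filter", pred)])

    def collect(self) -> List[int]:
        # Action: replay the recorded steps over the source and materialize.
        rows = iter(self._source)
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls: List[int] = []


def double(x: int) -> int:
    calls.append(x)  # records when the function is really invoked
    return x * 2


frame = LazyFrame(range(5)).map(double).filter(lambda x: x > 4)
print(calls)            # [] : no work has been done yet
print(frame.collect())  # [6, 8]
print(calls)            # [0, 1, 2, 3, 4] : work happened only at collect()
```

In this analogy, eager evaluation would mean running double inside map immediately. Deferring the work until an action is requested is what gives an engine the chance to inspect and optimize the whole plan first.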

Good to know

Know what's good, what to watch for, and possible dealbreakers
Taught by Microsoft employees, who are recognized for their work in data engineering and big data
Designed for data engineers and developers who want to demonstrate their expertise in designing and implementing data solutions that use Microsoft Azure data services
Provides an opportunity to prepare for the Exam DP-203: Data Engineering on Microsoft Azure (beta) and obtain a recognized industry certification
Covers the skills and knowledge measured by the Microsoft Certified: Azure Data Engineer Associate exam
Provides hands-on experience with Azure Databricks, a leading cloud-based data engineering platform
Requires prior knowledge of data engineering principles and experience with data processing tools

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Microsoft Azure Databricks for Data Engineering with these activities:
Review Apache Spark
Sharpen your existing Apache Spark skills to prepare for the course and accelerate your progress.
  • Revisit the Spark documentation.
  • Go through online tutorials and exercises.
Read 'Spark: The Definitive Guide'
Delve deeper into the concepts and best practices of Apache Spark by reading this comprehensive guide.
  • Read chapters 1-3 to grasp the fundamentals of Spark.
  • Focus on chapters 4-6 to understand Spark's core APIs.
Solve Apache Spark Coding Challenges
Enhance your Apache Spark coding skills by solving practice problems and challenges.
  • Find coding challenges on platforms like HackerRank or LeetCode.
  • Attempt to solve the challenges using Apache Spark.
  • Review solutions and learn from your mistakes.
Follow Azure Databricks Tutorials
Gain hands-on experience with Azure Databricks through guided tutorials provided by Microsoft.
  • Visit the Azure Databricks documentation.
  • Select a tutorial that aligns with your learning goals.
  • Follow the tutorial steps and complete the exercises.
Develop an Azure Databricks Notebook
Apply your learnings by creating a custom Azure Databricks notebook to solve a data engineering problem.
  • Define the problem and gather the necessary data.
  • Create a new Azure Databricks notebook.
  • Write Spark code to process and analyze the data.
  • Visualize and interpret the results.
Coach a Junior Data Engineer
Sharpen your understanding of Azure Databricks and data engineering concepts by mentoring a junior data engineer.
  • Identify a junior data engineer who can benefit from your guidance.
  • Establish regular meetings to provide support and answer questions.
  • Review their code and provide constructive feedback.

Career center

Learners who complete Microsoft Azure Databricks for Data Engineering will develop knowledge and skills that may be useful to these careers:
Data Integration Engineer
A Data Integration Engineer is responsible for designing, building, and maintaining data integration systems. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Integration Engineer who wants to use Azure Databricks in their work.
Data Engineer
A Data Engineer is responsible for designing, building, and maintaining data pipelines. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Engineer who wants to use Azure Databricks in their work.
Data Warehouse Engineer
A Data Warehouse Engineer is responsible for designing, building, and maintaining data warehouses. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Warehouse Engineer who wants to use Azure Databricks in their work.
Big Data Engineer
A Big Data Engineer is responsible for designing, building, and maintaining big data systems. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Big Data Engineer who wants to use Azure Databricks in their work.
Data Analyst
A Data Analyst is responsible for collecting, cleaning, and analyzing data to help businesses make informed decisions. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Analyst who wants to use Azure Databricks in their work.
Business Intelligence Analyst
A Business Intelligence Analyst is responsible for using data to help businesses make better decisions. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Business Intelligence Analyst who wants to use Azure Databricks in their work.
Data Scientist
A Data Scientist is responsible for using data to solve business problems. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Scientist who wants to use Azure Databricks in their work.
Machine Learning Engineer
A Machine Learning Engineer is responsible for designing, developing, and deploying machine learning models. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Machine Learning Engineer who wants to use Azure Databricks in their work.
Data Governance Analyst
A Data Governance Analyst is responsible for developing and implementing data governance policies and procedures. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Governance Analyst who wants to use Azure Databricks in their work.
Cloud Engineer
A Cloud Engineer is responsible for designing, building, and maintaining cloud-based systems. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Cloud Engineer who wants to use Azure Databricks in their work.
Database Administrator
A Database Administrator is responsible for managing and maintaining an organization's databases. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Database Administrator who wants to use Azure Databricks in their work.
Data Security Analyst
A Data Security Analyst is responsible for identifying and mitigating data security risks. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Security Analyst who wants to use Azure Databricks in their work.
Data Privacy Analyst
A Data Privacy Analyst is responsible for identifying and mitigating data privacy risks. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Privacy Analyst who wants to use Azure Databricks in their work.
Software Engineer
A Software Engineer is responsible for designing, developing, and maintaining software applications. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Software Engineer who wants to use Azure Databricks in their work.
Data Architect
A Data Architect is responsible for designing, developing, and maintaining the architecture of an organization's data systems. This course can help you build a foundation in Azure Databricks, a powerful platform for data engineering, which can be useful for a Data Architect who wants to use Azure Databricks in their work.

Reading list

We've selected six books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Microsoft Azure Databricks for Data Engineering.
Is considered a comprehensive guide to Apache Spark and its various components, making it suitable as a technical and up-to-date reference.
Is considered a foundational book for beginners interested in learning Apache Spark and its applications in big data analytics.
Directly from the creators of Spark, this book combines theory and practice on how to create, tune, and run efficient Apache Spark applications.
Provides a strong foundation in distributed data processing concepts and MapReduce programming, which can serve as valuable background knowledge for understanding Azure Databricks' functionalities.
Offers a high-level overview of Machine Learning with Apache Spark. Best for those with an existing knowledge of Machine Learning.
While not directly focused on Azure Databricks, this classic guide provides a comprehensive overview of the Hadoop ecosystem, which is foundational for understanding many of the concepts used in Azure Databricks.

Similar courses

Here are nine courses similar to Microsoft Azure Databricks for Data Engineering.
Data Engineering with MS Azure Synapse Apache Spark Pools
Most relevant
Prep for Microsoft Azure Data Engineer Associate Cert DP...
Most relevant
Data Engineering using Databricks on AWS and Azure
Most relevant
Conceptualizing the Processing Model for Azure Databricks...
Most relevant
Perform data science with Azure Databricks
Most relevant
Building Your First ETL Pipeline Using Azure Databricks
Most relevant
Getting Started with Apache Spark on Databricks
Most relevant
Handling Streaming Data with Azure Databricks Using Spark...
Most relevant
DP-203: Data Ingestion and Preparation
Most relevant

Our mission

OpenCourser helps millions of learners each year. People visit us to learn workspace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser