We may earn an affiliate commission when you visit our partners.
Course image
Udemy logo

Essential SQL

Azure Data Factory and Data Engineering

Kris Wenzel

Use Azure data factory to automate data engineering tasks. This is a great course to introduce yourself to Azure Data Factory’s capabilities. We’ll look at the data factory from a data engineer perspective.

Read more

Use Azure data factory to automate data engineering tasks. This is a great course to introduce yourself to Azure Data Factory’s capabilities. We’ll look at the data factory from a data engineer perspective.

Through examples we’ll work side by side to create an Extract Transform and Load process to ingest movie rating data.

We are going to explore three different ways to transform your data.

The first is by using Azure Data Mapping flows. These are great no-code ways to do ETL.

We’ll then look at how you can do the same transformations using Python. This way, if you love Python, you know a solid way to use Azure data factory to work with your data.

Lastly, we’ll look at how you can create pipelines to use your knowledge of SQL and stored procedures.

What you’ll like about this course is that once you learn one way to transform the data, you can use that knowledge to learn about the other methods. So, if you’re a SQL expert but soft on python, you can learn the SQL way before trying another mapping method.  In the end you come out learning and appreciating alternative methods to ingesting, transforming, and storing data.

Enroll now

What's inside

Learning objectives

  • Learn how to construct elt solutions using azure data factory and sql server.
  • Implement various elt solutions using mapping data flows, copy activities, and stored procedures.
  • Develop the skills to build and troubleshoot azure data factory pipelines with confidence.
  • Enhance pipeline reliability by mastering the use of parameters to optimize your data workflows.
  • Learn efficient methods for storing and structuring data using azure storage services.
  • Understand the distinctive roles played by azure data factory, data storage, sql server, and data bricks in the creation of an elt solution.

Syllabus

Introduction

Welcome to Azure Data Factory.  Let's learn how transform data using Azure Data Engineer Tools such as Azure Data Mapping Flows, Copy Activities, and SQL.

Read more

If you are looking to become a data engineer or business analyst that need to work with data lakes then you'll find this course valuable.  Also, all of the items and concepts taught here apply to the MS-Fabric framework Microsoft is releasing.  If you company will be upgrading to Fabric in the near future, then start learning about the data factory now.

I'm taking a case study approach with this course.  I'll teach you two way to work with your data and transform it from raw data into a start schema you can use with PowerBI.  I'll show you how to do this within the medallion architecture.  Given this, there aren't any online exercises.

Instead, I encourage you to follow along work side-by-side me.

Ask questions in the comments, I'll answer them!

Work sections 2,3,4 in order.  This way you have Azure setup and have done the case studies.  I do refer back to the prior lessons, so when you are going through section 4, it's best that you have already done section 3.


You can treat section 5 as a reference.  I go over some data factory pipeline activity mechanics in this course.  It is mostly reference though, and they don't build up to any ETL work.  But, they do teach you how to use parameters, variables, copy, IF, SWITH, and FOR EACH; all good stuff!

In this section you'll set up the Microsoft Azure cloud environment we'll use to learn data engineering using data factory, storage accounts, and SQL server.

In this lesson I show you how to get your free Microsoft Azure Account and how to get that set up for use with the Data Engineer course.


Note the instruction frequently change, but these are the general steps:


Creating a free Azure account involves a few simple steps. Here's a general guide on how to do it:

  1. Visit the Azure website: Go to the official Azure website at https://azure.com.

  2. Click on "Free Account": Look for the "Free Account" option on the Azure website. This is typically prominently displayed on the homepage.

  3. Sign in or create a Microsoft account: You'll need a Microsoft account to sign in. If you don't have one, you can create a new account during this step. If you already have a Microsoft account, sign in with your credentials.

  4. Fill in the required information: Provide the necessary information to set up your Azure account. This may include your personal information, a phone number for verification, and a credit card. Note that a credit card is required, but you won't be charged during the free trial period unless you explicitly upgrade to a paid account.

  5. Verify your identity: Microsoft may require you to verify your identity for security purposes. This could involve receiving a code on your phone or email that you'll need to enter on the website.

  6. Set up your subscription: Choose the type of subscription you want. For the free account, you'll likely be offered a limited amount of Azure services for free for the first 12 months. After the trial period, you can choose to upgrade to a paid subscription or continue with the free services.

  7. Provide billing information: Even though the account is free for the first 12 months, you'll need to provide billing information. This is in case you decide to continue using Azure services beyond the free trial or if you exceed the limits of the free tier.

  8. Review and agree to the terms: Read through the terms and conditions, and if you agree, check the box to indicate your acceptance.

  9. Complete the sign-up process: Once you've provided all the necessary information, click on the "Sign up" or "Create account" button to complete the process.

After completing these steps, you should have access to your Azure portal, where you can start exploring and using Azure services within the limits of the free tier. Always be mindful of your usage to avoid unexpected charges, and regularly check your Azure portal for any updates or notifications.

Remember you can use the Cost Analysis feature under your resource group to monitor your spend.

Let's put together a resource group so our data engineering related resources are organized in one place.

In this lesson we'll set up a Azure Storage Account.  I walk through some important points.  A key one is that the naming is case sensitive.


Some students were having an issue finding their storage account.  There are some student questions attached to this lesson you can read.  I'm including the link here if you find you are also having troubles, it may help.


https://www.udemy.com/course/essential-sql-azure-data-factory-and-data-engineering/learn/#questions/21028574/


If it doesn't please reach out to me.  As you can imagine, every circumstance is different, and Microsoft does change things slightly over time.  It can be hard to capture all the edge cases!

The Microsoft Azure Data Factory, or ADF for short, is what we use to orchestrate our data engineering activities.  We'll use it to run task to load, copy, and transform data between areas within our data lake and SQL Server databases.

In order for the Data Factory to access the Storage Account it must be given permission.  We'll do so in this lesson and to a simple test copy to ensure it is working properly.

In this lesson let's set up a server-less Azure SQL database.

In this lesson we'll set up Azure Data Studio for use with SQL Server.  We'll use the tool to create queries and stored procedures.

Now that the SQL database is setup, let get the permission in place so that our data factory is able to write to it using a copy activity.

We are going to integrate our data factor with DevOps to make it easier for us to manage our changes as well as be able to save incomplete changes without compromising our production code.

In this section we'll ingest and and transform our data using Azure data mapping flows in conjunction with the Azure data factory.

In this lesson we learn about the medallion architecture:  Bronze, Silver, and Gold.  You'll also get acquainted with our case study and the star schema we're about to build.

Let's get the data lake folder structure set up and then load the Raw layer with data.  We'll use Storage Explorer to assist with the uploads.

In this lesson we start processing our data using the Azure Data Factory Data Mapping Flows to prepare our data for the Bronze layer.

Let's use ADF (Azure Data Factory) to execute the mapping flow we created in the previous lesson.

Let's go over the main steps we plan to accomplish when moving our data from the Bronze to Silver layer.

Learn how to use Mapping data flows to:

  • Change Data Types – String To Numeric, String to Date.

  • Filter out Empty Rows – OrderID blank.

  • Remove Duplicates – OrderID, Product is unique

  • Split Address – Make Street, City, State, Zip

In this part we'll focus on changing the data types.

Learn how to use Mapping data flows to:

  • Change Data Types – String To Numeric, String to Date.

  • Filter out Empty Rows – OrderID blank.

  • Remove Duplicates – OrderID, Product is unique

  • Split Address – Make Street, City, State, Zip

In this part we'll focus on filtering out empty rows and removing duplicates

Learn how to use Mapping data flows to:

  • Change Data Types – String To Numeric, String to Date.

  • Filter out Empty Rows – OrderID blank.

  • Remove Duplicates – OrderID, Product is unique

  • Split Address – Make Street, City, State, Zip

In this part we'll focus on the Address.

Learn how to use Mapping data flows to:

  • Change Data Types – String To Numeric, String to Date.

  • Filter out Empty Rows – OrderID blank.

  • Remove Duplicates – OrderID, Product is unique

  • Split Address – Make Street, City, State, Zip

In this part we'll look at why some of my data types got reset and then do a final check.

In this lesson we'll discuss what we need to do to create the gold layer's fact and dimension tables from the sliver layer data file.

In this lecture we'll continue working on creating the dimensions.  Once they are created, we'll update the pipeline to include them so we can build out the parquet files in the gold layer.

In this lesson we'll work on creating mapping flows to create the star schema dimensions and then back fill the fact with dimension ID's.

Gold Layer - Pipeline Automation and Fact Table Troubleshooting

In this lesson we'll refactor our ADF pipeline and then use Microsoft PowerBI to see the star schema in action.

In this section you'll learn how to construct a star schema within SQL Server.

Let's go over the steps to refine the data into the gold layer.  In this case study our gold layer exists in SQL Server.  The focus of these lessons are to show you how to use SQL with Azure Data Factory.

Use The ADF (Azure Data Factory) Copy Activity to move data from the RAW to Bronze layer.

The objective for this exercise are to land our data in SQL.  Overall in the Silver Layer we'll do the following:

Objectives:

  • Change Data Types – String To Numeric, String to Date.

  • Filter out Empty Rows – OrderID blank.

  • Remove Duplicates – OrderID, Product is unique

  • Split Address – Make Street, City, State, Zip


I'll show you how to do these steps using TSQL within a stored procedure.

In this lecture you learn to use the copy activity to load files from the DataLake into SQL Server.

In this lecture we will develop the SQL Query we will use to split (parse) the address into separate columns.

In this lecture we'll use the query from the previous lection to create a stored procedure to load our landed data into the Silver schema.  I'll show you the patter I routinely used to load the landed data and convert types.

In this lesson we'll modify our ADF pipeline to execute the stored procedure Stage.SaleOrderLoad once we have landed the Bronze data into the SQL Server database.

In the next series of lecture we'll learn how to transform the Silver layer SalesOrder table into a Fact table with linked dimensions as a star schema.

In this lesson we'll build the Product, PurchaseAddress, and FileSource dimensions.

In this lesson we'll use SQL to build the SalesOrderFact table using the dimensions built in the previous lesson.

In this lesson I'll show you how to move your completed tables from Stage to the Gold layer.  We'll then modify our ADF pipeline to ensure all the stored procedures are executed in the correct order.

In this lecture we'll use PowerBI to look at the star schema we created in SQL Server.

After going through this section you'll be more familiar with some of the concepts and main activities data engineers use within Azure Data Studio.

ADF stands for Azure Data Factory.  In this lesson we make sure we understand it purpose.

Learn about the difference between Azure Data Factory variables and parameters and when to use either in your pipeline.

Learn how to refactor (change for the better) our  Azure Data Factory pipeline to eliminate variables and utilize return values.

In this lesson we'll learn about the Azure Data Factory If Activity.

The ADF Switch activity is another conditional activity you can use in your pipeline.  Once thing that is handy about it, is that you can use the expression builder to output one or more switch values.  That can help you avoid nested If Statements, which are not allowed in ADF.

Azure Data Factory - For Each Activity

Let's keep building up our example to show how you can examine information returned from the GetMetaData activity.

In this lecture you learn to use the ADF Filter activity to filter out values from a JSON array.

Resources

In the resources section is the product database we'll use for the case studies.  Please download and unzip so you can use to populate the data lake.

Bonus Lecture

Good to know

Know what's good
, what to watch for
, and possible dealbreakers
Covers data engineering concepts from the perspective of a data engineer as opposed to a developer
Focuses on Azure Data Factory, which is an industry-standard data integration tool
Teaches Azure with practical examples, making it beginner-friendly
Suitable for data engineers, data analysts, and business analysts interested in working with data lakes
Demonstrates various ways to transform data, offering flexibility and adaptability
Leverages the medallion architecture, a widely recognized data modeling approach in the industry

Save this course

Save Essential SQL: Azure Data Factory and Data Engineering to your list so you can find it easily later:
Save

Activities

Coming soon We're preparing activities for Essential SQL: Azure Data Factory and Data Engineering. These are activities you can do either before, during, or after a course.

Career center

Learners who complete Essential SQL: Azure Data Factory and Data Engineering will develop knowledge and skills that may be useful to these careers:
Data Engineer
A Data Engineer is responsible for designing, building, and maintaining data pipelines that collect, transform, and store data for use by data analysts and other stakeholders. This course can help you get started in this field by providing you with the skills you need to create and manage data pipelines using Azure Data Factory, Azure Data Mapping Flows, and SQL Server. You will also learn how to use Azure Storage services to store and structure your data.
Data Analyst
A Data Analyst is responsible for analyzing data to identify trends and patterns, and to develop insights that can help businesses make better decisions. This course can help you get started in this field by providing you with the skills you need to work with data, including data wrangling, data visualization, and statistical analysis. You will also learn how to use SQL Server to query and analyze data.
Data Scientist
A Data Scientist is responsible for developing and applying machine learning and artificial intelligence algorithms to data to solve business problems. This course can help you get started in this field by providing you with a foundation in data engineering and data analysis. You will also learn how to use Azure Machine Learning to develop and deploy machine learning models.
Machine Learning Engineer
A Machine Learning Engineer is responsible for designing, building, and maintaining machine learning models. This course can help you get started in this field by providing you with the skills you need to work with data, including data wrangling, data visualization, and statistical analysis. You will also learn how to use Azure Machine Learning to develop and deploy machine learning models.
Database Administrator
A Database Administrator is responsible for managing and maintaining databases. This course can help you get started in this field by providing you with the skills you need to work with SQL Server, including database design, database optimization, and database security. You will also learn how to use Azure Database services to manage and maintain your databases.
Business Analyst
A Business Analyst is responsible for analyzing business processes and identifying opportunities for improvement. This course can help you get started in this field by providing you with the skills you need to work with data, including data wrangling, data visualization, and statistical analysis. You will also learn how to use SQL Server to query and analyze data.
Software Engineer
A Software Engineer is responsible for designing, building, and maintaining software applications. This course can help you get started in this field by providing you with the skills you need to work with data, including data wrangling, data visualization, and statistical analysis. You will also learn how to use Python to develop and deploy software applications.
Marketing Manager
A Marketing Manager is responsible for developing and executing marketing campaigns. This course can help you get started in this field by providing you with the skills you need to understand the market, identify customer needs, and develop and launch marketing campaigns. You will also learn how to use data to measure the success of your campaigns.
Healthcare Analyst
A Healthcare Analyst is responsible for analyzing data to identify trends and patterns in healthcare. This course can help you get started in this field by providing you with the skills you need to work with data, including data wrangling, data visualization, and statistical analysis. You will also learn how to use SQL Server to query and analyze healthcare data.
Project Manager
A Project Manager is responsible for planning, executing, and controlling projects. This course can help you get started in this field by providing you with the skills you need to understand the market, identify customer needs, and develop and execute project plans. You will also learn how to use data to measure the success of your projects.
Sales Manager
A Sales Manager is responsible for leading and managing a sales team. This course can help you get started in this field by providing you with the skills you need to understand the market, identify customer needs, and develop and execute sales strategies. You will also learn how to use data to measure the success of your sales team.
Financial Analyst
A Financial Analyst is responsible for analyzing financial data and making recommendations to businesses. This course can help you get started in this field by providing you with the skills you need to work with data, including data wrangling, data visualization, and statistical analysis. You will also learn how to use Excel to analyze financial data.
Product Manager
A Product Manager is responsible for managing the development and launch of new products. This course can help you get started in this field by providing you with the skills you need to understand the market, identify customer needs, and develop and launch new products. You will also learn how to use data to measure the success of your products.
Operations Manager
An Operations Manager is responsible for managing the day-to-day operations of a business. This course can help you get started in this field by providing you with the skills you need to understand the market, identify customer needs, and develop and execute operational plans. You will also learn how to use data to measure the success of your operations.
Supply Chain Manager
A Supply Chain Manager is responsible for managing the flow of goods and services between businesses. This course can help you get started in this field by providing you with the skills you need to understand the market, identify customer needs, and develop and execute supply chain plans. You will also learn how to use data to measure the success of your supply chain.

Reading list

We've selected four books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Essential SQL: Azure Data Factory and Data Engineering.
Provides a comprehensive overview of SQL Server and its use in data engineering. It covers topics such as data ingestion, data transformation, and data warehousing.
Provides a comprehensive guide to data integration with SQL Server, including how to use SQL Server Integration Services to build data-driven solutions. It valuable resource for anyone who wants to learn more about data integration with SQL Server.
Provides a comprehensive overview of SQL Server Integration Services, including its architecture, components, and capabilities. It valuable resource for anyone who wants to learn more about SQL Server Integration Services and how to use it to build data-driven solutions.
Provides a comprehensive overview of data engineering with Scala. It covers topics such as data ingestion, data transformation, and data warehousing.

Share

Help others find this course page by sharing it with your friends and followers:

Similar courses

Here are nine courses similar to Essential SQL: Azure Data Factory and Data Engineering.
Create Mapping Data Flows in Azure Data Factory
Most relevant
Creating Mapping Data Flows on Azure Data Factory
Most relevant
DP-203 - Data Engineering on Microsoft Azure
Most relevant
DP-203 : Microsoft Certified Azure Data Engineer Associate
Most relevant
Mapping Data Flows en Azure Data Factory
Most relevant
Orchestrating Data Movement with Azure Data Factory
Most relevant
Azure Data Factory : Implement SCD Type 1
Most relevant
Building Your First Data Pipeline in Azure Data Factory
Most relevant
Monitoring and Troubleshooting Data Storage and Processing
Most relevant
Our mission

OpenCourser helps millions of learners each year. People visit us to learn workspace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Find this site helpful? Tell a friend about us.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser