Use Azure Data Factory to automate data engineering tasks. This is a great course to introduce yourself to Azure Data Factory's capabilities. We'll look at the data factory from a data engineer's perspective.
Through examples, we'll work side by side to create an Extract, Transform, and Load (ETL) process to ingest movie rating data.
We are going to explore three different ways to transform your data.
The first is by using Azure Data Factory's mapping data flows. These are a great no-code way to do ETL.
We'll then look at how you can do the same transformations using Python. This way, if you love Python, you'll know a solid way to use Azure Data Factory to work with your data.
Lastly, we’ll look at how you can create pipelines to use your knowledge of SQL and stored procedures.
What you'll like about this course is that once you learn one way to transform the data, you can use that knowledge to learn the other methods. So, if you're a SQL expert but soft on Python, you can learn the SQL way before trying another method. In the end you'll come out knowing and appreciating alternative methods of ingesting, transforming, and storing data.
Welcome to Azure Data Factory. Let's learn how to transform data using Azure data engineering tools such as mapping data flows, copy activities, and SQL.
If you are looking to become a data engineer or a business analyst who needs to work with data lakes, then you'll find this course valuable. Also, all of the items and concepts taught here apply to the Microsoft Fabric framework. If your company will be upgrading to Fabric in the near future, then start learning about the data factory now.
I'm taking a case study approach with this course. I'll teach you two ways to work with your data and transform it from raw data into a star schema you can use with Power BI. I'll show you how to do this within the medallion architecture. Given this, there aren't any online exercises.
Instead, I encourage you to follow along and work side by side with me.
Ask questions in the comments; I'll answer them!
Work through sections 2, 3, and 4 in order. This way you'll have Azure set up and will have done the case studies. I refer back to prior lessons, so when you're going through section 4, it's best that you have already done section 3.
You can treat section 5 as a reference. I go over some data factory pipeline activity mechanics in this course. It's mostly reference, though, and those lessons don't build up to any ETL work. But they do teach you how to use parameters, variables, and the Copy, If, Switch, and ForEach activities; all good stuff!
In this lesson I show you how to get your free Microsoft Azure account and how to set it up for use with this data engineering course.
Note that the instructions change frequently, but creating a free Azure account generally involves a few simple steps:
Visit the Azure website: Go to the official Azure website at https://azure.com.
Click on "Free Account": Look for the "Free Account" option on the Azure website. This is typically prominently displayed on the homepage.
Sign in or create a Microsoft account: You'll need a Microsoft account to sign in. If you don't have one, you can create a new account during this step. If you already have a Microsoft account, sign in with your credentials.
Fill in the required information: Provide the necessary information to set up your Azure account. This may include your personal information, a phone number for verification, and a credit card. Note that a credit card is required, but you won't be charged during the free trial period unless you explicitly upgrade to a paid account.
Verify your identity: Microsoft may require you to verify your identity for security purposes. This could involve receiving a code on your phone or email that you'll need to enter on the website.
Set up your subscription: Choose the type of subscription you want. For the free account, you'll likely be offered a limited set of Azure services free for the first 12 months. After the trial period, you can choose to upgrade to a paid subscription or continue with the free services.
Provide billing information: Even though the account is free for the first 12 months, you'll need to provide billing information. This is in case you decide to continue using Azure services beyond the free trial or if you exceed the limits of the free tier.
Review and agree to the terms: Read through the terms and conditions, and if you agree, check the box to indicate your acceptance.
Complete the sign-up process: Once you've provided all the necessary information, click on the "Sign up" or "Create account" button to complete the process.
After completing these steps, you should have access to your Azure portal, where you can start exploring and using Azure services within the limits of the free tier. Always be mindful of your usage to avoid unexpected charges, and regularly check your Azure portal for any updates or notifications.
Remember you can use the Cost Analysis feature under your resource group to monitor your spend.
Let's put together a resource group so our data engineering related resources are organized in one place.
In this lesson we'll set up an Azure Storage Account. I walk through some important points. A key one is that the naming is case-sensitive: storage account names must be all lowercase.
Some students have had trouble finding their storage account. There are student questions attached to this lesson you can read; if you run into the same issue, the link below may help.
https://www.udemy.com/course/essential-sql-azure-data-factory-and-data-engineering/learn/#questions/21028574/
If it doesn't, please reach out to me. As you can imagine, every circumstance is different, and Microsoft does change things slightly over time. It can be hard to capture all the edge cases!
Microsoft Azure Data Factory, or ADF for short, is what we use to orchestrate our data engineering activities. We'll use it to run tasks that load, copy, and transform data between areas within our data lake and SQL Server databases.
In order for the Data Factory to access the Storage Account, it must be given permission. We'll grant that in this lesson and do a simple test copy to ensure it is working properly.
In this lesson let's set up a serverless Azure SQL database.
In this lesson we'll set up Azure Data Studio for use with SQL Server. We'll use the tool to create queries and stored procedures.
Now that the SQL database is set up, let's get the permissions in place so that our data factory is able to write to it using a copy activity.
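If you're curious what the grant looks like on the SQL side, here's a minimal sketch. It assumes the factory's system-assigned managed identity and a hypothetical factory name, so adjust it to your own setup:

    -- Run this against the target database while signed in as a
    -- Microsoft Entra (Azure AD) admin. [my-data-factory] is a
    -- placeholder; use your own data factory's name.
    CREATE USER [my-data-factory] FROM EXTERNAL PROVIDER;

    -- Let the copy activity read and write tables.
    ALTER ROLE db_datareader ADD MEMBER [my-data-factory];
    ALTER ROLE db_datawriter ADD MEMBER [my-data-factory];

    -- Needed later when pipelines call stored procedures.
    GRANT EXECUTE TO [my-data-factory];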
We are going to integrate our data factory with DevOps to make it easier to manage our changes and to save incomplete work without compromising our production code.
In this lesson we learn about the medallion architecture: Bronze, Silver, and Gold. You'll also get acquainted with our case study and the star schema we're about to build.
Let's get the data lake folder structure set up and then load the Raw layer with data. We'll use Storage Explorer to assist with the uploads.
In this lesson we start processing our data using Azure Data Factory mapping data flows to prepare our data for the Bronze layer.
Let's use ADF (Azure Data Factory) to execute the mapping flow we created in the previous lesson.
Let's go over the main steps we plan to accomplish when moving our data from the Bronze to Silver layer.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll focus on changing the data types.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll focus on filtering out empty rows and removing duplicates.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll focus on splitting the address.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll look at why some of my data types got reset and then do a final check.
In this lesson we'll discuss what we need to do to create the gold layer's fact and dimension tables from the silver layer data file.
In this lecture we'll continue working on creating the dimensions. Once they are created, we'll update the pipeline to include them so we can build out the Parquet files in the gold layer.
In this lesson we'll work on creating mapping flows to create the star schema dimensions and then backfill the fact table with dimension IDs.
In this lesson we'll refactor our ADF pipeline and then use Microsoft Power BI to see the star schema in action.
Let's go over the steps to refine the data into the gold layer. In this case study our gold layer exists in SQL Server. The focus of these lessons is to show you how to use SQL with Azure Data Factory.
Use the ADF (Azure Data Factory) copy activity to move data from the Raw to the Bronze layer.
The objective for this exercise is to land our data in SQL. Overall, in the Silver layer we'll do the following:
Objectives:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
I'll show you how to do these steps using T-SQL within a stored procedure.
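To give you a taste of the pattern (the table and column names here are illustrative, not necessarily the exact case study schema), blank-row filtering and de-duplication in T-SQL usually lean on ROW_NUMBER:

    -- Illustrative sketch: keep one row per OrderID + Product,
    -- dropping rows where OrderID is blank.
    WITH Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY OrderID, Product
                   ORDER BY OrderDate DESC) AS RowNum
        FROM Stage.SalesOrderLanded          -- hypothetical landing table
        WHERE NULLIF(LTRIM(RTRIM(OrderID)), '') IS NOT NULL
    )
    SELECT OrderID, Product, Price, OrderDate, PurchaseAddress
    FROM Ranked
    WHERE RowNum = 1;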
In this lecture you learn to use the copy activity to load files from the data lake into SQL Server.
In this lecture we'll develop the SQL query we'll use to split (parse) the address into separate columns.
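As a rough sketch of the kind of query we'll build, assuming addresses arrive as 'Street, City, ST Zip' and using placeholder table names:

    -- Split 'Street, City, ST Zip' into four columns.
    -- Assumes well-formed addresses with exactly two commas.
    SELECT
        PurchaseAddress,
        TRIM(LEFT(PurchaseAddress, p.c1 - 1))                        AS Street,
        TRIM(SUBSTRING(PurchaseAddress, p.c1 + 1, p.c2 - p.c1 - 1))  AS City,
        LEFT(s.StateZip, CHARINDEX(' ', s.StateZip) - 1)             AS [State],
        SUBSTRING(s.StateZip, CHARINDEX(' ', s.StateZip) + 1, 10)    AS Zip
    FROM Stage.SalesOrderLanded
    CROSS APPLY (SELECT CHARINDEX(',', PurchaseAddress) AS c1,
                        CHARINDEX(',', PurchaseAddress,
                                  CHARINDEX(',', PurchaseAddress) + 1) AS c2) AS p
    CROSS APPLY (SELECT TRIM(SUBSTRING(PurchaseAddress, p.c2 + 1, 100)) AS StateZip) AS s;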
In this lecture we'll use the query from the previous lecture to create a stored procedure to load our landed data into the Silver schema. I'll show you the pattern I routinely use to load the landed data and convert types.
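The overall shape of that pattern, combined with the de-duplication shown earlier, might look like this sketch. Stage.SaleOrderLoad is the procedure name used in the next lesson, but the tables and columns here are placeholders:

    -- Sketch of a landed-to-Silver load: convert types as rows move.
    CREATE OR ALTER PROCEDURE Stage.SaleOrderLoad
    AS
    BEGIN
        SET NOCOUNT ON;

        TRUNCATE TABLE Silver.SalesOrder;    -- full reload each run

        INSERT INTO Silver.SalesOrder
            (OrderID, Product, Price, OrderDate, Street, City, [State], Zip)
        SELECT
            TRY_CONVERT(int, OrderID),              -- string to numeric
            Product,
            TRY_CONVERT(decimal(10, 2), Price),     -- string to numeric
            TRY_CONVERT(datetime2, OrderDate),      -- string to date
            Street, City, [State], Zip              -- from the parse query
        FROM Stage.SalesOrderParsed;                -- hypothetical source
    END;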
In this lesson we'll modify our ADF pipeline to execute the stored procedure Stage.SaleOrderLoad once we have landed the Bronze data into the SQL Server database.
In the next series of lectures we'll learn how to transform the Silver layer SalesOrder table into a fact table with linked dimensions as a star schema.
In this lesson we'll build the Product, PurchaseAddress, and FileSource dimensions.
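A minimal sketch of one such dimension load, assuming an identity column supplies the surrogate key and using illustrative names:

    -- Build the Product dimension from distinct Silver values,
    -- inserting only names we haven't seen before.
    INSERT INTO Stage.ProductDim (ProductName)
    SELECT DISTINCT s.Product
    FROM Silver.SalesOrder AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM Stage.ProductDim AS d
        WHERE d.ProductName = s.Product
    );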
In this lesson we'll use SQL to build the SalesOrderFact table using the dimensions built in the previous lesson.
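Conceptually, the fact load joins the Silver rows back to each dimension to pick up surrogate keys. A sketch with illustrative names (the FileSource dimension would join in the same way):

    -- Replace natural values with dimension IDs as the fact is built.
    INSERT INTO Stage.SalesOrderFact
        (OrderID, ProductID, PurchaseAddressID, Price, OrderDate)
    SELECT
        s.OrderID,
        p.ProductID,
        a.PurchaseAddressID,
        s.Price,
        s.OrderDate
    FROM Silver.SalesOrder AS s
    JOIN Stage.ProductDim AS p
        ON p.ProductName = s.Product
    JOIN Stage.PurchaseAddressDim AS a
        ON  a.Street = s.Street AND a.City = s.City
        AND a.[State] = s.[State] AND a.Zip = s.Zip;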
In this lesson I'll show you how to move your completed tables from Stage to the Gold layer. We'll then modify our ADF pipeline to ensure all the stored procedures are executed in the correct order.
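One simple promotion pattern is a full reload: truncate the Gold table, then copy the finished Stage table in. The course may do it differently, but as a sketch:

    -- Promote the finished fact table from Stage to Gold.
    -- Names are illustrative; repeat for each dimension table.
    TRUNCATE TABLE Gold.SalesOrderFact;

    INSERT INTO Gold.SalesOrderFact
    SELECT *
    FROM Stage.SalesOrderFact;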
In this lecture we'll use Power BI to look at the star schema we created in SQL Server.
ADF stands for Azure Data Factory. In this lesson we make sure we understand its purpose.
Learn about the difference between Azure Data Factory variables and parameters and when to use either in your pipeline.
Learn how to refactor (change for the better) our Azure Data Factory pipeline to eliminate variables and utilize return values.
In this lesson we'll learn about the Azure Data Factory If Activity.
The ADF Switch activity is another conditional activity you can use in your pipeline. One thing that's handy about it is that you can use the expression builder to compute the switch value. That can help you avoid nested If activities, which are not allowed in ADF.
Let's keep building up our example to show how you can examine information returned from the Get Metadata activity.
In this lecture you learn to use the ADF Filter activity to filter out values from a JSON array.
In the resources section is the product database we'll use for the case studies. Please download and unzip it so you can use it to populate the data lake.