Use Azure Data Factory to automate data engineering tasks. This is a great course to introduce yourself to Azure Data Factory's capabilities. We'll look at the data factory from a data engineer's perspective.
Through examples, we'll work side by side to create an Extract, Transform, and Load (ETL) process to ingest movie rating data.
We are going to explore three different ways to transform your data.
The first is by using Azure Data Factory's mapping data flows. These are a great no-code way to do ETL.
We'll then look at how you can do the same transformations using Python. This way, if you love Python, you'll know a solid way to use Azure Data Factory to work with your data.
Lastly, we’ll look at how you can create pipelines to use your knowledge of SQL and stored procedures.
What you'll like about this course is that once you learn one way to transform the data, you can use that knowledge to learn the other methods. So, if you're a SQL expert but soft on Python, you can learn the SQL way before trying another method. In the end you'll come out knowing and appreciating alternative methods of ingesting, transforming, and storing data.
Welcome to Azure Data Factory. Let's learn how to transform data using Azure data engineering tools such as mapping data flows, copy activities, and SQL.
If you are looking to become a data engineer or a business analyst who needs to work with data lakes, then you'll find this course valuable. Also, all of the items and concepts taught here apply to the Microsoft Fabric framework. If your company will be upgrading to Fabric in the near future, then start learning about the data factory now.
I'm taking a case study approach with this course. I'll teach you two ways to work with your data and transform it from raw data into a star schema you can use with Power BI. I'll show you how to do this within the medallion architecture. Given this, there aren't any online exercises.
Instead, I encourage you to follow along and work side by side with me.
Ask questions in the comments; I'll answer them!
Work through sections 2, 3, and 4 in order. This way you'll have Azure set up and will have done the case studies. I refer back to prior lessons, so when you're going through section 4, it's best that you have already done section 3.
You can treat section 5 as a reference. I go over some data factory pipeline activity mechanics in this course. It's mostly reference, though, and those lessons don't build up to any ETL work. But they do teach you how to use parameters, variables, and the Copy, If, Switch, and ForEach activities; all good stuff!
In this lesson I show you how to get your free Microsoft Azure account and how to set it up for use with this data engineering course.
Note that the instructions change frequently, but creating a free Azure account generally involves a few simple steps:
Visit the Azure website: Go to the official Azure website at https://azure.com.
Click on "Free Account": Look for the "Free Account" option on the Azure website. This is typically prominently displayed on the homepage.
Sign in or create a Microsoft account: You'll need a Microsoft account to sign in. If you don't have one, you can create a new account during this step. If you already have a Microsoft account, sign in with your credentials.
Fill in the required information: Provide the necessary information to set up your Azure account. This may include your personal information, a phone number for verification, and a credit card. Note that a credit card is required, but you won't be charged during the free trial period unless you explicitly upgrade to a paid account.
Verify your identity: Microsoft may require you to verify your identity for security purposes. This could involve receiving a code on your phone or email that you'll need to enter on the website.
Set up your subscription: Choose the type of subscription you want. For the free account, you'll likely be offered a limited set of Azure services free for the first 12 months. After the trial period, you can choose to upgrade to a paid subscription or continue with the free services.
Provide billing information: Even though the account is free for the first 12 months, you'll need to provide billing information. This is in case you decide to continue using Azure services beyond the free trial or if you exceed the limits of the free tier.
Review and agree to the terms: Read through the terms and conditions, and if you agree, check the box to indicate your acceptance.
Complete the sign-up process: Once you've provided all the necessary information, click on the "Sign up" or "Create account" button to complete the process.
After completing these steps, you should have access to your Azure portal, where you can start exploring and using Azure services within the limits of the free tier. Always be mindful of your usage to avoid unexpected charges, and regularly check your Azure portal for any updates or notifications.
Remember you can use the Cost Analysis feature under your resource group to monitor your spend.
Let's put together a resource group so our data engineering related resources are organized in one place.
In this lesson we'll set up an Azure Storage Account. I walk through some important points. A key one is that the naming is case-sensitive: storage account names must be all lowercase.
Some students have had trouble finding their storage account. There are student questions attached to this lesson you can read; if you run into the same issue, the link below may help.
https://www.udemy.com/course/essential-sql-azure-data-factory-and-data-engineering/learn/#questions/21028574/
If it doesn't, please reach out to me. As you can imagine, every circumstance is different, and Microsoft does change things slightly over time. It can be hard to capture all the edge cases!
Microsoft Azure Data Factory, or ADF for short, is what we use to orchestrate our data engineering activities. We'll use it to run tasks that load, copy, and transform data between areas within our data lake and SQL Server databases.
In order for the Data Factory to access the Storage Account, it must be given permission. We'll grant that in this lesson and do a simple test copy to ensure it is working properly.
In this lesson let's set up a serverless Azure SQL database.
In this lesson we'll set up Azure Data Studio for use with SQL Server. We'll use the tool to create queries and stored procedures.
Now that the SQL database is set up, let's get the permissions in place so that our data factory is able to write to it using a copy activity.
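If you're curious what the grant looks like on the SQL side, here's a minimal sketch. It assumes the factory's system-assigned managed identity and a hypothetical factory name, so adjust it to your own setup:

    -- Run this against the target database while signed in as a
    -- Microsoft Entra (Azure AD) admin. [my-data-factory] is a
    -- placeholder; use your own data factory's name.
    CREATE USER [my-data-factory] FROM EXTERNAL PROVIDER;

    -- Let the copy activity read and write tables.
    ALTER ROLE db_datareader ADD MEMBER [my-data-factory];
    ALTER ROLE db_datawriter ADD MEMBER [my-data-factory];

    -- Needed later when pipelines call stored procedures.
    GRANT EXECUTE TO [my-data-factory];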
We are going to integrate our data factory with DevOps to make it easier to manage our changes and to save incomplete work without compromising our production code.
In this lesson we learn about the medallion architecture: Bronze, Silver, and Gold. You'll also get acquainted with our case study and the star schema we're about to build.
Let's get the data lake folder structure set up and then load the Raw layer with data. We'll use Storage Explorer to assist with the uploads.
In this lesson we start processing our data using Azure Data Factory mapping data flows to prepare our data for the Bronze layer.
Let's use ADF (Azure Data Factory) to execute the mapping flow we created in the previous lesson.
Let's go over the main steps we plan to accomplish when moving our data from the Bronze to Silver layer.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll focus on changing the data types.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll focus on filtering out empty rows and removing duplicates.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll focus on splitting the address.
Learn how to use mapping data flows to:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
In this part we'll look at why some of my data types got reset and then do a final check.
In this lesson we'll discuss what we need to do to create the gold layer's fact and dimension tables from the silver layer data file.
In this lecture we'll continue working on creating the dimensions. Once they are created, we'll update the pipeline to include them so we can build out the Parquet files in the gold layer.
In this lesson we'll work on creating mapping flows to create the star schema dimensions and then backfill the fact table with dimension IDs.
In this lesson we'll refactor our ADF pipeline and then use Microsoft Power BI to see the star schema in action.
Let's go over the steps to refine the data into the gold layer. In this case study our gold layer exists in SQL Server. The focus of these lessons is to show you how to use SQL with Azure Data Factory.
Use the ADF (Azure Data Factory) copy activity to move data from the Raw to the Bronze layer.
The objective for this exercise is to land our data in SQL. Overall, in the Silver layer we'll do the following:
Objectives:
Change Data Types – String to Numeric, String to Date.
Filter out Empty Rows – rows where OrderID is blank.
Remove Duplicates – the OrderID + Product combination should be unique.
Split Address – into Street, City, State, and Zip.
I'll show you how to do these steps using T-SQL within a stored procedure.
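To give you a taste of the pattern (the table and column names here are illustrative, not necessarily the exact case study schema), blank-row filtering and de-duplication in T-SQL usually lean on ROW_NUMBER:

    -- Illustrative sketch: keep one row per OrderID + Product,
    -- dropping rows where OrderID is blank.
    WITH Ranked AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY OrderID, Product
                   ORDER BY OrderDate DESC) AS RowNum
        FROM Stage.SalesOrderLanded          -- hypothetical landing table
        WHERE NULLIF(LTRIM(RTRIM(OrderID)), '') IS NOT NULL
    )
    SELECT OrderID, Product, Price, OrderDate, PurchaseAddress
    FROM Ranked
    WHERE RowNum = 1;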
In this lecture you learn to use the copy activity to load files from the data lake into SQL Server.
In this lecture we'll develop the SQL query we'll use to split (parse) the address into separate columns.
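As a rough sketch of the kind of query we'll build, assuming addresses arrive as 'Street, City, ST Zip' and using placeholder table names:

    -- Split 'Street, City, ST Zip' into four columns.
    -- Assumes well-formed addresses with exactly two commas.
    SELECT
        PurchaseAddress,
        TRIM(LEFT(PurchaseAddress, p.c1 - 1))                        AS Street,
        TRIM(SUBSTRING(PurchaseAddress, p.c1 + 1, p.c2 - p.c1 - 1))  AS City,
        LEFT(s.StateZip, CHARINDEX(' ', s.StateZip) - 1)             AS [State],
        SUBSTRING(s.StateZip, CHARINDEX(' ', s.StateZip) + 1, 10)    AS Zip
    FROM Stage.SalesOrderLanded
    CROSS APPLY (SELECT CHARINDEX(',', PurchaseAddress) AS c1,
                        CHARINDEX(',', PurchaseAddress,
                                  CHARINDEX(',', PurchaseAddress) + 1) AS c2) AS p
    CROSS APPLY (SELECT TRIM(SUBSTRING(PurchaseAddress, p.c2 + 1, 100)) AS StateZip) AS s;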
In this lecture we'll use the query from the previous lecture to create a stored procedure to load our landed data into the Silver schema. I'll show you the pattern I routinely use to load the landed data and convert types.
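The overall shape of that pattern, combined with the de-duplication shown earlier, might look like this sketch. Stage.SaleOrderLoad is the procedure name used in the next lesson, but the tables and columns here are placeholders:

    -- Sketch of a landed-to-Silver load: convert types as rows move.
    CREATE OR ALTER PROCEDURE Stage.SaleOrderLoad
    AS
    BEGIN
        SET NOCOUNT ON;

        TRUNCATE TABLE Silver.SalesOrder;    -- full reload each run

        INSERT INTO Silver.SalesOrder
            (OrderID, Product, Price, OrderDate, Street, City, [State], Zip)
        SELECT
            TRY_CONVERT(int, OrderID),              -- string to numeric
            Product,
            TRY_CONVERT(decimal(10, 2), Price),     -- string to numeric
            TRY_CONVERT(datetime2, OrderDate),      -- string to date
            Street, City, [State], Zip              -- from the parse query
        FROM Stage.SalesOrderParsed;                -- hypothetical source
    END;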
In this lesson we'll modify our ADF pipeline to execute the stored procedure Stage.SaleOrderLoad once we have landed the Bronze data into the SQL Server database.
In the next series of lectures we'll learn how to transform the Silver layer SalesOrder table into a fact table with linked dimensions as a star schema.
In this lesson we'll build the Product, PurchaseAddress, and FileSource dimensions.
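A minimal sketch of one such dimension load, assuming an identity column supplies the surrogate key and using illustrative names:

    -- Build the Product dimension from distinct Silver values,
    -- inserting only names we haven't seen before.
    INSERT INTO Stage.ProductDim (ProductName)
    SELECT DISTINCT s.Product
    FROM Silver.SalesOrder AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM Stage.ProductDim AS d
        WHERE d.ProductName = s.Product
    );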
In this lesson we'll use SQL to build the SalesOrderFact table using the dimensions built in the previous lesson.
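Conceptually, the fact load joins the Silver rows back to each dimension to pick up surrogate keys. A sketch with illustrative names (the FileSource dimension would join in the same way):

    -- Replace natural values with dimension IDs as the fact is built.
    INSERT INTO Stage.SalesOrderFact
        (OrderID, ProductID, PurchaseAddressID, Price, OrderDate)
    SELECT
        s.OrderID,
        p.ProductID,
        a.PurchaseAddressID,
        s.Price,
        s.OrderDate
    FROM Silver.SalesOrder AS s
    JOIN Stage.ProductDim AS p
        ON p.ProductName = s.Product
    JOIN Stage.PurchaseAddressDim AS a
        ON  a.Street = s.Street AND a.City = s.City
        AND a.[State] = s.[State] AND a.Zip = s.Zip;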
In this lesson I'll show you how to move your completed tables from Stage to the Gold layer. We'll then modify our ADF pipeline to ensure all the stored procedures are executed in the correct order.
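One simple promotion pattern is a full reload: truncate the Gold table, then copy the finished Stage table in. The course may do it differently, but as a sketch:

    -- Promote the finished fact table from Stage to Gold.
    -- Names are illustrative; repeat for each dimension table.
    TRUNCATE TABLE Gold.SalesOrderFact;

    INSERT INTO Gold.SalesOrderFact
    SELECT *
    FROM Stage.SalesOrderFact;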
In this lecture we'll use Power BI to look at the star schema we created in SQL Server.
ADF stands for Azure Data Factory. In this lesson we make sure we understand its purpose.
Learn about the difference between Azure Data Factory variables and parameters and when to use either in your pipeline.
Learn how to refactor (change for the better) our Azure Data Factory pipeline to eliminate variables and utilize return values.
In this lesson we'll learn about the Azure Data Factory If Activity.
The ADF Switch activity is another conditional activity you can use in your pipeline. One thing that's handy about it is that you can use the expression builder to compute the switch value. That can help you avoid nested If activities, which are not allowed in ADF.
Let's keep building up our example to show how you can examine information returned from the Get Metadata activity.
In this lecture you learn to use the ADF Filter activity to filter out values from a JSON array.
In the resources section is the product database we'll use for the case studies. Please download and unzip it so you can use it to populate the data lake.