We may earn an affiliate commission when you visit our partners.
Microsoft

In this course, you will learn how to harness the power of Apache Spark and powerful clusters running on the Azure Databricks platform to run data science workloads in the cloud.

This is the fourth course in a five-course program that prepares you to take the DP-100: Designing and Implementing a Data Science Solution on Azure certification exam.

The certification exam is an opportunity to prove your knowledge and expertise in operating machine learning solutions at cloud scale using Azure Machine Learning. This specialization teaches you to leverage your existing knowledge of Python and machine learning to manage data ingestion and preparation, model training and deployment, and machine learning solution monitoring in Microsoft Azure. Each course teaches you the concepts and skills that are measured by the exam.

This Specialization is intended for data scientists with existing knowledge of Python and machine learning frameworks like scikit-learn, PyTorch, and TensorFlow, who want to build and operate machine learning solutions in the cloud. It teaches data scientists how to create end-to-end solutions in Microsoft Azure. Students will learn how to manage Azure resources for machine learning; run experiments and train models; deploy and operationalize machine learning solutions; and implement responsible machine learning. They will also learn to use Azure Databricks to explore, prepare, and model data, and to integrate Databricks machine learning processes with Azure Machine Learning.


What's inside

Syllabus

Introduction to Azure Databricks
In this module, you will discover the capabilities of Azure Databricks and the Apache Spark notebook for processing huge files. You will come to understand the Azure Databricks platform and identify the types of tasks well-suited for Apache Spark. You will also be introduced to the architecture of an Azure Databricks Spark Cluster and Spark Jobs.
Working with data in Azure Databricks
Azure Databricks supports day-to-day data-handling functions, such as reads, writes, and queries. In this module, you will work with large amounts of data from multiple sources in different raw formats. You will also learn to use the DataFrame Column class in Azure Databricks to apply column-level transformations, such as sorts, filters, and aggregations. Finally, you will use advanced DataFrame functions to manipulate data, apply aggregates, and perform date and time operations in Azure Databricks.
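To make the three transformation types concrete, here is a minimal sketch of what a filter, a sort, and a grouped aggregation compute. It deliberately uses plain Python lists and dictionaries rather than the Spark DataFrame API, so it runs anywhere; the sample records, column names, and helper functions are all hypothetical and are not taken from the course.

```python
from collections import defaultdict

# Hypothetical sample records standing in for DataFrame rows.
rows = [
    {"city": "Oslo", "product": "A", "amount": 120.0},
    {"city": "Oslo", "product": "B", "amount": 80.0},
    {"city": "Lima", "product": "A", "amount": 45.0},
    {"city": "Lima", "product": "B", "amount": 200.0},
    {"city": "Pune", "product": "A", "amount": 30.0},
]


def filter_rows(records, min_amount):
    """Keep rows whose amount is at least min_amount (a filter)."""
    return [r for r in records if r["amount"] >= min_amount]


def sort_rows(records, column, descending=False):
    """Return rows ordered by a single column (a sort)."""
    return sorted(records, key=lambda r: r[column], reverse=descending)


def total_by(records, key_column, value_column):
    """Sum value_column for each distinct key_column value (an aggregation)."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key_column]] += r[value_column]
    return dict(totals)


large = filter_rows(rows, min_amount=50.0)             # drop small amounts
ranked = sort_rows(large, "amount", descending=True)   # largest first
totals = total_by(rows, "city", "amount")              # per-city totals

print([r["amount"] for r in ranked])
print(totals)
```

On a Spark cluster the same three steps would be expressed as DataFrame operations and executed in parallel across partitions; the sketch only shows the logic of each step on a small in-memory sample.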
Processing data in Azure Databricks
Azure Databricks supports a range of built-in SQL functions; however, sometimes you have to write a custom function, known as a user-defined function (UDF). In this module, you will learn how to register and invoke UDFs. You will also learn how to use Delta Lake to create, append, and upsert data to Apache Spark tables, taking advantage of built-in reliability and optimizations.
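An upsert combines an update and an insert: incoming rows that match an existing key replace the stored row, and rows with new keys are added. The sketch below illustrates those semantics in plain Python. It is not Delta Lake code (Delta Lake applies the same idea to tables through its merge operation), and the `upsert` helper and the order records are hypothetical.

```python
def upsert(target, updates, key):
    """Return a new list in which rows from `updates` replace target rows
    that share the same key, and rows with unseen keys are appended."""
    merged = {row[key]: row for row in target}  # dicts keep insertion order
    for row in updates:
        merged[row[key]] = row  # overwrite on a match, insert otherwise
    return list(merged.values())


# Hypothetical order records, used only for illustration.
orders = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
]
changes = [
    {"id": 2, "status": "shipped"},  # existing key: update
    {"id": 3, "status": "new"},      # new key: insert
]

result = upsert(orders, changes, key="id")
print(result)
```

A plain append, by contrast, would add both incoming rows and leave two conflicting rows for `id` 2, which is the situation an upsert avoids.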
Get started with Databricks and machine learning
In this module, you will learn how to use PySpark’s machine learning package to build key components of machine learning workflows, including exploratory data analysis, model training, and model evaluation. You will also learn how to build pipelines for common data featurization tasks.
Manage machine learning lifecycles and fine tune models
In this module, you will learn how to use MLflow to track machine learning experiments and how to use modules from Spark’s machine learning library for hyperparameter tuning and model selection.
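Hyperparameter tuning and model selection usually come down to a grid search scored by k-fold cross-validation: each candidate setting is trained on part of the data, scored on the held-out part, and the setting with the best average score wins. The following is a minimal plain-Python sketch of that loop, assuming a one-parameter ridge regression as the model. It does not use Spark's tuning classes (such as `ParamGridBuilder` and `CrossValidator`), and the function names and the toy dataset are hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k folds; fold i holds indices i, i+k, i+2k, ..."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form slope for y ~ w * x with an L2 penalty of strength lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_score(xs, ys, lam, k):
    """Average held-out MSE over k folds for one hyperparameter value."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(mse(w, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(errors) / len(errors)


def grid_search(xs, ys, grid, k=3):
    """Score every value in the grid and return the one with the lowest error."""
    scores = {lam: cross_val_score(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Toy, noise-free data (y = 2x), chosen so the result is easy to reason about.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]

best_lam, scores = grid_search(xs, ys, grid=[0.0, 1.0, 10.0], k=3)
print(best_lam, scores)
```

Because the toy data has no noise, the unregularized setting fits it exactly and is selected; on real data a nonzero penalty often scores better, which is the reason to search a grid rather than fix the value in advance.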
Train a distributed neural network and serve models with Azure Machine Learning
In this module, you will learn how to use Uber’s Horovod framework along with the Petastorm library to run distributed deep learning training jobs on Spark using training datasets in the Apache Parquet format. You will also learn how to use MLflow and the Azure Machine Learning service to register, package, and deploy a trained model to both Azure Container Instances and Azure Kubernetes Service as a scoring web service.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Builds a strong foundation for beginners to machine learning in the cloud
Develops professional skills or deep expertise in machine learning solutions operating at cloud scale
Explores Azure Machine Learning, which is a standard solution for operating machine learning solutions at cloud scale
Taught by Microsoft instructors, who are recognized for their work in machine learning
Could be difficult for learners who do not have prior experience with Python and machine learning

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Perform data science with Azure Databricks with these activities:
Attend a Local Meetup or Conference on Data Science and Azure
Connect with professionals in the field and gain insights into real-world applications of Azure Databricks, expanding your network and knowledge.
  • Research local meetups or conferences related to data science and Azure.
  • Attend the event and engage with other attendees.
  • Follow up with interesting contacts.
Practice Writing DataFrame Transformations
Sharpen your skills in transforming DataFrames by practicing various operations to improve your understanding of data manipulation.
  • Create a DataFrame.
  • Apply different transformations, such as sort, filter, and aggregate.
  • Explore the results of the transformations.
Develop a Data Preprocessing Pipeline for a Specific Dataset
Apply your knowledge of data preprocessing techniques to create a pipeline for a specific dataset, deepening your understanding of real-world applications.
  • Choose a dataset.
  • Identify the data preprocessing steps required.
  • Implement the data preprocessing pipeline using Azure Databricks.
  • Evaluate the effectiveness of the pipeline.
Build a Sample Azure Databricks Machine Learning Pipeline
Build an Azure Databricks Machine Learning pipeline using the concepts learned in the course to solidify your understanding of the process.
  • Design the pipeline architecture.
  • Gather and prepare the data.
  • Train the machine learning model.
  • Evaluate the model's performance.
  • Deploy the model to production.
Participate in a Workshop on Advanced Azure Databricks Techniques
Enhance your skills and knowledge through a workshop led by experts, providing a structured and immersive learning experience.
  • Identify workshops that align with your learning goals.
  • Register for and attend the workshop.
  • Actively participate in the activities and discussions.
Create a Cheat Sheet for Model Selection Techniques
Summarize the model selection techniques covered in the course in a cheat sheet to improve your recall and comprehension.
  • List the different model selection techniques.
  • Describe the pros and cons of each technique.
  • Provide examples of when each technique is appropriate.
Explore Advanced Features of Apache Spark for Data Engineering
Expand your knowledge of Apache Spark by exploring advanced features, enhancing your ability to handle large-scale data processing challenges.
  • Identify additional Spark features that align with your interests.
  • Find tutorials or documentation on those features.
  • Follow the tutorials to gain hands-on experience.
Contribute to Open Source Projects Related to Azure Databricks
Apply your skills to contribute to open-source projects, gaining practical experience while supporting the Azure Databricks community.
  • Identify open-source projects related to Azure Databricks.
  • Choose an area to contribute to.
  • Submit your contributions and engage with the project community.

Career center

Learners who complete Perform data science with Azure Databricks will develop knowledge and skills that may be useful to these careers:
Data Scientist
**Data Scientists** use their knowledge of machine learning and data mining to extract knowledge from large amounts of data. They develop and apply statistical and machine learning models to solve business problems. This course can help you become a Data Scientist by teaching you how to use Apache Spark and Azure Databricks to process and analyze large datasets. You will also learn how to build and deploy machine learning models using Azure Machine Learning.
Machine Learning Engineer
**Machine Learning Engineers** design, develop, and maintain machine learning systems. They work closely with Data Scientists to translate machine learning models into production-ready systems. This course can help you become a Machine Learning Engineer by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to build and deploy machine learning models using Azure Machine Learning.
Data Engineer
**Data Engineers** design, build, and maintain data pipelines. They work with Data Scientists and Machine Learning Engineers to ensure that data is available in a timely and reliable manner. This course can help you become a Data Engineer by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to build and deploy machine learning models using Azure Machine Learning.
Data Analyst
**Data Analysts** use data to solve business problems. They collect, clean, and analyze data to identify trends and patterns. This course can help you become a Data Analyst by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Business Analyst
**Business Analysts** use data to make informed business decisions. They work with stakeholders to identify business needs and develop solutions. This course can help you become a Business Analyst by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Software Engineer
**Software Engineers** design, develop, and maintain software systems. They work on a variety of projects, from small applications to large-scale enterprise systems. This course can help you become a Software Engineer by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Statistician
**Statisticians** use data to make informed decisions. They work on a variety of projects, from analyzing clinical trials to forecasting economic trends. This course can help you become a Statistician by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Operations Research Analyst
**Operations Research Analysts** use mathematical and analytical techniques to solve business problems. They work on a variety of projects, from optimizing supply chains to designing healthcare systems. This course can help you become an Operations Research Analyst by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Data Architect
**Data Architects** design and manage data systems. They work with stakeholders to identify data needs and develop solutions. This course can help you become a Data Architect by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Database Administrator
**Database Administrators** manage and maintain databases. They work on a variety of projects, from installing and configuring databases to performance tuning. This course can help you become a Database Administrator by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
IT Manager
**IT Managers** plan and direct the activities of an organization's IT department. They work with stakeholders to identify technology needs and develop solutions. This course can help you become an IT Manager by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Project Manager
**Project Managers** plan, organize, and execute projects. They work with stakeholders to identify project goals and develop plans. This course can help you become a Project Manager by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Financial Analyst
**Financial Analysts** use data to make informed investment decisions. They work with a variety of clients, from individuals to large institutions. This course can help you become a Financial Analyst by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Market Research Analyst
**Market Research Analysts** collect and analyze data about markets and consumers. They work with a variety of clients, from businesses to government agencies. This course can help you become a Market Research Analyst by teaching you how to use Azure Databricks to process and analyze large datasets. You will also learn how to use Azure Machine Learning to build and deploy machine learning models.
Business Development Manager
**Business Development Managers** identify and develop new business opportunities. They work with a variety of clients, from small businesses to large corporations. This course may be useful for you if you are interested in a career in business development. It can help you develop the skills you need to identify and develop new business opportunities, such as data analysis and machine learning.

Reading list

We've selected nine books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Perform data science with Azure Databricks.
Provides a foundational understanding of Apache Spark, which is the underlying engine used by Azure Databricks. It serves as a good starting point for those new to Spark.
Provides a comprehensive overview of Apache Spark, the underlying engine used by Azure Databricks. It can serve as a reference for more technical concepts and implementation details of Spark.
Provides an in-depth exploration of advanced analytical techniques with Apache Spark. It covers topics such as graph processing, machine learning, streaming analytics, and performance optimization.
Introduces the concept of data mesh, a distributed data architecture style that aligns well with the decentralized nature of Azure Databricks.
Provides a comprehensive introduction to supervised machine learning with Python, covering a wide range of topics, from data preparation and feature engineering to model training and evaluation.
Provides a comprehensive introduction to pattern recognition and machine learning, covering a wide range of topics, from supervised learning and unsupervised learning to deep learning.
Provides a comprehensive introduction to deep learning, covering a wide range of topics, from neural networks and deep learning algorithms to distributed deep learning.
Provides a comprehensive introduction to data science, covering a wide range of topics, from data preparation and feature engineering to model training and evaluation.

Similar courses

Here are nine courses similar to Perform data science with Azure Databricks.
Build and Operate Machine Learning Solutions with Azure
Microsoft Azure Machine Learning for Data Scientists
Prepare for DP-100: Data Science on Microsoft Azure Exam
Create Machine Learning Models in Microsoft Azure
Data Literacy: Essentials of Azure Databricks
Optimizing Microsoft Azure AI Solutions
Implementing an Azure Databricks Environment in Microsoft...
Microsoft Azure Databricks for Data Engineering
Operationalizing Microsoft Azure AI Solutions

© 2016 - 2024 OpenCourser