We may earn an affiliate commission when you visit our partners.

Introduction to Designing Data Lakes on AWS

Rafael Lopes and Morgan Willis

Designing a data lake is challenging because of the scale and growth of data. Developers need to understand best practices to avoid common mistakes that could be hard to rectify. In this course, we cover the foundations of what a data lake is and how to ingest and organize data into it, then dive into the data processing that can be done to optimize performance and costs when consuming the data at scale. This course is for professionals (architects, system administrators, and DevOps engineers) who need to design and build an architecture for secure and scalable data lake components. Students will learn about the use cases for a data lake and contrast them with a traditional infrastructure of servers and storage.

What's inside

Learning objectives

  • Where to start with a data lake?
  • How to build a secure and scalable data lake?
  • What are the common components of a data lake?
  • Why do you need a data lake, and what is its value?

Syllabus

Week 1: Hello World, I mean, Hello Data Lakes!
Video: Meet the Instructors
Video: Introduction to Week 1
Video: Why Data Lakes?
Video: Characteristics of a Data Lake
Video: Data Lake Components
Reading: Data Lake Characteristics and Components
Video: Comparison of a Data Lake to a Data Warehouse
Reading: Data Lakes and Data Warehouses
Video: Discussing sample Data Lake Architectures
Quiz/Assessment: Week 1 quiz
Week 2: AWS data related services
Video: Introduction to Week 2
Video: AWS Data Lake related services
Video: Amazon S3
Video: AWS Glue Data Catalog
Reading: S3 and Glue Data Catalog
Video: AWS Services used for data movement
Reading: Kinesis, API Gateway, etc.
Video: AWS Services for Data processing
Video: AWS Services for Analytics
Video: AWS Services used for Predictive Analytics and Machine Learning
Reading: EMR, Glue Jobs, Lambda, Kinesis Analytics, Redshift
Video: Introduction to AWS Lake Formation
Reading: Lake Formation
Lab: Get familiar with AWS Services and create your first simple data lake
Week 3: Ingesting the rivers
Video: Introduction to Week 3
Video: Use the right tool for the job
Video: Understanding Data Structure and when to process data
Video: Data Streaming ingestion with Amazon Kinesis Services
Video: Diving Deep on Amazon Kinesis
Demo: Batch Data Ingestion with AWS Transfer Family
Reading: Batch Data Ingestion with AWS Services
Video: Data Cataloging
Demo: Using Glue Crawlers
Reading: The importance of data cataloging
Video: Reviewing the ingestion part of some Data Lake architectures
Lab: Ingesting Web Logs
Week 4: Processing and Analyzing data that sits in the Data Lake
Video: Introduction to Week 4
Video: Data prep and AWS Glue jobs
Video: File optimizations
Demo: Using S3, Glue and Athena to get insights about NYC Taxi data
Reading: Glue Jobs, Data Prep, Athena, Columnar Data Formats and Amazon Athena Optimizations
Video: Introduction to Data Lake security
Reading: Security and compliance
Video: The power of data visualization
Video: Introduction to Amazon QuickSight
Demo: Amazon QuickSight
Reading: Data visualization, Amazon QuickSight
Video: Registry of Open Data on AWS
Lab: Create an end-to-end Data Lake with AWS Services
Video: Course wrap-up!

Good to know

Know what's good, what to watch for, and possible dealbreakers
Examines data lake foundations, ingestion, and data processing necessary to optimize performance and costs
Taught by Morgan Willis and Rafael Lopes, instructors with experience in data lake design and architecture
Introduces best practices for avoiding common mistakes in data lake design
Provides hands-on experience through labs, such as ingesting web logs and creating an end-to-end data lake with AWS services
Covers various AWS data services, including S3, Glue Data Catalog, Kinesis, EMR, Glue Jobs, and Lake Formation
Suitable for professionals, including architects, system administrators, and DevOps engineers, who need to design and build data lake components

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Introduction to Designing Data Lakes on AWS with these activities:
Review SQL for Data Analytics
Refresh your knowledge of SQL to enhance your ability to query and analyze data stored in the Data Lake.
  • Review online tutorials on SQL for data analytics
  • Practice writing SQL queries using an online SQL editor
Revisit concepts of data modeling
Refresh your understanding of data modeling to improve your ability to design and implement an effective Data Lake schema.
  • Review your notes from a previous data modeling course or tutorial
  • Read articles or blog posts about data modeling best practices
Organize and review course materials
Organize and review course materials to enhance your understanding and retention of key concepts.
  • Create a dedicated folder or notebook for course materials
  • Organize materials by topic or module
  • Regularly review and summarize key concepts
Attend a virtual study group
Join a virtual study group to connect with classmates, discuss course concepts, and enhance your learning experience.
  • Find a virtual study group or create your own
  • Meet regularly with your study group to review course material, work on assignments, and prepare for exams
Practice data ingestion techniques
Practice with different data ingestion tools and techniques to improve your understanding of how data is brought into a Data Lake.
  • Review the different data ingestion methods supported by AWS
  • Experiment with ingesting data using Amazon Kinesis services
  • Explore the use of AWS Transfer Family for batch data ingestion

Career center

Learners who complete Introduction to Designing Data Lakes on AWS will develop knowledge and skills that may be useful to these careers:
Data Engineer
Data Engineers design, build, and maintain data pipelines and systems. They may also work on data quality and data governance. A background in data lakes is essential for Data Engineers, as they are often responsible for managing and processing data in data lakes. This course can help aspiring Data Engineers build a solid foundation in designing and managing data lakes on AWS.
Data Warehouse Engineer
Data Warehouse Engineers design, build, and maintain data warehouses. They may also work on data lakes. They ensure that data is stored and processed efficiently, and that data is accessible to users. This course may be useful for aspiring Data Warehouse Engineers as it covers the basics of designing and managing data lakes, as well as data processing and analytics.
Business Analyst
Business Analysts use data to solve business problems. They may use data lakes to store and process large amounts of data. This course may be useful for aspiring Business Analysts as it covers the basics of designing and managing data lakes, as well as data processing and analytics.
Data Analyst
Data Analysts collect, clean, and analyze data to identify trends and patterns. They may use data lakes to store and process large amounts of data. This course may be useful for aspiring Data Analysts as it covers the basics of designing and managing data lakes, as well as data processing and analytics.
Data Governance Analyst
Data Governance Analysts develop and implement data governance policies and procedures. They may also work on data lakes to ensure that data is managed in a consistent and compliant manner. This course may be useful for aspiring Data Governance Analysts as it covers the basics of designing and managing data lakes, as well as data security and compliance.
Data Scientist
Data Scientists use data to solve business problems. They may use data lakes to store and process large amounts of data. This course may be useful for aspiring Data Scientists as it covers the basics of designing and managing data lakes, as well as data processing and analytics.
Cloud Architect
Cloud Architects design and manage cloud computing systems. This may include designing and managing data lakes. They ensure that cloud systems are running smoothly and that data is secure and accessible. This course may be useful for aspiring Cloud Architects as it covers the fundamentals of designing and managing data lakes on AWS.
DevOps Engineer
DevOps Engineers bridge the gap between development and operations teams. They may work on data lakes to ensure that data is flowing smoothly between different systems. This course may be useful for aspiring DevOps Engineers as it covers the basics of designing and managing data lakes, as well as data processing and analytics.
Security Analyst
Security Analysts identify and mitigate security risks. They may work on data lakes to ensure that data is secure and compliant with regulations. This course may be useful for aspiring Security Analysts as it covers the basics of designing and managing data lakes, as well as data security and compliance.
Project Manager
Project Managers oversee the development and execution of projects. They may work on data lakes to ensure that data is flowing smoothly between different systems and that projects are completed on time and within budget. This course may be useful for aspiring Project Managers as it covers the basics of designing and managing data lakes, as well as data processing and analytics.
Product Manager
Product Managers oversee the development and launch of new products. They may work on data lakes to ensure that data is flowing smoothly between different systems and that products are meeting customer needs. This course may be useful for aspiring Product Managers as it covers the basics of designing and managing data lakes, as well as data processing and analytics.
Database Administrator
Database Administrators manage and maintain databases. This may include managing data lakes. They ensure that databases are running smoothly and that data is secure and accessible. This course may be useful for aspiring Database Administrators as it covers the fundamentals of designing and managing data lakes, as well as data security and compliance.
Data Architect
Data Architects create and manage the architecture of data systems. This may include designing and managing data lakes. They plan and design data management systems, ensuring that data is accessible, secure, and compliant with regulations. This course may be useful for aspiring Data Architects as it covers the fundamentals of designing and building data lakes, including data ingestion, organization, and processing.
Software Engineer
Software Engineers design, build, and maintain software applications. This may include designing and managing data lakes. They ensure that applications are running smoothly and that data is secure and accessible. This course may be useful for aspiring Software Engineers as it covers the fundamentals of designing and managing data lakes, as well as data security and compliance.
Systems Engineer
Systems Engineers design, build, and maintain computer systems. This may include designing and managing data lakes. They ensure that systems are running smoothly and that data is secure and accessible. This course may be useful for aspiring Systems Engineers as it covers the fundamentals of designing and managing data lakes, as well as data security and compliance.

Reading list

We've selected six books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Introduction to Designing Data Lakes on AWS.
Provides a comprehensive guide to designing and building data-intensive applications, including discussions on data lakes. Useful for understanding the broader context of data lake design and implementation.
Provides a comprehensive guide to designing and implementing a data mesh architecture, which is an alternative approach to data lake architectures. Useful for understanding the pros and cons of different data lake architectures.
Provides a comprehensive guide to data governance, a critical aspect of data lake management. Useful for data governance professionals and stakeholders involved in ensuring the quality and integrity of data lakes.
Provides a comprehensive overview of data lakes, including their benefits, challenges, and best practices. Useful for beginners who need a foundational understanding of data lakes.
Provides a practical guide to big data analytics, including discussions on data lakes. Useful for beginners who need a general understanding of big data analytics and its applications.
Provides a practical guide to using MapReduce for data-intensive text processing, a common task in data lake environments. Useful for data engineers and analysts involved in processing and analyzing large amounts of text data.

Similar courses

Here are nine courses similar to Introduction to Designing Data Lakes on AWS.
Introduction to Designing Data Lakes on AWS
Most relevant
Implement Security on Azure Data Lakes
Implement Data Auditing with Azure Data Lake
Scale and Deploy LLMs in Production Environments
Firebase Authentication 7 and Cloud Storage
Node.js Microservices: Advanced Topics and Best Practices
Microsoft Azure Developer: Implementing Data Lake Storage...
Improving Azure Data Lake Performance
Amazon S3 Deep Dive

© 2016 - 2024 OpenCourser