Sorry, this page is no longer available
Joe Reis

In this course, you'll learn about the raw ingredients and processes that are used to physically store data on disk and in memory. You’ll explore different data systems, including object stores, block storage, file systems, and databases, that are built on top of these raw ingredients. You’ll also get a chance to query a Neo4j graph database and perform vector similarity search, a key feature behind generative AI and large language models. You will explore the evolution of data storage abstractions, from data warehouses to data lakes and data lakehouses, while comparing the advantages and drawbacks of each architectural paradigm.

With hands-on practice, you'll design a simple data lake using AWS Glue, and build a data lakehouse using AWS Lake Formation and Apache Iceberg. In the last week of this course, you’ll see how queries work behind the scenes, practice writing more advanced SQL queries, compare query performance in row-oriented vs. column-oriented storage, and perform streaming queries using Apache Flink.
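To make the vector similarity search mentioned in the description more concrete, here is a minimal sketch of a brute-force nearest-neighbor lookup using cosine similarity. It is not taken from the course materials: the three-dimensional vectors, the `DOCS` dictionary, and the helper names `cosine_similarity` and `top_k` are all hypothetical, and production systems typically use embedding models and approximate indexes rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; real ones come from a model and have
# hundreds or thousands of dimensions.
DOCS = {
    "data lake": [0.9, 0.1, 0.0],
    "data warehouse": [0.7, 0.3, 0.1],
    "graph database": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], DOCS, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A full scan like this costs time proportional to the number of stored vectors, which is why dedicated vector stores rely on indexing structures instead.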

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
Explores different data systems and their advantages and drawbacks
Provides hands-on experience designing a data lake and building a data lakehouse
Covers streaming queries using Apache Flink, which is highly relevant in industry
Taught by Joe Reis, who is recognized for his work in data storage and management
Develops foundational skills in data storage for learners new to the topic
Examines the evolution of data storage abstractions, which provides a comprehensive understanding of the field

Reviews summary

Comprehensive data storage and querying

According to students, this course offers a comprehensive and highly relevant overview of modern data storage and querying, particularly beneficial for data professionals. Learners praise its hands-on labs with AWS Glue, AWS Lake Formation, and Apache Iceberg as directly applicable to real-world work. The inclusion of up-to-date topics such as vector similarity search and data lakehouses is frequently highlighted as a major strength. While the course provides a strong foundation, some learners found that it covers a wide array of topics without sufficient depth for specialists, so mastery may require supplemental resources. A few reviews also noted minor technical glitches or outdated lab instructions, which may cause some frustration.
Offers a strong starting point but deeper learning requires external resources.
"It provides a good overview, but if you want to master Flink, you'll need additional resources."
"The course does provide a good starting point, but be prepared to do your own research for deeper dives."
"I had to supplement with external readings to fully grasp some of the more complex concepts."
Provides valuable practical experience with real-world AWS services.
"The labs were well-designed, allowing for practical application of concepts. I appreciated the practical examples with AWS services."
"The hands-on practice with AWS Glue and LakeFormation was incredibly practical. The hands-on sessions with real AWS services are a huge plus."
"I found the AWS hands-on labs to be beneficial, providing a solid opportunity to apply the concepts learned."
Focuses on up-to-date technologies and real-world applications.
"This course is absolutely fantastic for anyone looking to get a deeper understanding of modern data storage and query techniques. The content on data lakehouses with AWS LakeFormation and Apache Iceberg was particularly insightful and hands-on."
"The inclusion of modern topics like vector similarity search, which are key for current AI trends, and the practical application with AWS Glue and LakeFormation, are very valuable."
"I learned a lot that I can immediately apply in my job. The content is very up-to-date, covering lakehouses and vector similarity search which are critical topics today."
Some learners experienced minor glitches or outdated lab instructions.
"The labs were useful but sometimes had minor technical glitches that took time to debug."
"The labs often had outdated instructions or required specific AWS configurations not clearly explained."
"Some of the labs could be smoother, as I occasionally ran into issues that required troubleshooting beyond the course material."
Covers many topics but may lack the specialized depth some seek.
"The course has valuable information, but it covers too much ground without going deep enough into any one topic. For instance, the vector similarity search felt like a teaser."
"I found this course somewhat superficial. It touches on many interesting topics but doesn't provide the depth needed for a professional."
"The course provided a decent overview, but I felt it was too broad. It lacked the specific focus I was hoping for, and explanations for some complex topics were too high-level."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Storage and Queries with these activities:
Form a Study Group
Engage with peers to discuss course concepts, review material, and support each other's learning.
  • Find a group of students with complementary skills and interests.
  • Meet regularly to discuss the course material.
  • Work together on projects and assignments.
Data Wrangling Drills
Reinforce your understanding of data wrangling techniques and strengthen your ability to work with data effectively.
  • Find a dataset and load it into a pandas DataFrame.
  • Perform data cleaning operations such as handling missing values, removing duplicates, and correcting data types.
  • Explore the data and identify patterns and insights.
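The drill steps above can be sketched as follows. This is only an illustration under stated assumptions: the inline CSV, its column names, and the `clean_orders` helper are hypothetical stand-ins for whatever dataset you choose, and the cleaning choices (dropping rows without a customer, filling missing amounts with zero) are examples rather than recommendations.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real dataset file.
RAW_CSV = """order_id,customer,amount,order_date
1,alice,10.50,2024-01-03
2,bob,,2024-01-04
2,bob,,2024-01-04
3,carol,7.25,2024-01-05
4,,3.00,2024-01-06
"""

def clean_orders(raw_csv: str) -> pd.DataFrame:
    """Load the CSV text and apply the three cleaning steps from the drill."""
    df = pd.read_csv(io.StringIO(raw_csv))
    df = df.drop_duplicates()                # remove exact duplicate rows
    df = df.dropna(subset=["customer"])      # drop rows with no customer name
    df["amount"] = df["amount"].fillna(0.0)  # fill missing amounts with zero
    df["order_date"] = pd.to_datetime(df["order_date"])  # correct the data type
    return df.reset_index(drop=True)

cleaned = clean_orders(RAW_CSV)

# A simple exploration step: total amount per customer.
summary = cleaned.groupby("customer")["amount"].sum()
print(cleaned.dtypes)
print(summary)
```

With your own data, replace `io.StringIO(raw_csv)` with a file path and adjust the column names to match.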
Attend a Data Analytics Workshop
Deepen your understanding of data analysis techniques and tools through hands-on workshops.
  • Find a workshop that covers relevant topics.
  • Attend the workshop and participate actively.
  • Apply what you learn to your own projects.
Intro to Apache Spark
Expand your big data processing skills by exploring Apache Spark's capabilities for distributed computing.
  • Follow tutorials on setting up and using Apache Spark.
  • Create a Spark DataFrame and perform basic transformations.
  • Explore Spark's RDDs and their usage for parallel processing.
Review 'Designing Data-Intensive Applications'
Gain a comprehensive understanding of data-intensive application design and best practices.
  • Read chapters on data modeling, storage, and processing.
  • Identify key concepts and architectures for data-intensive applications.
Create a Data Glossary
Strengthen your data management skills by creating a data glossary for a specific domain or project.
  • Identify the relevant data terms and definitions.
  • Organize the terms into a structured and logical framework.
  • Publish the data glossary for reference and use.
Design a Data Lake
Apply your knowledge of data storage and management by designing a data lake for a specific use case.
  • Define the requirements and scope of your data lake.
  • Choose appropriate data storage technologies and tools.
  • Design the data lake architecture and data pipeline.
