Joe Reis

In this course, you'll learn about the raw ingredients and processes that are used to physically store data on disk and in memory. You’ll explore different data systems, including object stores, block storage, file systems, and databases, that are built on top of these raw ingredients. You’ll also get a chance to query a Neo4j graph database and perform vector similarity search, a key feature behind generative AI and large language models. You will explore the evolution of data storage abstractions, from data warehouses to data lakes and data lakehouses, while comparing the advantages and drawbacks of each architectural paradigm.

With hands-on practice, you'll design a simple data lake using AWS Glue and build a data lakehouse using AWS Lake Formation and Apache Iceberg. In the last week of this course, you’ll see how queries work behind the scenes, practice writing more advanced SQL queries, compare query performance in row-oriented vs. column-oriented storage, and perform streaming queries using Apache Flink.
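To give a feel for the row-oriented vs. column-oriented comparison mentioned above, here is a minimal sketch in plain Python. It is not taken from the course materials: the sample records, the `to_columns` helper, and the field names are all hypothetical, and in-memory lists do not reproduce real disk I/O behavior. It only illustrates why an aggregate over one field suits a columnar layout while fetching a whole record suits a row layout.

```python
from typing import Dict, List, Tuple

# Hypothetical records (assumption): (order_id, region, amount).
rows: List[Tuple[int, str, float]] = [
    (1, "east", 10.0),
    (2, "west", 20.0),
    (3, "east", 5.5),
]


def to_columns(records: List[Tuple[int, str, float]]) -> Dict[str, list]:
    """Pivot row-oriented records into a column-oriented layout."""
    ids, regions, amounts = zip(*records)
    return {
        "order_id": list(ids),
        "region": list(regions),
        "amount": list(amounts),
    }


columns = to_columns(rows)

# Row layout: an aggregate over one field still walks every whole record.
total_from_rows = sum(record[2] for record in rows)

# Column layout: the same aggregate reads only the "amount" column.
total_from_columns = sum(columns["amount"])

# Row layout: fetching one full record is a single lookup.
second_record = rows[1]

# Column layout: a full record has to be reassembled from every column.
second_record_from_columns = tuple(
    columns[name][1] for name in ("order_id", "region", "amount")
)

print(total_from_rows, total_from_columns)
print(second_record, second_record_from_columns)
```

Both layouts return the same answers; what differs is how much unrelated data each access pattern has to touch, which is the trade-off that analytical (columnar) and transactional (row) stores are built around.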

Good to know

Know what's good, what to watch for, and possible dealbreakers
Explores different data systems and their advantages and drawbacks
Provides hands-on experience designing a data lake and building a data lakehouse
Covers streaming queries using Apache Flink, which is highly relevant in industry
Taught by Joe Reis, who is recognized for his work in data storage and management
Develops foundational skills in data storage for learners new to the topic
Examines the evolution of data storage abstractions, which provides a comprehensive understanding of the field

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Storage and Queries with these activities:
Form a Study Group
Engage with peers to discuss course concepts, review material, and support each other's learning.
  • Find a group of students with complementary skills and interests.
  • Meet regularly to discuss the course material.
  • Work together on projects and assignments.
Data Wrangling Drills
Reinforce your understanding of data wrangling techniques and strengthen your ability to work with data effectively.
  • Find a dataset and load it into a pandas DataFrame.
  • Perform data cleaning operations such as handling missing values, removing duplicates, and correcting data types.
  • Explore the data and identify patterns and insights.
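The drill above could be sketched roughly as follows. This is an illustrative example only: the sample data, column names, and the `clean_orders` helper are hypothetical stand-ins for whatever dataset you pick, and filling missing amounts with the median is just one of several reasonable choices.

```python
import pandas as pd

# Hypothetical sample data (assumption) standing in for a real dataset.
raw = pd.DataFrame(
    {
        "order_id": ["1", "2", "2", "3", "4"],
        "amount": ["10.5", "20.0", "20.0", None, "7.25"],
        "region": ["east", "west", "west", "east", None],
    }
)


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicates, correct data types, and handle missing values."""
    out = df.drop_duplicates().copy()
    # Correct data types: the identifiers and amounts arrive as strings.
    out["order_id"] = out["order_id"].astype(int)
    out["amount"] = pd.to_numeric(out["amount"])
    # Handle missing values: median for numbers, a placeholder for labels.
    out["amount"] = out["amount"].fillna(out["amount"].median())
    out["region"] = out["region"].fillna("unknown")
    return out.reset_index(drop=True)


cleaned = clean_orders(raw)

# Explore the data: a simple aggregate to look for patterns by region.
totals_by_region = cleaned.groupby("region")["amount"].sum()

print(cleaned)
print(totals_by_region)
```

With a real dataset, you would replace the inline `raw` frame with a loader such as `pd.read_csv` and adjust the cleaning rules to the columns you actually have.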
Attend a Data Analytics Workshop
Deepen your understanding of data analysis techniques and tools through hands-on workshops.
  • Find a workshop that covers relevant topics.
  • Attend the workshop and participate actively.
  • Apply what you learn to your own projects.
Intro to Apache Spark
Expand your big data processing skills by exploring Apache Spark's capabilities for distributed computing.
  • Follow tutorials on setting up and using Apache Spark.
  • Create a Spark DataFrame and perform basic transformations.
  • Explore Spark's RDDs and their usage for parallel processing.
Review 'Designing Data-Intensive Applications'
Gain a comprehensive understanding of data-intensive application design and best practices.
  • Read chapters on data modeling, storage, and processing.
  • Identify key concepts and architectures for data-intensive applications.
Create a Data Glossary
Strengthen your data management skills by creating a data glossary for a specific domain or project.
  • Identify the relevant data terms and definitions.
  • Organize the terms into a structured and logical framework.
  • Publish the data glossary for reference and use.
Design a Data Lake
Apply your knowledge of data storage and management by designing a data lake for a specific use case.
  • Define the requirements and scope of your data lake.
  • Choose appropriate data storage technologies and tools.
  • Design the data lake architecture and data pipeline.

Similar courses

Here are nine courses similar to Data Storage and Queries.
Perform Complex Search Functions in Kibana with Apache...
Getting Started with Apache Spark on Databricks
Managing Big Data in Clusters and Cloud Storage
Querying Data with Snowflake
Streamline Data Queries with LangChain
Modeling Data Warehouses using Apache Hive
Distributed Computing with Spark SQL
Building ETL and Data Pipelines with Bash, Airflow and...
Improving Azure Data Lake Performance

© 2016 - 2024 OpenCourser