Database Engineering is a very interesting sector in software engineering. If you are interested in learning about database engineering you have come to the right place. I have curated this course carefully to discuss the Fundamental concepts of database engineering.
Database Engineering is a very interesting sector in software engineering. If you are interested in learning about database engineering you have come to the right place. I have curated this course carefully to discuss the Fundamental concepts of database engineering.
This course will not teach you SQL or programming languages, however, it will teach you skillsets and patterns that you can apply in database engineering. A few of the things that you will learn are Indexing, Partitioning, Sharding, Replication, b-trees in-depth indexing, Concurrency control, database engines and security, and much more.
I believe that learning the fundamentals of database engineering will equip you with the necessary means to tackle difficult and challenging problems yourself. I always compare engineering to math, you never memorize specific formulas and equations, you know the basic proves and derive and solve any equation one throws at you. Database engineering is similar, you can't possibly say MongoDB is better than MySQL or Postgres is better than Oracle. Instead, you learn your use case and by understanding how each database platform does its own trade-offs you will be able to make optimal decisions.
One other thing you will learn in this course is the lowest database interface that talks to the OS which is the database engine. Database engines or storage engines or sometimes even called embedded databases is a software library that a database management software uses to store data on disk and do CRUD (create update delete) Embedded means move everything in one software no network client-server. In this video course, I want to go through the few popular database engines, explain the differences between them, and finally, I want to spin up a database and change its engine and show the different features of each engine.
Enjoy the course.
ACID which stands for Atomicity, consistency, isolation, and durability are four critical properties of relational database. I think any engineer working with a relational database like postgres, mysql, sqlserver oracle, should understand these properties.
In this course, we will go through the four properties and explain why each is critical to build and use a relational database successfully.
In this video we will demonstrate Atomicity, Isolation, Consistency and Durability on Postgres, fully practical example.
Answer the following questions about ACID properties in databases
This lecture details the inner working of database systems with regards to storage. It is a must watch to understand the difference between tables, pages, IO, rows, indexes and data files.
In this lecture I will discuss the difference between Primary Key and a Secondary Key and how it can affect your performance.
Lots of you asked me how to create a table with millions of rows in postgres, here are the details
In this video, I explain the benefits of Bitmap Index Scan and how it differs from Index scan and table sequential scan.
If you create an index on a large production table in postgres, the operations blocks writes in order to make sure to pull all the field entries to the index. However most of the time you can't afford to block writes on an active production database table. Postgres new feature which allows create index concurrently allows writes and reads to go in the expense of cpu/memory, time and chance for the index to be invalid. A small price to pay for fast production writes! https://www.postgresql.org/docs/9.1/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. However, most contents explain this data structure from a theoretical point of view, in this lecture I’d like to shed some light on the practical considerations of B-Tree and B+Trees in real production database systems such as Postgres and MySQL.
Link to the original paper https://infolab.usc.edu/csci585/Spring2010/den_ar/indexing.pdf
B-Tree limitation
Assume you have a table that is partitioned on the customer_id field serial 32bit, and you want to partition by range, how do you create all the necessary partitions? this is what I discuss in this video
Source Code
https://github.com/hnasr/javascript_playground/tree/master/automate_partitions
In this lecture we explain the difference between exclusive (write locks) and shared locks (read locks)
In this video, I demonstrate how is it possible to get double booking in database-backed web applications and how to prevent double booking and race conditions with row-level locks.
Source Code https://github.com/hnasr/javascript_playground/tree/master/booking-system
In this video I’ll explain why you should avoid using SQL offset when implementing any kind of paging. I’ll explain what offset does, why is it slow and what is the alternative for better performance This video is inspired by Use the index luke, i’ll have a link to the blog and slides to learn more. Let say you have a web application with an API that supports paging, you user want to request 10 news articles in page 10, this is performed via a simple GET request as shown here The API server receives the GET request and builds the SQL in order to send it to the database hopefully a pool of connections exist here. Page 10 translates to offset 100 assuming each page has 10 records and now the database is ready to execute the query against the table. Offset by design means fetch and drop the first x number of rows, so in this case the database will fetch the first 110 rows and physically drop the first 100 leaving the limit of 10 which the user will get. As the offset increase, the database is doing more work which makes this operation extremely expensive. Furthermore, the problem with offset is you might accidentally read duplicate records. consider the user now want to read page 11 and meanwhile someone inserted a new row in the table, row 111 will be read twice Let us jump and test this against postgres
Use the Index Luke Blog https://use-the-index-luke.com/no-offset
Slides in this video https://payhip.com/b/B6o1
Connection pooling is a pattern of creating a pool of available connections (usually TCP) and allow multiple clients to share the same pool of connections. This pattern is usually used when connection establishment and tearing down is costly, and the server has a limited number of connections. In this video we will learn how to use connection pooling in NodeJs when working with a Postgres Database, we will learn how to spin up a pool of database connections and use stateless pool queries and transactional queries begin/end, and finally, we will
Node JS Source Code used in this lecture here https://github.com/hnasr/javascript_playground/tree/master/postgresnode-pool
Scripts and commands
docker run --name pgmaster -v /Users/HusseinNasser/postgres/v/master_data:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres
docker run --name pgstandby -v /Users/HusseinNasser/postgres/v/standby_data:/var/lib/postgresql/data -p 5433:5432 -e POSTGRES_PASSWORD=postgres -d postgres
In standby node update postgresql.conf
primary_conninfo = 'application_name=standby host=husseinmac port=5432 user=postgres password=postgres’
add file standby.signal
touch standby.signal
In master update postgresql.conf
first 1 (standby1)
select * from pg_stat_replication
Scripts and commands
docker run --name pgmaster -v /Users/HusseinNasser/postgres/v/master_data:/var/lib/postgresql/data -p 5432:5432 -e POSTGRES_PASSWORD=postgres -d postgres
docker run --name pgstandby -v /Users/HusseinNasser/postgres/v/standby_data:/var/lib/postgresql/data -p 5433:5432 -e POSTGRES_PASSWORD=postgres -d postgres
In standby node update postgresql.conf
primary_conninfo = 'application_name=standby host=husseinmac port=5432 user=postgres password=postgres’
add file standby.signal
touch standby.signal
In master update postgresql.conf
first 1 (standby1)
select * from pg_stat_replication
If using the hostname doesn't work, use the IP address of the container itself. You can get the local IP address of the container by running docker inspect container name
We got through a practical system design exercises, this lecture is two parts. Part 1 is all about backend engineering and scaling and Part 2 focuses on database design.
Database engines or storage engines or sometimes even called embedded databases is software library that a database management software uses to store data on disk and do CRUD (create update delete)
Resources
https://youtu.be/V_C-T5S-w8g
https://mariadb.com/kb/en/library/changes-improvements-in-mariadb-102/
https://mariadb.com/kb/en/library/why-does-mariadb-102-use-innodb-instead-of-xtradb/
https://github.com/facebook/rocksdb/wiki/Features-Not-in-LevelDB
https://mariadb.com/kb/en/library/aria-storage-engine/
https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html https://eng.uber.com/mysql-migration/
OpenCourser helps millions of learners each year. People visit us to learn workspace skills, ace their exams, and nurture their curiosity.
Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.
Find this site helpful? Tell a friend about us.
We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.
Your purchases help us maintain our catalog and keep our servers humming without ads.
Thank you for supporting OpenCourser.