Are you ready to take your career to the next level?
Do you want to master Software Architecture and System Design?
You came to the right place.
In this practical course, you will learn how to architect real-life systems that scale to millions of daily users, as well as process and store petabytes of data.
If you aspire to become a Software Architect, or you are already a Software Architect, and you need a good refresher, this is your best resource.
This is also the perfect place for you to prepare and gain confidence for an upcoming System Design Interview.
Some of the things you will learn include:
Identifying the technical requirements of a system without missing any details
Defining easy-to-use and robust APIs
Applying modern Architectural Building Blocks & techniques for High Scalability, Availability, and Performance
Following industry-proven Software Architecture Patterns & best practices
Architecting highly scalable systems for massive internet traffic and Big Data Processing
Thinking and making trade-offs like a true professional Software Architect
By the end of the course, you will have all the skills you need to take on an ambiguous and high-level requirement and go through all the stages of a system design, all the way to its final Software Architecture.
Although this course does not involve coding, it is a highly practical course that will give you the fundamental knowledge for building real-world systems.
All the techniques and patterns covered in the course are used by top software companies.
In addition to the video lectures, you will also find:
Many resources related to the topics covered in the course
Quizzes that will help you validate your progress and review the course material
External links to relevant articles and videos to enhance your learning experience
This course is perfect for you if:
You want to master Software Architecture, a topic that is not usually taught in colleges or coding bootcamps.
You want to become a Software Architect or a senior member of technical staff, such as a Senior / Principal Software Engineer or Technical Lead.
You are preparing for a System Design Interview and want to increase your chances for success, as well as stand out from the crowd of candidates.
So what are you waiting for? :)
Let's get started.
FAQ
- Do I need to be a Software Architect to take this course?
Absolutely not. A Software Architect is just a title. In fact, many companies don't give this official title to anyone. Most Software Architecture and System Design is done by trusted engineers within the organization. To get this trust from your managers, you need to demonstrate a solid knowledge of Software Architecture and System Design. This is exactly what you will learn in this course.
- What is the importance of Software Architecture, and why do I need to learn it?
Modern software development of large-scale systems is very complex. Typically it involves many months of work by multiple software engineers. Just like no one would attempt to build a skyscraper without a solid plan and architecture, it is inconceivable to take on a big software project without proper design and an approved Software Architecture. If the Software Architecture of a system is done poorly, the project will likely fail. However, if the software architecture is done correctly, it can have an enormous positive impact on many lives and help your organization grow and thrive.
- Is there any coding involved in the course?
No. Software Architecture is part of the design phase of a large-scale system. Coding is done only when the Software Architecture is finalized. It definitely takes a certain mental leap to realize that coding is only a small part of software engineering. And if the Software Architecture and the design are done correctly, the coding task and everything that comes after it can be very easy and straightforward. On the other hand, if the Software Architecture is not done correctly, the implementation phase can become a big challenge.
- Should any Software Engineer aspire to become a Software Architect?
Yes and No. As you gain more experience, you will be expected to do more Software Architecture and Design. The role of a Senior Software Engineer in most organizations requires Software Architecture skills, even if "Software Architect" is not in your title. Additionally, even if you want to keep coding, your code will always have to take the overall Software Architecture into account. Otherwise, it will not be as effective. However, if you do decide to pursue the role of a Software Architect, you will be rewarded with greater responsibility and impact, which generally comes with higher job satisfaction, job security, and higher pay.
Lecture Summaries

This lecture provides an introduction to the importance of software architecture, starting with analogies from the physical world to explain why structure matters. The lecture defines software architecture as a high-level description of a system's structure, components, and how they communicate to meet requirements and constraints. The components are black-box elements defined by their behavior and APIs, and they may themselves be complex systems described by their own architecture diagrams. The lecture emphasizes that software architecture should be considered separately from implementation details such as technology and programming language. The lecture also highlights the importance of good architecture in large-scale systems, where it can mean the difference between success and failure for a business.
The lecture discusses the importance of gathering, classifying, and analyzing requirements as the first step in designing large-scale systems. The scope, the level of abstraction, and the level of ambiguity make gathering requirements for a large-scale system very different from gathering them for a method, an algorithm, or a class. The requirements are classified into three types: features of the system, quality attributes, and system constraints, each of which affects the architecture and design of the system differently. The features of the system refer to the behavior of the system and are easily tied to the system's objective. The quality attributes refer to the non-functional requirements that describe the system's performance and user experience, while the system constraints refer to the system's limitations and boundaries. Getting the requirements right upfront is critical for building large-scale systems because these systems take months to build, require many engineers and sometimes multiple teams, and involve contracts with time commitments and financial obligations.
In this lecture, a formal step-by-step method to capture all functional requirements of a system is discussed. This is achieved through use cases and user flows. A use case represents a particular scenario or situation where the system is used to achieve a user’s goal. User flows are a more detailed graphical or step-by-step representation of each use case. The steps for capturing all functional requirements include identifying all actors/users in the system, describing all possible use cases, and expanding each use case through the events or interactions between actors and the system. An example sequence diagram is used to illustrate the interactions between the driver, rider, and the hitchhiking system.
The lecture explores quality attributes, also known as nonfunctional requirements, which describe the overall properties of a software system. Quality attributes measure how well the system performs on a particular dimension and have a direct effect on the software architecture. The lecture highlights the importance of designing a system that provides the right quality attributes to avoid major redesigns. Quality attributes need to satisfy the requirements of all stakeholders and must be measurable and testable. However, there is no single architecture that can provide all quality attributes, as some contradict each other, so architects must prioritize and design systems accordingly.
The lecture discusses system constraints, which are non-negotiable decisions that limit the degrees of freedom for architects when designing software architecture. System constraints are classified into three types: technical constraints, business constraints, and legal constraints. Technical constraints include limitations such as programming language, technology, or platforms due to existing contracts, support costs, or client requests. Business constraints are decisions made by business teams that require architects to make sacrifices in terms of architecture and implementation to align with the goals of the business. Legal constraints include regulations that place certain limitations on online services, such as the HIPAA regulations for medical information in the US or GDPR in the European Union.
This lecture talks about the importance of performance as a quality attribute in software architecture and the performance metrics that can be used to measure it, such as response time and throughput. The response time is the time between a client sending a request and receiving a response, and it breaks down into two parts: processing time and waiting time. Waiting time is the duration a request spends inactively in the system, waiting to be handled or sent to its destination. Throughput is measured as the amount of work performed by the system per unit of time. The lecture also highlights important considerations when measuring and analyzing performance, such as measuring the response time as perceived by the client and understanding the response time distribution.
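Although the course involves no coding, the point about response time distributions can be made concrete with a small illustrative Python sketch (not from the course); the sample values and the percentile helper below are invented for the example.

```python
import statistics

def percentile(samples, p):
    """Return an approximate p-th percentile (0-100) of the samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[index]

# Response times in milliseconds, as measured on the client side
response_times_ms = [12, 15, 14, 13, 250, 16, 14, 13, 15, 900]

print("average :", statistics.mean(response_times_ms), "ms")   # ~126 ms
print("median  :", percentile(response_times_ms, 50), "ms")    # 15 ms
print("p99 tail:", percentile(response_times_ms, 99), "ms")    # 900 ms
```

The average (about 126 ms) hides the fact that the slowest requests take nearly a second, which is why percentile measurements of the distribution matter more than a single average.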
This lecture explains the concept of the scalability of a system, which is one of the most important quality attributes. The lecturer describes the different traffic patterns that affect the load on a system and motivates the need for scalability. Scalability is defined as the measure of a system's ability to handle a growing amount of work by adding resources to the system. There are three scalability dimensions: vertical scalability, horizontal scalability, and team/organizational scalability. The lecture details the definitions and differences between vertical and horizontal scalability. Vertical scalability means adding more resources or upgrading the existing resources on a single computer. In comparison, horizontal scalability means adding more resources in the form of new instances running on different machines.
This lecture talks about the importance of high availability, how to define and measure it, and what constitutes high availability of a system. Availability is one of the most important quality attributes of a large-scale system. System downtime is not always a mere inconvenience; in mission-critical services such as air traffic control or healthcare management, people's lives may be on the line. Availability is defined as the fraction of time, or the probability, that a service is operationally functional and accessible to the user. MTBF (mean time between failures) and MTTR (mean time to recovery) are two other statistical metrics used to define and estimate availability.
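The relationship between these metrics can be written as Availability = MTBF / (MTBF + MTTR). A minimal Python sketch, with made-up numbers purely for illustration:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as the fraction of time the service is operational:
    MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical service: fails every ~1000 hours, takes 1 hour to recover.
a = availability(mtbf_hours=1000.0, mttr_hours=1.0)
downtime_per_year_hours = (1 - a) * 24 * 365

print(f"availability: {a:.4%}")                         # ~99.90% ("three nines")
print(f"yearly downtime: {downtime_per_year_hours:.1f} hours")
```

Note how shrinking MTTR improves availability just as effectively as stretching MTBF, which is why fast detection and recovery matter so much.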
This lecture discusses the importance of fault tolerance in achieving the high availability of a system, as well as strategies to achieve it. The lecture begins by outlining the three categories of failure that can occur in a system: human error, software error, and hardware failure. To achieve fault tolerance, there are three major tactics: failure prevention, failure detection, and isolation and recovery. To prevent failures, the lecture recommends eliminating any single point of failure by employing replication and redundancy. Two strategies, active-active and active-passive, are discussed in the lecture, each with its advantages and disadvantages. The lecture also covers failure detection and isolation and how a monitoring service can be employed to detect failures.
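As a toy illustration of the active-passive tactic (the replica names and health flags below are invented, not course material), a monitor can route all traffic to the primary replica and fail over to the standby when a health check fails:

```python
replicas = {"primary": True, "standby": True}   # replica name -> passes health check

def route_request() -> str:
    # Active-passive: all traffic goes to the primary while it is healthy;
    # the monitoring service promotes the standby when the primary fails.
    if replicas["primary"]:
        return "primary"
    return "standby"

print(route_request())        # -> primary
replicas["primary"] = False   # simulate a detected hardware failure
print(route_request())        # -> standby takes over
```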
The lecture covers the three key terms that help aggregate the promises made to users about quality attributes. The first term, Service Level Agreement (SLA), is a legal contract between the service provider and its users that promises quality of service in terms of availability, performance, and other metrics; the agreement specifies penalties if the provider fails to deliver, and may also exist for internal users of the service. The second term, Service Level Objective (SLO), represents a specific target value or range of values that the service should meet, and it should be testable and measurable. The third term, Service Level Indicator (SLI), is a quantitative measure of compliance with an SLO; it enables comparisons between actual measurements and goals and validates that the service is meeting its SLOs. Software engineers and architects are responsible for defining the SLOs and SLIs, with careful consideration given to prioritizing the SLOs that matter most to users.
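A minimal sketch of the SLO/SLI relationship; the target values and constant names below are invented assumptions for illustration:

```python
# SLOs: target values the service promises to meet
SLO_P99_LATENCY_MS = 200        # 99% of requests complete within 200 ms
SLO_AVAILABILITY = 0.999        # service is up 99.9% of the time

# SLIs: quantitative measurements collected from the running system
measured_p99_latency_ms = 185
measured_availability = 0.9987

# Comparing each SLI against its SLO tells us whether the promise is kept.
print("latency SLO met:     ", measured_p99_latency_ms <= SLO_P99_LATENCY_MS)
print("availability SLO met:", measured_availability >= SLO_AVAILABILITY)
```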
This lecture discusses the design of application programming interfaces (APIs) for large-scale systems. APIs serve as a contract between system engineers and client applications. They allow applications to call other systems remotely over a network. APIs can be classified into three groups: public APIs, private APIs, and partner APIs. APIs should be designed to be easy to use and understand, and impossible to misuse. They should also be encapsulated from the internal system design and implementation to allow for future changes without breaking existing client contracts. In addition, it is important to keep operations idempotent whenever possible.
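One common way to keep an operation idempotent is a client-supplied idempotency key, so that retries of the same request are safe. The sketch below is a toy illustration under that assumption; the payments example and all names are made up:

```python
processed = {}   # idempotency key -> cached result of the first execution

def charge(idempotency_key: str, amount_cents: int) -> str:
    if idempotency_key in processed:
        return processed[idempotency_key]       # replayed request: no new side effect
    result = f"charged {amount_cents} cents"    # the side effect happens exactly once
    processed[idempotency_key] = result
    return result

print(charge("key-123", 500))   # performs the charge
print(charge("key-123", 500))   # a client retry returns the same response
```

Because the retry is harmless, clients can safely resend a request whose response was lost in the network.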
The lecture explains Remote Procedure Calls (RPC), a type of API that allows client applications to execute subroutines on remote servers, giving developers the ability to call a method as if it were a local method. The lecturer explains that the interface and data types are declared using a special interface description language, and the API methods are auto-generated for the server and client applications. The client stub serializes the data and initiates the connection to the remote server application, while the server stub listens for client application messages and invokes the real implementation of the method on the server application. The lecturer concludes by discussing the benefits and drawbacks of RPC and how to mitigate its performance bottleneck.
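The stub pattern can be sketched in miniature. In the toy Python example below, a direct function call stands in for the network hop and JSON stands in for the serialization format; real RPC frameworks generate such stubs automatically from the interface description language:

```python
import json

def server_stub(message: bytes) -> bytes:
    # Server stub: deserialize the request and invoke the real implementation.
    request = json.loads(message)
    if request["method"] == "add":
        result = request["args"][0] + request["args"][1]
        return json.dumps({"result": result}).encode()
    raise ValueError("unknown method")

def add(a: int, b: int) -> int:
    # Client stub: looks like a local method, but serializes the call and
    # "sends" it to the server (a direct call stands in for the network).
    message = json.dumps({"method": "add", "args": [a, b]}).encode()
    reply = server_stub(message)
    return json.loads(reply)["result"]

print(add(2, 3))   # the caller never sees the serialization or the network
```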
The lecture provides an overview of RESTful API, explaining its architectural style and highlighting its benefits for building web-based systems with high performance, scalability, and availability quality attributes. REST stands for Representational State Transfer and is a set of constraints and best practices for defining APIs for the web. In contrast to the RPC API style, which revolves around methods, REST API takes a more resource-oriented approach where the main abstraction to the user is a named resource and not a method. Resources are addressed using a URI and are organized in a hierarchy. The lecture emphasizes the importance of statelessness and cacheability in achieving high scalability and availability.
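The contrast with RPC can be illustrated with a hypothetical movie-catalog service: the URI names the resource, and the HTTP verb supplies the action. The paths below are invented for the example:

```python
# Resource-oriented REST endpoints for a made-up movie-catalog service.
# In RPC the method name carries the action; in REST the verb does.
endpoints = [
    ("GET",    "/movies",             "list the movies collection"),
    ("POST",   "/movies",             "create a new movie"),
    ("GET",    "/movies/123",         "read a single movie resource"),
    ("PUT",    "/movies/123",         "replace a movie"),
    ("DELETE", "/movies/123",         "delete a movie"),
    ("GET",    "/movies/123/reviews", "child resources form a hierarchy"),
]

for verb, path, meaning in endpoints:
    print(f"{verb:6} {path:22} # {meaning}")
```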
This lecture introduces load balancers, which are essential building blocks for real-life, large-scale systems. A load balancer's primary role is to distribute the traffic load evenly among a group of servers in a system. Load balancers offer high scalability and availability by hiding a group of servers behind them, which allows horizontal scaling, auto-scaling policies, and ignoring dead or excessively slow servers. Load balancers increase system performance and throughput while maintaining service level agreements. The lecture describes different types of load balancers, including DNS, network, hardware, and application load balancers, and their pros and cons.
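A minimal sketch of the simplest distribution strategy, round robin, with invented server names (real load balancers also track server health and support more sophisticated policies):

```python
servers = ["app-server-1", "app-server-2", "app-server-3"]

def pick_server(request_number: int) -> str:
    # Round robin: rotate through the group of servers in order,
    # spreading the traffic load evenly.
    return servers[request_number % len(servers)]

for request_id in range(6):
    print(f"request {request_id} -> {pick_server(request_id)}")
```

Adding a fourth entry to the list is all it takes to spread the same traffic over more machines, which is the essence of horizontal scaling behind a load balancer.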
The lecture introduces the fundamental architecture building block for asynchronous architectures: the message broker. The lecture begins by exploring some use cases where an asynchronous architecture can provide more benefits and better capabilities than synchronous communication. The first drawback of synchronous communication is that both application instances that establish communication with each other have to remain healthy and maintain the connection to complete the transaction. The second drawback of synchronous communication is that there is no buffer in the system to absorb a sudden increase in traffic or load. Message brokers can provide additional functionality such as message routing, transformation, validation, and even load balancing. Message brokers are the fundamental building block of any asynchronous software architecture.
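The buffering role of a broker can be sketched with an in-memory queue standing in for a real broker; the order events below are invented for illustration:

```python
from queue import Queue

broker = Queue()   # stands in for a real message broker

# Producer: a burst of orders arrives faster than we can fulfil them;
# the broker absorbs the spike and the producer is not blocked.
for order_id in range(5):
    broker.put({"order_id": order_id})
    print(f"accepted order {order_id} immediately")

# Consumer: drains the queue later, at its own pace, independently of
# whether the producer is still running.
while not broker.empty():
    order = broker.get()
    print(f"fulfilling order {order['order_id']}")
```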
The API Gateway is a fundamental architectural building block and design pattern used in large-scale systems to solve the complexity issues that come with scaling. It serves as an API management service that abstracts and unifies a collection of backend services, simplifying the external API. The Gateway follows a software architectural pattern called API composition, allowing it to compose different APIs of backend services into a single external API. The benefits of the API Gateway include easy internal changes to the system, improved security and authentication, request routing to improve system performance, and caching static content to reduce response time.
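A toy sketch of the API composition idea, with hypothetical backend services: the gateway fans a single external call out to several internal services and merges their responses:

```python
# Each function stands in for a separate internal backend service.
def user_service(user_id: int) -> dict:
    return {"user_id": user_id, "name": "Alice"}

def order_service(user_id: int) -> dict:
    return {"recent_orders": [101, 102]}

def gateway_get_profile(user_id: int) -> dict:
    # API composition: one external request fans out to several internal
    # services, and the gateway merges the results into a single response.
    return {**user_service(user_id), **order_service(user_id)}

print(gateway_get_profile(42))
```

Because clients only see the gateway's unified API, the internal services behind it can be split, merged, or replaced without breaking external contracts.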
The lecture discusses Content Delivery Networks (CDNs), globally distributed networks of servers used by digital service companies to improve the speed and availability of their content to end users. Even with distributed web hosting, there is significant latency between an end user and a destination server, since each request passes through multiple network router hops. This delays the arrival of the assets needed to load a webpage. CDNs cache website content on edge servers located at different Points of Presence, physically closer to the user and strategically located in terms of network infrastructure. CDNs improve perceived system performance, overall availability, and security, and help protect against DDoS attacks.
This lecture is about databases, focusing on relational databases. The lecturer explains that relational databases store data in tables, with each row representing a single record, and all the records are related through a predefined set of columns. The relationship between all the records in a table is what gives this type of database the name relational database. Each record in the table is uniquely identified by what's called a primary key, which can be represented by either one column or a set of columns in the table. One of the biggest advantages of relational databases is that they allow for the elimination of data duplication, which saves storage space and directly translates to cost savings for businesses.
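A minimal sketch of a table with a primary key, using Python's built-in sqlite3 module; the schema and rows are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,   -- uniquely identifies each record
        name    TEXT NOT NULL,
        email   TEXT NOT NULL
    )
""")
db.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')")
db.execute("INSERT INTO users VALUES (2, 'Bob', 'bob@example.com')")

# Every row shares the same predefined set of columns and can be looked up
# by its primary key.
for row in db.execute("SELECT name, email FROM users WHERE user_id = 1"):
    print(row)
```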
The lecture explains non-relational databases, also known as NoSQL databases, which address limitations of relational databases, including the lack of schema flexibility and query performance issues. Non-relational databases are optimized for faster queries and support data structures that are more natural for programming languages. However, a flexible schema makes it harder to analyze records in a uniform way. The lecture explores the three main types of non-relational databases, namely key/value stores, document stores, and graph databases, with examples of their use cases.
The lecture discusses three techniques to improve the performance, availability, and scalability of databases in a large-scale system. The first is indexing, which speeds up retrieval operations by mapping columns to records. The second is database replication, which duplicates mission-critical data across different computers, providing high availability and better performance. The third is sharding, a technique that splits a large database into smaller, more manageable parts, improving scalability and data distribution.
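Sharding can be sketched with a simple hash function choosing which shard stores a record; the shard names and keys below are invented (production systems often use consistent hashing instead of a plain modulo):

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    # Hash the record's key, then map the hash onto one of the shards.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for user_id in ["alice", "bob", "carol"]:
    print(user_id, "->", shard_for(user_id))
```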
The lecture explains the CAP Theorem in the context of distributed databases. CAP stands for Consistency, Availability, and Partition Tolerance. The theorem states that, in the presence of a network partition, a distributed database cannot guarantee both consistency and availability and must choose between them. The lecture gives examples to explain the implications of this theorem in detail.
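A toy illustration of the trade-off: during a partition, a node cut off from its replicas must either refuse to answer (preserving consistency) or answer with possibly stale data (preserving availability). The flag and values below are invented:

```python
partitioned = True   # this node is currently cut off from the other replicas

def read(key: str, prefer: str = "consistency") -> str:
    if partitioned and prefer == "consistency":
        # CP choice: refuse to answer rather than risk returning stale data.
        raise RuntimeError("unavailable: cannot confirm the latest value")
    # AP choice: stay available and return what this node has locally.
    return "locally-stored (possibly stale) value"

print(read("x", prefer="availability"))   # an AP system answers anyway
try:
    read("x", prefer="consistency")       # a CP system refuses instead
except RuntimeError as err:
    print(err)
```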
The lecture discusses the importance of unstructured data and provides examples of use cases where it is required. Unstructured data, such as images, videos, and documents, can be too large to be stored in a traditional database, so scalable solutions are required. The lecture explains two solutions, a distributed file system and an object store, and discusses their benefits and limitations. A distributed file system provides a familiar file and folder structure and is useful for machine learning and big data analysis. In contrast, an object store is designed for storing unstructured data at an internet scale, and scalability is achieved by adding more storage devices.
The lecture provides an introduction to the Multi-Tier Architecture pattern and focuses on one of its most common variations, the Three-Tier Architecture. Multi-Tier Architecture physically and logically separates an application into multiple tiers, limiting the scope of responsibility of each tier. The Three-Tier Architecture consists of a Presentation Tier, an Application Tier, and a Data Tier, and it is commonly used for web-based services. It is easy to maintain and scale; however, its drawback is the monolithic structure of the application (logic) tier.
The lecture introduces the Microservices Architecture pattern, comparing it to the Monolithic three-tier Architecture pattern. The Microservices Architecture pattern organizes business logic as a collection of independently deployed services, each owned by a small team with a narrow scope of responsibility. The pattern offers advantages such as a smaller codebase, better performance, scalability, autonomy, and security. However, there are best practices and challenges to consider, such as ensuring each service is logically separated with a single responsibility and avoiding excessive coordination between teams.
This lecture explains the concept of an Event-Driven Architecture (EDA), its components, and how it is used to enable asynchronous communication among microservices. In contrast to the traditional direct message style of communication, an EDA relies on events, which are immutable statements of a fact or change. This makes services more decoupled, allowing for higher scalability and easier addition of new services to the system without making changes to the existing services. Additionally, EDA enables real-time analysis of data streams and easy detection of patterns, making it possible to detect and respond to fraudulent activities or other events in real time.
This lecture provides an introduction and motivation for big data processing. It describes the three main characteristics of big data, which are volume, variety, and velocity. The lecture also gives examples of fields that generate big data, such as internet searches, medical software systems, real-time security, and weather prediction systems. The insights gained from analyzing big data can provide a significant competitive advantage over competitors. The lecture concludes by introducing architectural styles that help in processing and analyzing big data.
The lecture discusses two strategies for processing big data using event-driven architecture. The first is batch processing, where data is stored in a distributed database or file system and processed in batches on a fixed schedule. The second strategy is stream processing, which processes data as it arrives, making it suitable for real-time applications like fraud detection or monitoring systems. Use cases for batch processing include online learning platforms, search engines, and analyzing transportation device data. In contrast, stream processing can be used for real-time applications like social media analytics or stock trading.
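A toy contrast between the two strategies, with invented event data: the same events are analyzed as a whole on a schedule (batch) and inspected one at a time as they arrive (stream):

```python
events = [
    {"user": "u1", "amount": 20},
    {"user": "u2", "amount": 9500},
    {"user": "u1", "amount": 35},
]

def on_event(event: dict) -> None:
    # Stream processing: react to each event the moment it arrives.
    if event["amount"] > 5000:            # a made-up fraud-detection rule
        print(f"ALERT: suspicious transaction by {event['user']}")

def nightly_batch(all_events: list) -> None:
    # Batch processing: accumulate events, then analyze the whole set on a schedule.
    total = sum(e["amount"] for e in all_events)
    print(f"batch report: {len(all_events)} events, total amount {total}")

for e in events:
    on_event(e)          # reacts in real time
nightly_batch(events)    # runs later, over the full data set
```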
The lecture discusses big data processing, specifically the trade-offs between real-time processing and batch processing strategies. The Lambda architecture is introduced, which leverages both methods, balancing the low latency of real-time processing with the completeness and fault tolerance of batch processing. The Lambda architecture is divided into three layers: the batch layer, the speed layer, and the serving layer. The batch layer manages the master data set and precomputes batch views, while the speed layer uses real-time processing. The resulting precomputed views are stored in a read-only database. Examples of use cases that require both real-time and batch processing are also presented.
In this lecture, the process of architecting a large-scale system from scratch is discussed. The process includes gathering functional and non-functional requirements, defining the API, creating an architecture diagram, and refining the diagram to address non-functional requirements. A system design problem is introduced, where a public discussion forum that can scale to millions of users worldwide is to be designed. The lecture explains how to capture the functional requirements and non-functional requirements of the system and the trade-offs that need to be made to balance availability, consistency, and performance. Finally, a REST API is defined for the system.
In this lecture, Michael Pogrebinsky discusses the software architecture for a web-based forum, explaining how to translate functional requirements into architecture diagrams. The lecture focuses on creating the services needed to allow users to sign up, create and view posts, comment on posts, and upvote or downvote them. Michael Pogrebinsky also explains how to structure databases to store user and post information, comments, and votes. He emphasizes the need to design for scale and user privacy, including how to handle users' passwords and prevent multiple voting.
In this lecture, Michael Pogrebinsky discusses the design process of a highly scalable eCommerce marketplace platform where merchants can upload and sell their products, and users can browse, search, and buy them. The design process includes clarifying functional requirements for both merchants and buyers, such as providing merchants with a product management system and analytics dashboard and allowing users to browse and search for products and check out. The lecture also highlights the use of sequence diagrams to organize and visualize the system's actors and requirements.