April 2024 Update: Two new sections have been added recently. New Section 5: learn to edit the clothes of a person in a picture by programming a combination of a segmentation model with the Stable Diffusion generative model. New bonus Section 6: Journey to the latent space of a neural network - dive deep into the latent space of the neural networks that power Generative AI in order to understand in depth how they learn their mappings.
Generative A.I. is the present and future of A.I. and deep learning, and it will touch every part of our lives. It is the part of A.I. that comes closest to our unique human capability of creating, imagining and inventing. By taking this course, you gain advanced knowledge and practical experience in the most promising areas of A.I., deep learning, data science and advanced technology.
The course takes you on a fascinating journey in which you learn gradually, step by step, as we code together a range of generative architectures, from basic to advanced, until we reach multimodal A.I, where text and images are connected in incredible ways to produce amazing results.
At the beginning of each section, I explain the key concepts in great depth and then we code together, you and me, line by line, understanding everything, conquering together the challenge of building the most promising A.I architectures of today and tomorrow. After you complete the course, you will have a deep understanding of both the key concepts and the fine details of the coding process.
What a time to be alive. We are able to code and understand architectures that bring us home, home to our own human nature, capable of creating and imagining. Together, we will make it happen. Let's do it.
We explore the general roadmap of the course, as we prepare to embark on this fascinating mission to the core of the most promising A.I architectures of today.
Javier welcomes you from his spacecraft, outlining the upcoming challenges, starting with generative adversarial networks and later on, with multimodal A.I. Let's do it!
Welcome to the generative revolution. In this video, we begin to explore how we got to where we are today, to the spark that triggered this generative revolution that brings us closer to home, to our human nature, as entities capable of generating and creating new things.
We explore how generative A.I complements previous deep learning architectures and why these architectures are key to the future of A.I and the search for AGI (artificial general intelligence)
We explore the potential of generative A.I and some of its possible areas of application
We explore the what and the how of these generative architectures. From latent spaces to representation learning, we begin to go deep into how these architectures work and what they do.
We go deeper into the latent spaces of these generative architectures, explaining a couple of examples of how we can navigate them to change the features of the generated results, or interpolate between points in the latent space to produce morphing and other effects.
We explore the key concepts of how Generative Adversarial Networks (GANs) work. GANs are a type of advanced generative architecture that will be the topic of our first two coding phases. You may also read a fun article about GANs that I wrote on Medium: https://towardsdatascience.com/leonardo-and-the-gan-dream-f69e8553e0af?sk=c1fdf85e94c48acd61df451babc41dfe
We explore some of the many benefits that generative A.I brings. And then we begin to explore the potential of combining these generative architectures with other areas, like evolutionary strategies, reinforcement learning and beyond.
We continue exploring the combination of generative architectures with reinforcement learning and other fields, such as medicine, until we converge to our "coming home" mission statement. We are taking A.I towards our own human nature, capable of generating, imagining and creating. What could be more exciting?
As a conclusion to this exploration of the generative revolution, Javier improvises a song dedicated to generative A.I and its potential to bring A.I closer to home, closer to our generative, imaginative and creative human nature
Connecting from his spacecraft, Javier introduces you to the first coding section where we will build together a basic generative architecture to understand the key principles involved in the process.
We explore in depth the fascinating battle that takes place between the generator and discriminator of a typical generative adversarial network. With custom-made slides, Javier explains every detail of their interaction as we move deeper into the essence of these incredible networks.
We go very deep into cross entropy, a key concept involved in the calculation of the loss value of the GAN we will code. The loss value will help us measure the performance of the network as we train it. Understanding cross entropy will be very useful in general as it is a concept that appears in many areas of machine learning and deep learning.
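To make the idea concrete, here is a minimal sketch in PyTorch (the course codes in Python and PyTorch) of binary cross entropy on a tiny batch; the numbers are illustrative, not taken from the course:

```python
import torch
import torch.nn.functional as F

# Binary cross entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over the batch.
predictions = torch.tensor([0.9, 0.2, 0.8])  # illustrative discriminator outputs
labels = torch.tensor([1.0, 0.0, 1.0])       # 1 = real, 0 = fake

loss = F.binary_cross_entropy(predictions, labels)
print(loss.item())  # mean of -log(0.9), -log(0.8), -log(0.8), roughly 0.184
```

Confident predictions that match the label (0.9 for a real image) contribute little loss, while a confident mistake would contribute a large one, which is exactly the behavior we want from a training signal.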
We go deep into the calculations needed to obtain the loss value of the discriminator. We make sure to understand in depth every part of the equations involved. This understanding will help you as well to grasp quickly related concepts from other deep learning architectures.
We explore the calculations needed to obtain the loss value of the generator. We go really deep so that you understand every part of the equations involved. This understanding will help you as well to grasp quickly related concepts from other deep learning architectures.
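As a minimal sketch tying these two lessons together (assuming a discriminator that ends in a sigmoid and outputs a probability; the exact averaging in the course code may differ):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(real_preds, fake_preds):
    # The discriminator wants real images labeled 1 and generated images labeled 0.
    real_loss = F.binary_cross_entropy(real_preds, torch.ones_like(real_preds))
    fake_loss = F.binary_cross_entropy(fake_preds, torch.zeros_like(fake_preds))
    return (real_loss + fake_loss) / 2

def generator_loss(fake_preds):
    # The generator wins when the discriminator labels its images as real (1).
    return F.binary_cross_entropy(fake_preds, torch.ones_like(fake_preds))
```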
In this optional lesson, we learn the basics of Google Colab, a free online environment that you can use to code during the course. In Google Colab, you can create Jupyter notebooks, combinations of code and text cells where you can add Python and PyTorch code to work through the examples of this course in a very comfortable way.
We begin coding! First, we import the libraries we need and declare a function that we will use to visualize our results as the training evolves.
We move on, adding the key parameters of our network and creating our dataloader structure
We code together the generator class, going deep into the meaning of the different layers of the network that will be in charge of transforming the initial noise into new images.
We code together the discriminator class, going deep into the meaning of the different layers of the network that will be in charge of deciding if its inputs are real or fake
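For orientation, here is a compact sketch of what a generator/discriminator pair like the one in these two lessons can look like; the layer sizes are illustrative assumptions (flattened 28x28 images), and the course's exact architecture may differ:

```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=64, img_dim=784):  # 784 = 28x28 pixels, flattened
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),  # outputs pixel values in [-1, 1]
        )

    def forward(self, noise):
        return self.net(noise)

class Discriminator(nn.Module):
    def __init__(self, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),  # probability that the input is real
        )

    def forward(self, img):
        return self.net(img)
```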
We put together the optimizer data structure that will be in charge of calculating the gradients and backpropagating them through the networks, as well as of updating their parameters to move them in the direction that will lower their loss value and improve their performance.
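A minimal sketch of that structure, assuming `gen` and `disc` are the two networks sketched above; one Adam optimizer per network is the usual pattern:

```python
import torch

# The learning rate here is illustrative; GAN training is quite sensitive to it.
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
```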
We create together the functions that measure the performance of the networks, by producing their loss values.
Time to create the main training loop! We focus first on the discriminator part of the loop, the code that trains the discriminator network so that it improves its capacity to predict if its inputs are real or fake.
We continue working on our training loop, focusing now on the generator part, creating the code that will train and improve the generator so that it can produce results that approach more and more the original training dataset. We also add code to show the key stats of the training process.
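Put together, the loop's skeleton looks roughly like this (a sketch assuming the classes, losses and optimizers sketched above, with `dataloader`, `num_epochs` and `z_dim` defined elsewhere):

```python
import torch

for epoch in range(num_epochs):
    for real, _ in dataloader:
        real = real.view(real.size(0), -1)     # flatten images for the MLP networks
        noise = torch.randn(real.size(0), z_dim)
        fake = gen(noise)

        # Discriminator step: detach fake so gradients stop at the generator.
        disc_loss = discriminator_loss(disc(real), disc(fake.detach()))
        disc_opt.zero_grad()
        disc_loss.backward()
        disc_opt.step()

        # Generator step: re-score the fakes with the updated discriminator.
        gen_loss = generator_loss(disc(fake))
        gen_opt.zero_grad()
        gen_loss.backward()
        gen_opt.step()
```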
Time to train the system! We run the training loop and check the first results produced by the generator network
We conclude the section talking about the results and the different challenges faced by this basic GAN that has helped us learn in depth the key principles of these fascinating generative architectures.
Connecting from his spacecraft, Javier introduces you to the next coding section, where we will build together an advanced generative architecture capable of generating human faces. We will go really deep into the key principles involved in the process, and into every line of the code we will produce. We will also introduce many useful new tools, like the code that allows us to use a free external service to track the statistics of our training process in real time from wherever we are, and the capacity to save and load checkpoints so that we can restart our training whenever we want. Let's do it!
We explore the challenges faced by a basic GAN architecture as we prepare to code a more advanced generative architecture.
We go deep into the calculation of the loss value that takes place in a Wasserstein GAN, the type of advanced generative architecture that we are creating in this section. This type of advanced network uses a different principle to calculate its loss value and we explore it in depth in this video.
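As a minimal sketch of the contrast with the cross-entropy losses from the previous section (names are ours; `gp_term` stands for the weighted gradient penalty covered in the next lesson):

```python
import torch

def critic_loss(real_scores, fake_scores, gp_term):
    # The critic has no sigmoid: it outputs unbounded scores, not probabilities.
    # It learns to widen the gap E[critic(real)] - E[critic(fake)].
    return fake_scores.mean() - real_scores.mean() + gp_term

def generator_loss(fake_scores):
    # The generator learns to raise the critic's score on its images.
    return -fake_scores.mean()
```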
We explore the best way to calculate the gradient penalty, an extra term needed by this type of network to enforce a constraint on the size of the critic gradients.
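A minimal sketch of that computation in the WGAN-GP formulation (the course code may organize it differently): score a random blend of real and fake images, take the gradient of the scores with respect to that blend, and penalize gradient norms that stray from 1.

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    # Interpolate between real and fake images with a random blend factor.
    batch_size = real.size(0)
    alpha = torch.rand(batch_size, 1, 1, 1, device=device)
    mixed = alpha * real + (1 - alpha) * fake.detach()
    mixed.requires_grad_(True)

    scores = critic(mixed)
    gradients, = torch.autograd.grad(
        outputs=scores, inputs=mixed,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # so the penalty itself can be backpropagated
    )
    # Penalize the squared distance of each sample's gradient norm from 1.
    gradients = gradients.view(batch_size, -1)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()
```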
Time to code! We begin by importing the necessary libraries and creating our visualization function that will allow us to check the results of the generator during the training process.
We add the code to connect to the free Weights & Biases service that will allow us to track the statistics of our training process remotely and in real time. This is an optional but highly recommended part of this section.
We begin to create together the generator class, which will include a convolutional network that will output a brand new image
Convolutional layers (transposed and standard) are a key part of the generator and critic networks. In this video, we go really deep into convolutions, to understand in depth how they work.
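One handy piece of arithmetic from this territory: for a transposed convolution, the output size is (in - 1) * stride - 2 * padding + kernel_size (ignoring output padding and dilation). A quick check with PyTorch:

```python
import torch
import torch.nn as nn

# (16 - 1) * 2 - 2 * 1 + 4 = 32, so this layer doubles the spatial size.
layer = nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)
x = torch.randn(1, 128, 16, 16)
print(layer(x).shape)  # torch.Size([1, 64, 32, 32])
```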
We code the generator class and network, which will produce a brand new image. We will soon train it so that it gradually improves its capacity to fool the critic network.
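As a sketch of the idea (a DCGAN-style stack in which each transposed convolution doubles the spatial size; the course's exact channel counts and output resolution may differ), the critic in the next lesson mirrors this structure with standard convolutions:

```python
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=128, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(),  # 1x1 -> 4x4
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),    # 4x4 -> 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),    # 8x8 -> 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),      # 16x16 -> 32x32
            nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Tanh(),                     # 32x32 -> 64x64
        )

    def forward(self, noise):
        # noise has shape (batch, z_dim, 1, 1): a latent point treated as a 1x1 image.
        return self.net(noise)
```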
Time to code the critic class and network, which will try to detect if its inputs are real or fake.
In this optional video, we explore an alternative way to initialize the parameters of our networks, in case we want to do experiments with that part of the process at any time.
Time to load the dataset that we will be using to train the networks. The CelebA dataset provides more than 200,000 images of celebrity faces. In the code and video, I provide a Google Drive link from which you can download the data. I also provide alternative links in the additional information attached to this video.
We code together the dataloader and optimizer structures that will allow us to produce batches of our data and to calculate the gradients during the training process. The optimizer will also be in charge of tweaking the parameters of the network after backpropagating the gradients in order to change them in the direction that will lower the loss and raise the performance of the system
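A minimal sketch of the data side, assuming the CelebA images have been unpacked under a local `./celeba` folder (the actual path depends on where you download them):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # map pixels to [-1, 1] to match Tanh
])

# ImageFolder expects at least one subdirectory of images under the root path.
dataset = datasets.ImageFolder("./celeba", transform=transform)
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)
```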
We create the function that will calculate the gradient penalty term, which will help us fulfill the constraint needed by this network so that the values of the critic's gradients remain controlled.
We add the functions that will allow us to save and load checkpoints, so that we can restart long training processes whenever we like
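A typical shape for those two helpers (a sketch; the course code may store additional fields, such as training stats):

```python
import torch

def save_checkpoint(path, gen, critic, gen_opt, critic_opt, step):
    torch.save({
        "gen": gen.state_dict(), "critic": critic.state_dict(),
        "gen_opt": gen_opt.state_dict(), "critic_opt": critic_opt.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(path, gen, critic, gen_opt, critic_opt):
    ckpt = torch.load(path)
    gen.load_state_dict(ckpt["gen"])
    critic.load_state_dict(ckpt["critic"])
    gen_opt.load_state_dict(ckpt["gen_opt"])
    critic_opt.load_state_dict(ckpt["critic_opt"])
    return ckpt["step"]  # resume training from the saved step
```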
Time to code the training loop! We begin with the code that will be training the critic, which will learn to predict if its inputs are fake or real
We continue creating the training loop, this time focusing on the part that trains the generator, with the objective of producing results that fool the critic
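The skeleton of such a loop, as a sketch building on the pieces above (it assumes `gen`, `critic`, `dataloader`, `z_dim`, the optimizers and `gradient_penalty` from the earlier sketches; `n_critic` extra critic steps per generator step and a penalty weight of 10 are the usual WGAN-GP settings):

```python
import torch

n_critic, lambda_gp = 5, 10

for real, _ in dataloader:
    # Several critic updates per generator update keep the critic well trained.
    for _ in range(n_critic):
        noise = torch.randn(real.size(0), z_dim, 1, 1)
        fake = gen(noise)
        gp = gradient_penalty(critic, real, fake)
        c_loss = critic(fake.detach()).mean() - critic(real).mean() + lambda_gp * gp
        critic_opt.zero_grad()
        c_loss.backward()
        critic_opt.step()

    # One generator update: raise the critic's score on the latest fakes.
    g_loss = -critic(fake).mean()
    gen_opt.zero_grad()
    g_loss.backward()
    gen_opt.step()
```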
It's time to add the part that calculates and displays the stats of the training process. We will also polish different parts of the code.
Before we run the training, we review the different parts of the code to ensure that all is looking good
It's time to train! We do the last checks and begin the training process
As the training progresses, we check the first results produced by the generator network. We also look at the real time stats produced by our remote service, which we can access from anywhere, anytime.
We continue analyzing the results of the generator as the training progresses. We also analyze the detailed statistics and the convergence pattern of the loss values of generator and critic.
As the training progresses and the results keep improving, we continue checking the stats and other parts of the process
Once the system has been trained for a while, we can navigate its latent space, creating interpolations that can allow us to produce morphing and other effects. In this video, we create the code that allows us to create a morphing effect between the images generated from two different points of the latent space.
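The core of such a morphing is just linear interpolation between two latent vectors; a minimal sketch, assuming the trained generator `gen` and latent size `z_dim` from the earlier sketches:

```python
import torch

z_start = torch.randn(1, z_dim, 1, 1)   # two points in the latent space
z_end = torch.randn(1, z_dim, 1, 1)

frames = []
for alpha in torch.linspace(0, 1, steps=30):
    z = (1 - alpha) * z_start + alpha * z_end   # blend the two latent points
    with torch.no_grad():
        frames.append(gen(z))                   # each frame is one morphing step
```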
We create another morphing interpolation to conclude the section and the process of coding this advanced generative architecture
Connecting from his spacecraft, Javier introduces you to the next coding section where we will go deep into multimodal A.I, combining two cutting-edge architectures, one that connects visual and text elements, and the other one, a generative network capable of producing high resolution results. By linking them, we will be able to transform text prompts into amazing brand new images.
We explore in depth how this multimodal A.I system is going to work. We explore the details of the way we will combine both advanced architectures, and the inputs and outputs of each of the key parts of the process. A fascinating adventure, exploring the incredible potential of multimodal generative A.I.
Time to code! We begin this fascinating adventure by downloading and importing the necessary repos and libraries
Time to create some helper functions and then set the key hyperparameters and parameters of the process
We set up and instantiate the CLIP model, the architecture that has been trained to connect texts with images.
Time to set up the advanced generative architecture that will be in charge of transforming the parameters we are optimizing into brand new images.
We create a class that will hold our latent space, the parameters we will optimize.
Time to create the functions that will use the CLIP model to produce encodings of our text prompts
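With OpenAI's CLIP package, encoding a prompt looks roughly like this (a sketch; "ViT-B/32" is an assumption, as the course may load a different CLIP variant):

```python
import torch
import clip  # OpenAI's CLIP package, assumed installed from its GitHub repo

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

tokens = clip.tokenize(["a castle painted with a sfumato effect"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
    # Normalize so that comparisons with image embeddings reduce to dot products.
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
```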
We code together an important function that processes the generated images with augmentations and creates a series of crops that will be sent to the CLIP model to be encoded.
We code a function that will allow us to display the output of the network that generates our images, as well as the intermediate crops
In a couple of key functions, we code the optimization process, which combines the CLIP encodings of texts and images to produce the loss value that drives the tweaking of the parameters we are optimizing, steering the system towards generating images that better match the text prompts.
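The heart of that loss is cosine similarity between embeddings; a minimal sketch (the function name and shapes are ours):

```python
import torch.nn.functional as F

def clip_loss(image_features, text_features):
    # With both embedding sets L2-normalized, cosine similarity is a dot product.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    similarity = image_features @ text_features.T   # (num_crops, num_prompts)
    return (1 - similarity).mean()  # lower loss = crops match the prompts better
```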
On to the training loop! We produce the code that will iterate through an optimization process that gradually drives the generated images closer to the concept expressed by the text prompts.
We run the training process and analyze the results as the images being generated move towards the concepts expressed by the text prompts.
Once we have stored a series of points in the latent space, we can code a function that interpolates between those points and generates a sequence of images transitioning through the in-between positions in that latent space.
We produce the code that creates a video from the sequence of images generated by our interpolating function. And then add the code to display the video within the notebook
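One common way to do this is with imageio (a sketch, assuming `frames` is a list of HxWx3 uint8 arrays from the interpolation step, and that ffmpeg support, e.g. the imageio-ffmpeg package, is installed):

```python
import imageio

# Write the interpolated frames out as a video file at 24 frames per second.
imageio.mimsave("morphing.mp4", frames, fps=24)
```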
We explore the importance of experimenting with the code, creating variations, trying new combinations of parameters and more
We modify part of our code to push the optimization process to create a new type of texture, a kind of pictorial sfumato effect, like the one that Leonardo da Vinci used to create. This change in the code will have a dramatic impact on certain kinds of text prompts.
We reflect on the results of producing the sfumato effect, and the contrast with the results obtained without it.
Congratulations, you made it! I'm so proud of you. In this video, Javier congratulates you from space. You reached the stars of generative A.I, and now, the sky is the limit!
Overview of how we will combine a segmentation model with the Stable Diffusion generative model in order to perform inpainting: selective editing of the clothes of a person in a picture.
We install the necessary libraries and set up the segmentation model that will allow us to mask the elements that we want to edit.
We set up the Stable Diffusion generative model. This is the model that will allow us to do inpainting, selective editing of the parts that we mask with the segmentation model.
We load the picture that we want to edit, and we proceed to adapt it to the requirements of the deep learning models we will use. We then run the segmentation model to produce a number of masks from the source image.
We visualize the generated masks on top of the source image, and we pick the mask related to the area that we want to edit.
We run the Stable Diffusion generative model, giving it our source image, the selected mask, and a number of text prompts. In this way, we generate a number of results which edit the masked area, pushing it in the direction of the prompts. We experiment with variations of the parameters of the model.
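For reference, the whole inpainting step can be sketched with Hugging Face's diffusers library (the checkpoint name, prompt and parameters here are illustrative assumptions, not necessarily the course's exact choices):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# image and mask_image are PIL images of the same size; white mask pixels
# mark the clothes region selected from the segmentation step.
result = pipe(
    prompt="a red leather jacket",
    image=image,
    mask_image=mask_image,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
result.save("edited.png")
```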
We use an alternative architecture to run the segmentation process again. With this alternative architecture, we are able to guide the segmentation process using text prompts.
We run the Stable Diffusion generative model applied to the masks generated in this new setup.
Final comments to conclude this section
In this fun and insightful section, we will combine tangible physical elements (paper, lines, colors, etc.) with advanced digital representations in order to understand the very essence of how the neural networks that power Generative AI learn the internal mappings that connect their inputs with their objectives.
We start at the base of the challenge by exploring the dimensionality of the inputs and outputs that define the framework for the mapping the neural network is tackling.
From simple lines to complex creations: unveiling the power and limits of linearity in neural networks. In this lecture, we explore linear transformations, the powerhouse of neural networks.
Beyond the straight line: we explore how non-linear activation functions allow neural networks to introduce more complexity into the input-output mappings they learn.
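The underlying fact is easy to verify numerically: stacked linear maps collapse into a single linear map, and a non-linearity is what prevents that collapse. A tiny sketch:

```python
import torch

W1 = torch.randn(16, 2)
W2 = torch.randn(1, 16)
x = torch.randn(2)

# Two linear maps compose into one linear map: W2 @ (W1 @ x) == (W2 @ W1) @ x.
assert torch.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x, atol=1e-5)

# Inserting a non-linearity such as ReLU breaks the collapse: no single matrix
# reproduces W2 @ relu(W1 @ x) for all inputs x.
print(W2 @ torch.relu(W1 @ x))
```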
The bias-variance tradeoff, finding the sweet spot between underfitting and overfitting, as the neural network learns the mapping that produces a great fit between its inputs and outputs
We increase the dimensionality of the input, then visualize and reflect on how the non-linear mappings behave in the latent spaces of the neural network.
We explore how to increase the expressive power of neural networks by visualizing the impact of depth on the complexity of the mappings created at the latent spaces of these architectures
We arrive at very complex mappings, from high-dimensional manifolds to other complex mathematical surfaces and objects, and at the next phase of AI, made of agents that update their dynamic, ever-changing latent spaces in real time.
Through advanced digital representations and simulations, we reflect on how the complexity of the latent spaces of neural networks changes and evolves as we train these networks, and as they are deployed in the near future within dynamic agents that constantly update their world models in response to their environment.
Navigating Loss Landscapes: we explore how to create visualizations that connect the weights of the neural network with its performance, through the creation of 3D landscapes that relate weight combinations with the loss values at the end of the network
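A minimal sketch of how such a landscape can be sampled (the names here are hypothetical, and real tools add refinements such as filter normalization): perturb the current weights along two random directions and record the loss at each grid point.

```python
import torch

def loss_landscape(model, loss_fn, grid=21, radius=1.0):
    # Remember the current weights and draw two random perturbation directions.
    weights = [p.detach().clone() for p in model.parameters()]
    d1 = [torch.randn_like(w) for w in weights]
    d2 = [torch.randn_like(w) for w in weights]

    landscape = torch.zeros(grid, grid)
    alphas = torch.linspace(-radius, radius, grid)
    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            with torch.no_grad():
                for p, w, u, v in zip(model.parameters(), weights, d1, d2):
                    p.copy_(w + a * u + b * v)  # move to a nearby weight combination
                landscape[i, j] = loss_fn(model)  # loss_fn: scalar loss on a fixed batch

    # Restore the original weights before returning the grid of loss values.
    with torch.no_grad():
        for p, w in zip(model.parameters(), weights):
            p.copy_(w)
    return landscape
```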
Exploring a visualization of the loss landscape of the generator of a Generative Adversarial Network that is being trained to learn to generate images of human faces. The loss value (performance) at the center of the representation corresponds to the current weight values of our network. The surrounding landscape (around the center) represents other combinations of weight values in the vicinity of our current ones.
Exploring a real time visualization of how the weights of a neural network change as its training process progresses.
A quick summary as we complete our exciting journey to the depths of the latent space of a neural network