
LLM Mastery

Hands-on Code, Align and Master LLMs

Javier Ideami

Dive into the most exhilarating and hands-on LLM course you'll ever experience. This isn't just learning—it's an adventure that will transform you from an AI enthusiast into a creator at the bleeding edge of technology. AI expert Javier Ideami, the creator of one of the most successful AI-related courses on Udemy, brings you a totally unique new experience around LLM technology.


What Makes This Course Unmissable:

  • Code Your Own AI Universe: 80% hands-on coding with Python and PyTorch. Build an LLM from scratch, line by line. Watch AI come alive through your fingertips. And then go beyond and code a compact version of an alignment process, the magic that makes ChatGPT-style assistants possible.

  • Origami Meets AI: Be part of a world-first. Unravel deep learning mysteries through the art of paper folding.

  • Deep Dive, Gradual Learning Curve: Only basic Python needed. We'll guide you through all the complex concepts around LLMs, from attention mechanisms to cross-entropy and beyond. By the end of the course, you will have gained advanced skills and knowledge about generative AI and LLMs.

  • Mind-Bending Finale: Cap it off with an optional guided meditation using the "generative AI" in your own brain. Mind = Blown.

  • Flexible requirements: Run everything on a humble 4GB GPU or anything more powerful, on the cloud (e.g., Google Colab) or locally on your trusty laptop or desktop (Windows / Linux / Mac). All platforms and devices will work, with a minimum of 4GB of GPU memory.

Course Highlights:

  • Intro to Generative AI: Dive into the mesmerizing world of Generative AI, where machines create and innovate beyond imagination.

  • Code an LLM from Scratch: Code and nurture your very own LLM from scratch.

  • Unlocking an LLM Titan: Dissect an advanced LLM architecture. Peek behind the curtain of the most powerful AI systems on the planet.

  • Alignment. Code the Secret Sauce of the top LLMs: Code a cutting-edge LLM alignment process. This is the crucial stage that makes ChatGPT-style assistants possible.

  • Origami AI: Fold your way to neural network nirvana. Experience a world-first fusion of ancient art and cutting-edge science to grasp deep learning like never before.

  • AI Meets Zen: Cap your journey with an optional mind-bending guided meditation. Explore the ultimate generative AI - the one in your own brain - in a profound finale that bridges technology and spirituality.

All in One / Why You Can't Miss This:

  • The full package: You code both a small LLM and a cutting-edge alignment technique, and you also dig deep into a complex LLM architecture. In parallel, you explore a wide range of advanced concepts around LLMs and deep learning, during the coding, during the unique Origami experience, and in the initial intro to Generative AI.

  • Uniqueness: Origami + AI = A learning experience you won't find anywhere else. Understand key insights about Deep Learning and Neural Networks through the magic of paper folding.

  • Practical Hands-on Mastery: 80% Practical. Learn and Build, Train and Align.

  • Future-Proof Skills: Position yourself at the forefront of the AI revolution.

  • And there's more: On top of all that, the course connects you with free tools, articles and infographics produced by Ideami that enrich and accelerate your learning even more. Some of them, like the Loss Landscape explorer app, are one-of-a-kind tools created by Ideami for you.

  • Accessibility: Complex concepts explained so clearly, you'll feel like an AI whisperer.

In summary

This isn't just a course—it's a ticket to the AI creator's club. By the end, you'll have coded an LLM, understood its deepest secrets, coded an alignment technique, dived deep into profound insights about deep learning and gained practical skills that will make you the AI guru in any room.

Ready to code the future, fold profound insights through origami and blow your own mind? Join us on this unparalleled journey to the heart of LLM technology. Enroll now and prepare for the most fun, deep, and transformative tech adventure of your life.


What's inside

Learning objectives

  • Code and train an LLM from scratch, line by line, understanding every concept in detail
  • Understand and analyze in depth an advanced LLM architecture based on the Llama system
  • Code and train an alignment process from scratch, to align an LLM with a preferred form of interaction
  • Understand in great depth key concepts like attention mechanisms, the cross-entropy loss, the way neural networks learn and many more
  • Explore in-depth insights about deep learning and neural networks through the use of origami
  • In addition to the coding, every section includes in-depth explanations of key concepts related to these architectures and Generative AI

Syllabus

You will understand the key concepts and possibilities of the Generative AI revolution, including its potential applications, the main types of architectures and the way neural networks work

Javier Ideami invites you to embark on a profound and captivating exploration into the heart of LLM technology and its limitless potential.

Ideami explains why this first section introduces Generative AI: to warm up and get comfortable with some key base concepts before we begin coding

Javier briefly introduces himself and this first section that will provide a brief intro about Generative AI technology

We compare the fundamental differences between generative and discriminative models, explore the fascinating evolution of this technology, and uncover its transformative potential in an overview of its wide range of applications. Get ready to discover how Generative AI is reshaping industries and unleashing new frontiers of creativity.

Explore the essential machine learning and deep learning concepts that form the foundation of Generative AI architectures

We explore the key concepts behind the most successful Generative AI architectures, from GANs to Autoregressive models (used by LLMs), Diffusion models and beyond. We also review their main areas of application.

A journey through some of the most exciting applications of this technology, focusing on the creative industries, business and healthcare

Navigating the Ethical Landscape: understanding the ethical challenges of Generative AI

On the horizon: anticipating the next wave of Generative AI breakthroughs as this technology reshapes industries and redefines possibilities

A recap of some of the key areas we have explored in this introduction to Generative AI

Review Quiz
You will be able to code a small LLM from scratch, train it, and use it to generate new text. You will also understand in depth the key concepts involved in the architecture

Javier welcomes you to this exciting section where we are going to code line by line a small LLM. We will also train it and test it. And we will explore in depth all the key concepts related to the LLM architecture

Javier summarizes the options you have as to where to do the coding in this course

Javier goes deeper into the different platforms and systems that you can use to do the coding in this course

We review the options we will have when we make mistakes while coding, and also reinforce the previous information we shared about the different platforms we can use to host our coding environment

We explore the steps needed to set up our coding environment, both in the case of using a cloud platform like Google Colab, and in the case of using our local laptop or desktop. If you don't have any experience using Jupyter Notebooks, make sure to also watch the following video on how Jupyter Notebooks work.

In this optional lecture, we review how Jupyter Notebooks work. We will use Jupyter Notebooks to create our code, using Python and PyTorch. If you don't have much knowledge about Jupyter Notebooks, this lecture will teach you the basics you need to proceed with the course.

We code the lines that import the libraries we need to build our LLM
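
As a reference, a minimal sketch of the kind of imports such a notebook typically begins with (the exact set used in the course may differ):

    import math

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Pick the best available device; the course supports CPU, Colab GPUs,
    # and local GPUs with at least 4GB of memory.
    device = "cuda" if torch.cuda.is_available() else "cpu"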

We download some necessary files that we will need in the next parts of this section. In the video I show you how to download the files with some easy code. As a backup, the llm_train.zip file is also included in the downloadable materials of this lecture.

We discuss in detail the main parameters of the architecture, including the batch size, a value that will be key when we want to make sure that the training process fits with our memory capacity

We set up the very important hyperparameters of our network. These include the learning rate and regularization methods like dropout and weight decay.

We set up another group of parameters, ones that will be crucial for running our training process effectively

Javier introduces the importance of logging our statistics during our training processes

We set up our logging, connecting to the Weights & Biases platform, and verify the creation of the page where our training stats will appear.

Note: When you run the cell with the logging setup, you will see a link in the output: https://wandb.ai/authorize
Clicking on that link, after logging into your Weights & Biases account, will help you get your wandb API key quickly
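
For orientation, a minimal sketch of the Weights & Biases setup, with placeholder project and run names (not the course's):

    import wandb

    wandb.login()  # prompts for the API key from https://wandb.ai/authorize
    run = wandb.init(project="small-llm", name="baseline-run")  # names are placeholders

    # Inside the training loop you would then log metrics, e.g.:
    wandb.log({"train/loss": 2.31, "lr": 3e-4})

    run.finish()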

We set up the all-important tokenizer, which allows us to convert text into numerical IDs and to decode numerical IDs back into text. We prepare the necessary functions and processes around the tokenizer.
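
The course uses its own trained tokenizer; purely as a minimal stand-in with the same encode/decode interface, a character-level version looks like this:

    text = "hello world"
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer ID
    itos = {i: ch for ch, i in stoi.items()}      # integer ID -> character

    def encode(s: str) -> list[int]:
        return [stoi[c] for c in s]

    def decode(ids: list[int]) -> str:
        return "".join(itos[i] for i in ids)

    assert decode(encode("hello")) == "hello"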

We split our data into training and validation parts and create a get_batch function that we will use to fetch batches of data during the training process
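
A common sketch of such a batching function; the data tensor and the block_size / batch_size values are assumed to come from earlier cells:

    import torch

    block_size, batch_size = 64, 8
    data = torch.randint(0, 100, (10_000,))  # placeholder stream of token IDs

    def get_batch(data: torch.Tensor):
        # sample a random starting offset for each sequence in the batch
        ix = torch.randint(len(data) - block_size, (batch_size,))
        x = torch.stack([data[i : i + block_size] for i in ix])          # inputs
        y = torch.stack([data[i + 1 : i + block_size + 1] for i in ix])  # targets, shifted by one
        return x, y

    xb, yb = get_batch(data)  # each of shape (batch_size, block_size)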

Javier introduces an optional reading and infographic that goes into the details of the Transformer architecture at the core of the LLM we are going to code. While we code, Javier will be explaining the entire architecture in detail, so this is just optional reading. When we code LLMs, we actually use a simplified version of what the article describes, because we use the decoder-only version of the architecture.

We set up the main layers that will drive the workings of the LLM architecture

We code the outer level computations that will take place when we run the LLM. Later we will have to code other processes that take place within them.

We set up and code the calculation of the cross-entropy loss, which will allow us to measure the current performance of the network: the difference between our predicted distribution of the target tokens and the true distribution. Through the training process we will try to shorten the distance between both distributions, making that loss value as small as possible.

Instead of using the PyTorch cross-entropy function, in this lecture we are going to manually recreate that method, understanding in depth the reasons behind each of the calculations, as well as the related concepts of information and entropy.

Note: after experimenting with this alternative manual loss calculation, consider commenting out the lines of this second version of the loss; otherwise they will slow down the training of your LLM (as you will be calculating the loss twice every time)

We pause our manual calculation of Cross Entropy for a moment to make a drawing on a canvas, where we go very deep into the concepts that give rise to the Cross Entropy Loss. We begin with the concept of Information, move to Entropy, and then to Cross-Entropy, understanding every detail of what drives this absolutely crucial calculation that massively influences the learning process of the LLM.

We continue our manual calculation of the Cross Entropy loss. Once completed, we test it and verify that this manual calculation produces the same result as the PyTorch cross-entropy function.

Note: If you don't want to slow down the training process, you can comment out the manual calculation of the loss after you experiment with it, so that the loss is calculated only once per training step.
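
For reference, a minimal sketch of the comparison described above, using placeholder shapes rather than the course's actual tensors:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(32, 100)           # (examples, vocab_size)
    targets = torch.randint(0, 100, (32,))  # true token IDs

    # PyTorch one-liner
    loss_pt = F.cross_entropy(logits, targets)

    # Manual decomposition: log-softmax, pick the log-prob of each true token, average
    log_probs = F.log_softmax(logits, dim=-1)              # numerically stable log(softmax)
    nll = -log_probs[torch.arange(len(targets)), targets]  # per-example negative log-likelihood
    loss_manual = nll.mean()

    assert torch.allclose(loss_pt, loss_manual, atol=1e-6)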

Javier introduces the topic of creating a function within the architecture that can generate new samples

We code the functionality that will allow us to pass an input sequence through the architecture with the request to continue the sequence with a number of tokens, generating new samples.

We test the sample generation functionality we built, even though we have not yet fully built the architecture, nor trained it. It's a great exercise to see what kind of generation gets produced when the network is incomplete and has not learnt anything yet.
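
A minimal sketch of such a generation function, assuming the model's forward pass returns (logits, loss) and that block_size limits the context window:

    import torch

    @torch.no_grad()
    def generate(model, idx, max_new_tokens, block_size):
        # idx: (batch, time) tensor of token IDs to continue
        for _ in range(max_new_tokens):
            idx_cond = idx[:, -block_size:]          # crop to the context window
            logits, _ = model(idx_cond)              # forward pass
            logits = logits[:, -1, :]                # keep only the last time step
            probs = torch.softmax(logits, dim=-1)    # logits -> probabilities
            idx_next = torch.multinomial(probs, 1)   # sample the next token
            idx = torch.cat([idx, idx_next], dim=1)  # append and continue
        return idx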

We code the structure and high level computations of the blocks that compose the main structure of the transformer architecture within the LLM

A reminder of a key reason that explains why the transformer architecture is so capable. It provides a combination of communication and computation capabilities, through its attention mechanisms as well as other parts like the feed forward layers.

We code the feedforward functionality that provides computational power and complexity to the LLM
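
A sketch of a typical position-wise feed-forward block; the 4x expansion factor and the choice of activation are common defaults, not necessarily the course's:

    import torch.nn as nn

    class FeedForward(nn.Module):
        def __init__(self, n_embd: int, dropout: float = 0.1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd),  # expand the embedding dimension
                nn.ReLU(),                      # non-linearity (variants use GELU/SiLU)
                nn.Linear(4 * n_embd, n_embd),  # project back down
                nn.Dropout(dropout),
            )

        def forward(self, x):
            return self.net(x)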

We code the class that distributes the data onto the different attention heads, providing communication capabilities that will allow the architecture to learn about the relationships between the tokens of the data sequences, the distance between them and the impact they have on each other.

Javier introduces the topic of attention, as the absolutely crucial part of the architecture that will allow it to learn about the context and relationships between the elements of our data sequences

We code the attention head, explaining in depth all the related concepts and computations, focusing on understanding every step and how each result is derived from the previous calculations.
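
A sketch of a single causal self-attention head in this style; n_embd, head_size and block_size are assumed hyperparameters from earlier cells:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Head(nn.Module):
        def __init__(self, n_embd: int, head_size: int, block_size: int):
            super().__init__()
            self.key = nn.Linear(n_embd, head_size, bias=False)
            self.query = nn.Linear(n_embd, head_size, bias=False)
            self.value = nn.Linear(n_embd, head_size, bias=False)
            # lower-triangular mask so tokens cannot attend to the future
            self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

        def forward(self, x):
            B, T, C = x.shape
            k, q, v = self.key(x), self.query(x), self.value(x)
            wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled scores (B, T, T)
            wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
            wei = F.softmax(wei, dim=-1)  # attention weights
            return wei @ v                # weighted aggregation of the values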

In this lecture, to reinforce our understanding about attention, we are going to build a toy example that will allow us to debug in detail every step of the attention calculations, verifying with very specific steps how each of the results is obtained.

We review the code we have created so far and then go through a debugging process as an example of how to detect and solve a coding issue, something that may happen as we code the architecture

We create the functionality that will allow us to calculate a more precise value for the loss, both on the training data and on the validation data. We will call this new function regularly from the training process in order to evaluate things like the potential for overfitting and the general performance.

We code and set up our optimizer, which will allow us, during the training process, to backpropagate our loss through the network and calculate the necessary gradients. With those gradients, the optimizer will help us tweak and update the parameters of our LLM in the right direction. We also set up the scheduler, which will control the evolution of the learning rate.
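
A hedged sketch of this setup; AdamW with weight decay plus a cosine schedule is one common combination, and the specific values here are placeholders:

    import torch

    model = torch.nn.Linear(10, 10)  # stand-in for the LLM
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)

    # Per training step:
    #   loss.backward(); optimizer.step(); optimizer.zero_grad(); scheduler.step()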

We create the functionality that will allow us to load and activate any of our previously saved checkpoints, so that we can either restart a training process from the point that we had saved, or to do inference using that loaded checkpoint.
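
A minimal sketch of checkpoint saving and loading; the file name and the saved keys are illustrative assumptions:

    import torch

    def save_checkpoint(model, optimizer, step, path="checkpoint.pt"):
        torch.save({
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "step": step,
        }, path)

    def load_checkpoint(model, optimizer, path="checkpoint.pt"):
        ckpt = torch.load(path, map_location="cpu")
        model.load_state_dict(ckpt["model"])          # restore the weights
        optimizer.load_state_dict(ckpt["optimizer"])  # restore the optimizer state
        return ckpt["step"]                           # step to resume from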

We create the code that will allow us to run inference, and then load a pre-trained checkpoint in order to test the inference process.

Javier introduces the part in which we will begin coding the training loop, which will drive the learning process of the LLM

We code the main loop that will iterate through the data and drive the learning process of the LLM.
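
A compact sketch of such a loop, building on the earlier snippets (it assumes model(xb, yb) returns (logits, loss) and that get_batch, optimizer and scheduler are already defined):

    max_iters, eval_interval = 5000, 500

    for step in range(max_iters):
        xb, yb = get_batch(data)           # fetch a batch of inputs/targets
        logits, loss = model(xb, yb)       # forward pass computes the loss
        optimizer.zero_grad(set_to_none=True)
        loss.backward()                    # backpropagate the gradients
        optimizer.step()                   # update the parameters
        scheduler.step()                   # advance the learning-rate schedule
        if step % eval_interval == 0:
            print(f"step {step}: train loss {loss.item():.4f}")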

We train our LLM, checking and analyzing the statistics during the process. We also show how to stop the training, and restart it from a saved checkpoint.

Javier helps us keep the right perspective as we experiment, train and work with the small LLM we have created.

In this optional lecture, and on a separate notebook, Javier shows how to create the code that trains a tokenizer on our dataset, while understanding all the key parameters and related concepts. In this way we can reproduce the same tokenizer we used while building the LLM.

In this optional lecture, Javier demonstrates how to use the tokenizer we trained to encode our dataset and produce the same output file that we used during the building of our LLM.

Some conclusions about all the work we have done together to create our LLM, as well as some pointers about what comes next on our fascinating journey through the keys of LLM technology

The Downloadable materials of this final lesson of the section include:
- The small_llm_official.ipynb notebook

- The small_tokenizer_official.ipynb notebook

Use the official notebook when you need to compare the official code to the one you are creating, and also after completing the section, if you like. Just make sure you first code everything line by line following the videos with me, as that's the very best way of learning in a deep way.

You will dive deep into the code of an advanced LLM architecture, comparing it with the small LLM you built earlier. You will also understand in great depth the key concepts involved throughout the project

Javier welcomes you to this section, in which we are going to dive deep into the code of an advanced LLM architecture, comparing it with the code of the small LLM we built together, and understanding every key concept and detail.

We prepare a new coding environment, where we will host a new set of support files, and where later we will code together an alignment process over the advanced LLM that we will study in this section.

In the video I show you how to download those support files with some easy code. As a backup, the llm_align.zip file is also included in the downloadable materials of this lecture.

RLHF Graphic that appears in this lesson - Credit:
by PopoDameron
https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback#/media/File:RLHF_diagram.svg
Licence: https://creativecommons.org/licenses/by-sa/4.0/deed.en


We create a class that holds the key parameters of the architecture, and examine how the default values of those parameters in this advanced LLM compare with the ones we used in our small LLM, and with the ones we will use when we later train an alignment process over this advanced LLM.

We study the main class of the architecture and focus on the calculation of the loss, which as before will be done in two ways for educational reasons: first with PyTorch, and then by decomposing it manually into its intermediate operations.

We study the generation function of the architecture, which goes beyond our small LLM by working with extra parameters that control the degree of creativity and predictability of the generations.
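
Two of the most common such controls are temperature and top-k; a sketch of how they are typically applied (not necessarily this architecture's exact implementation):

    import torch

    def sample_next(logits, temperature=1.0, top_k=None):
        # logits: (batch, vocab_size) for the last position
        logits = logits / temperature                    # <1 sharpens, >1 flattens the distribution
        if top_k is not None:
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = float("-inf")  # drop everything outside the top k
        probs = torch.softmax(logits, dim=-1)
        return torch.multinomial(probs, num_samples=1)   # sample one token per batch row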

We study the structure of the blocks of the LLM and the way the computations take place as we pass our data through each of these blocks.

We analyze the computational layers that increase the complexity of the LLM

We analyze the multi head attention mechanism implementation of this advanced LLM, which provides a very efficient way of running the necessary computations to calculate the attention scores and update the embeddings that reach the attention layer.

We continue and complete the analysis of the implementation of the multi head attention mechanism of this advanced LLM

We explore and analyze other parts of this advanced LLM, including a sophisticated way of encoding the position of the tokens in the sequences, RoPE, rotary positional embeddings, as well as other supporting functions of the architecture.
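
A sketch of RoPE in the complex-number style used by Llama-like code; the shapes and the base value of theta are illustrative assumptions:

    import torch

    def precompute_freqs_cis(head_dim, seq_len, theta=10000.0):
        # one rotation frequency per pair of channels
        freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
        angles = torch.outer(torch.arange(seq_len).float(), freqs)  # (seq_len, head_dim/2)
        return torch.polar(torch.ones_like(angles), angles)         # complex e^{i*m*theta}

    def apply_rope(x, freqs_cis):
        # x: (batch, seq_len, n_heads, head_dim); view channel pairs as complex numbers
        x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
        x_rot = x_c * freqs_cis[None, :, None, :]  # rotate each pair by its position angle
        return torch.view_as_real(x_rot).flatten(-2).type_as(x)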

We study the code that will allow us to run inference on non-aligned and aligned pretrained checkpoints of this advanced LLM, in order to compare their results

We explore the ways of changing the parameters and running the inference code, both online on the cloud and locally from the command line

We run inference on both non-aligned and aligned pretrained checkpoints of this advanced LLM, comparing their outputs and results, to show why alignment is so crucial

We reflect on the inference results in relation to the scale and size of our model and datasets as well as the available resources involved throughout the process

You will code from scratch a complete alignment process over the advanced pre-trained LLM we studied earlier. You will train the alignment and understand in depth the key concepts involved

We reflect on the importance of alignment, explaining the best-known technique, RLHF. In this section we will apply a newer technique that is simpler and faster. We also explain why studying one of these techniques in depth, as we will do in this section, helps us understand the keys of an alignment process and prepares us to experiment with and apply any of the possible techniques in the future.

We explore the two datasets involved in our alignment project: the larger dataset that has been used to pretrain the base non-aligned model, and the smaller alignment dataset that we will use when we code the alignment process over the pretrained base model line by line in this section.

We begin coding our alignment process by importing the necessary libraries

We set up the parameters required by the alignment process that we will code and run.

We explain the structure of the chat template that will be used in the process of tokenizing our alignment dataset
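
The course's exact template is defined in the lecture; purely as a hypothetical illustration, a chat template typically wraps each turn in role markers before tokenization:

    # Hypothetical role markers; the course's template and special tokens may differ.
    def apply_chat_template(prompt: str, response: str) -> str:
        return f"<|user|>\n{prompt}\n<|assistant|>\n{response}\n<|end|>"

    print(apply_chat_template("What is alignment?", "Steering a model toward preferred behavior."))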

We code a function that filters our alignment dataset to follow key requirements necessary for the training process to work correctly.

We code the function that tokenizes the alignment dataset. This function also creates a data structure that contains several other essential data structures, which will be very useful when calculating the loss during the training of the alignment process.

We debug the pre-processing function in order to understand more deeply the data structures that the function is creating, and then we complete its last parts.

We split the alignment dataset into training and validation parts, and set up our training and validation dataloaders.

We instantiate our advanced LLM, load the pretrained checkpoint, assigning its saved parameters to the model, and then set up the optimizer that will drive key parts of our alignment training process.

We set up our scheduler and the associated function that will drive the evolution of the learning rate during the alignment training process.

We code the main structure of the training loop that will drive the alignment training process, iterating over our alignment data and tweaking the weights of the network to encourage the LLM to generate responses that are more aligned with the preferences stated in the alignment dataset.

In the first part of the coding of the alignment loss, we focus on the part of the loss that calculates the cross entropy between the predicted and the true distributions of the target tokens

We pause our coding of the alignment loss for a moment to make a drawing on a canvas, where we go very deep into the concepts that give rise to the other part of the alignment loss equation: the part that will help us favor responses that are aligned with our preferences, and penalize those that are not. We focus on understanding every detail of what drives this absolutely crucial calculation that will massively influence the alignment process of our advanced LLM.

We first look together at an academic paper to explore how we can also find the concepts behind the calculations within the original research published in academic platforms. And then we continue coding the alignment loss, adding the other part of the equation that will help us favor responses that are aligned with our preferences, and penalize those that are not aligned with them.

In the final part of our coding of the alignment loss, we create a crucial intermediate function that calculates some key per token log probabilities that are needed in the main loss equation. To understand the inner workings of this function, we setup a toy example that allows us to explore in a very detailed way every part of the computations that produce those important calculations.
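
The listing does not name the technique, but Direct Preference Optimization (DPO) is a well-known alignment method that is simpler and faster than RLHF; assuming an approach of that family, a sketch of a preference loss built on per-token log probabilities:

    import torch
    import torch.nn.functional as F

    def per_token_logps(logits, labels):
        # logits: (batch, time, vocab); labels: (batch, time) target token IDs
        logps = F.log_softmax(logits, dim=-1)
        return torch.gather(logps, 2, labels.unsqueeze(-1)).squeeze(-1)  # (batch, time)

    def dpo_loss(chosen_logps, rejected_logps, ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # per-sequence summed log-probs from the policy and a frozen reference model
        pi_ratio = chosen_logps - rejected_logps
        ref_ratio = ref_chosen_logps - ref_rejected_logps
        # -log sigmoid(beta * margin): favors preferred responses, penalizes rejected ones
        return -F.logsigmoid(beta * (pi_ratio - ref_ratio)).mean()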

We code the lines that add logging to the training loop as well as the ones that save intermediate checkpoints, and then start the alignment training process.

We track the alignment training process and monitor the statistics. After some time, we test the results and display the monitoring data of other training processes using expanded charts, providing insights into additional key aspects of the alignment process.

We expand our code with a new function that will calculate the training and validation loss with more precision. This will slow down our training process but will help us reflect on interesting insights in the next video.

We go into a deep dive analyzing training and validation charts of different alignment processes, reflecting on insights that can help us drive some of the key parameters of the training process, such as the learning rate and others.

We wrap up the alignment project with a summary that emphasizes the crucial importance of these alignment processes in driving the behavior of these models towards our preferred forms of interaction.

After reminding us briefly of the path that took us to this point, Javier reflects on the beauty of the calculations that help us favor preferred responses and penalize non-preferred ones, as the network gradually aligns itself with the preferences represented by our alignment dataset.

Congratulations for making it all the way here! We summarize the key highlights of everything that we have accomplished, and also remind you that towards the end of this course there is a very fun and insightful section in which you can learn and review some important deep learning and AI concepts and insights through the use of Origami. At the very end of the course you can relax in the final section, while still exploring, through the guided meditation that uses the generative model in your head.

The Downloadable materials of this final lesson of the section include:
- The align_llm_official.ipynb notebook

Use the official notebook when you need to compare the official code to the one you are creating, and also after completing the section, if you like. Just make sure you first code everything line by line following the videos with me, as that's the very best way of learning in a deep way.

Dive into a world of shapes and colors as we use origami to explore key concepts and insights that drive the neural networks that power LLMs and Generative AI in general

Javier welcomes you to this fun journey where we will use Origami to explain key insights about neural networks and AI

In this fun and insightful section, we combine tangible physical elements like paper, lines and colors with advanced digital representations in order to understand the very essence of how the neural networks that power Generative AI learn the internal mappings that connect their inputs with their objectives

We start at the base of the challenge, by exploring the dimensionality of the inputs and outputs that define the framework for the mapping the neural network is tackling

Good to know

Know what's good, what to watch for, and possible dealbreakers:

  • Excels at teaching the fundamentals of LLM technology from the ground up, making it ideal for beginners
  • Takes a highly competent and technical approach, going so far as to code LLMs and alignment processes from scratch
  • Develops a strong practical foundation through hands-on coding, training, and alignment
  • Taught by seasoned AI expert Javier Ideami, who provides a wealth of knowledge and insights
  • Explores deep learning and neural network fundamentals through a unique and creative "Origami Meets AI" approach
  • Suited to those seeking a basic introduction to the fascinating field of Generative AI

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in LLM Mastery: Hands-on Code, Align and Master LLMs with these activities:
Code the LLM From Scratch
Coding the LLM from scratch will strengthen your understanding of the underlying architecture and algorithms.
Steps:
  • Set up your coding environment and install the necessary libraries.
  • Create a class to hold the model parameters and methods.
  • Implement the forward pass of the model.
  • Implement the backward pass of the model.
  • Train the model on a dataset.



Similar courses

Here are nine courses similar to LLM Mastery: Hands-on Code, Align and Master LLMs.
  • Learn Everything about Full-Stack Generative AI, LLM...
  • NVIDIA-Certified Associate - Generative AI LLMs (NCA-GENL)
  • Building AI with Bedrock Agent
  • Complete AWS Bedrock Generative AI Course + Projects
  • Generative AI Fundamentals with Google Cloud
  • Generative AI:Beginner to Pro with OpenAI & Azure OpenAI
  • AWS Certified AI Practitioner AIF-C01 - Hands On, In...
  • Generative AI Foundations
  • Master Vector Database with Python for AI & LLM Use Cases