Çağatay Demirbaş

In this course, you will step into the world of Large Language Models (LLMs) and learn both fundamental and advanced end-to-end optimization methods. You’ll begin with the SFT (Supervised Fine-Tuning) approach, where you’ll discover how to properly prepare your data and create customized datasets using tokenizers and data collators through practical examples. During the SFT process, you’ll learn the key techniques for making large models lighter and more efficient with LoRA (Low-Rank Adaptation) and quantization, and explore step by step how to integrate them into your projects.
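As a rough illustration of the LoRA idea mentioned above, the sketch below uses plain Python lists instead of a training library. LoRA keeps the base weight matrix W frozen, trains two small factors B and A, and can later merge them back as W + (alpha / r) * B A. The matrix sizes, the `alpha` value, and the helper names (`matmul`, `merge_lora`, `lora_param_counts`) are assumptions chosen for this example and are not taken from the course material.

```python
# Minimal LoRA sketch in pure Python (illustrative sizes, not course code).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, b, a, alpha, r):
    """Return W + (alpha / r) * (B @ A), i.e. the merged weight matrix."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_counts(d_out, d_in, r):
    """Trainable parameters: full matrix vs. the two low-rank factors."""
    return d_out * d_in, r * (d_out + d_in)

# Frozen base weight W (4 x 4, identity here) and rank-1 factors
# B (4 x 1) and A (1 x 4); all values are made up for the example.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
B = [[0.5], [0.0], [0.0], [0.0]]
A = [[0.0, 2.0, 0.0, 0.0]]

merged = merge_lora(W, B, A, alpha=2, r=1)

# For a hypothetical 4096 x 4096 layer with rank 8, compare parameter counts.
full, lora = lora_param_counts(4096, 4096, 8)
print(merged[0], full, lora)
```

In this toy setting only the first row of W changes after merging, and the rank-8 factors hold far fewer trainable parameters than the full matrix, which is the efficiency argument usually made for LoRA.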

After solidifying the basics of SFT, we will move on to DPO (Direct Preference Optimization). DPO lets you obtain user-focused results by training the model directly on preference feedback. You’ll learn how to format your data for this method, how to design a reward mechanism, and how to share your trained models on popular platforms such as Hugging Face. Additionally, you’ll gain a deeper understanding of how data collators work in DPO processes, learning practical techniques for preparing and transforming datasets in various scenarios.
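To make the DPO data format and objective more concrete, here is a minimal sketch under stated assumptions. Preference data for DPO is commonly stored as a prompt with a preferred ("chosen") and a dispreferred ("rejected") response; the record below and the log-probability values are invented for illustration, and `dpo_loss` is a hypothetical helper that computes the standard DPO loss for a single pair from summed log-probabilities.

```python
import math

# One preference record (made-up content, illustrative field names).
example = {
    "prompt": "Explain what a tokenizer does.",
    "chosen": "A tokenizer splits text into tokens and maps them to integer IDs.",
    "rejected": "It is a thing for text.",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair, given summed log-probabilities of each response
    under the trained policy and under the frozen reference model."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: the loss equals log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy that favors the chosen response more than the reference does.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
print(baseline, improved)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is how preference feedback ends up reflected in the model.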

The most significant phase of the course is GRPO (Group Relative Policy Optimization), a reinforcement learning method that has been gaining popularity for producing strong results. With GRPO, the model generates a group of candidate responses for each prompt, each response is scored by reward functions, and the model is updated according to how each response compares with the rest of its group. In this course, you’ll learn the fundamental principles of GRPO, and then solidify your knowledge by applying this technique with real-world datasets.
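The "group relative" part of GRPO can be sketched in a few lines. Several responses are sampled for the same prompt, each receives a reward, and every reward is normalized against the mean and standard deviation of its own group to obtain an advantage. The sketch below assumes a population standard deviation and a small `eps` term for numerical safety; both choices, the function name, and the reward values are illustrative assumptions rather than the course's implementation.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four sampled responses to one prompt (made-up values).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every response earns the same reward carries no signal.
flat = group_relative_advantages([2.0, 2.0])
print(advantages, flat)
```

Responses that score above their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to estimate a baseline.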

Throughout the course, we will cover the key topics (LoRA, quantization, SFT, DPO, and especially GRPO) together, supporting each topic with project-oriented applications. By the end of this course, you will be equipped to manage every stage with confidence, from data preparation to fine-tuning and group-based policy optimization. Developing modern and competitive LLM solutions that focus on both performance and user satisfaction in your own projects will become much easier.

What's inside

Learning objectives

  • You will grasp the core principles of large language models (LLMs) and the overall structure behind their training processes.
  • You will learn the differences between base models and instruct models, as well as the methods for preparing data for each.
  • You’ll learn data preprocessing techniques along with essential tips, how to identify special tokens required by models, how to understand data formats, and methods
  • You’ll gain practical, hands-on experience and detailed knowledge of how LoRA and data collators work.
  • You’ll gain a detailed understanding of crucial hyperparameters used in training, including their purpose and how they function.
  • You’ll learn in detail, through practice, how trained LoRA matrices are merged with the base model, as well as key considerations and best practices to follow during
  • You’ll learn what Direct Preference Optimization (DPO) is, how it works, the expected data format, and the specific scenarios in which it’s used.
  • You’ll learn key considerations when preparing data for DPO, as well as how the DPO data collator functions.
  • You’ll learn about the specific hyperparameters used in DPO training, their roles, and how they function.
  • You’ll learn how to upload your trained model to platforms like Hugging Face and manage hyperparameters effectively after training.
  • You’ll learn in detail how Group Relative Policy Optimization (GRPO), a reinforcement learning method, works, including an in-depth understanding of its learnin
  • You’ll learn how to prepare data specifically for Group Relative Policy Optimization (GRPO).
  • You’ll learn how to create reward functions, the most critical aspect of Group Relative Policy Optimization (GRPO), through various practical reward function exam
  • You’ll learn in what format data should be provided to GRPO reward functions and how to process this data within the functions.
  • You’ll learn how to define rewards within functions and establish clear reward templates for GRPO.
  • You’ll learn, through practice, numerous details such as extracting reward-worthy parts from raw responses and defining rewards based on these extracted segments.
  • You’ll learn how to transform an instruct model into one capable of generating “chain of thought” reasoning through Group Relative Policy Optimization (GRPO).
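The reward-function objectives above (extracting the reward-worthy part of a raw response and scoring it) can be sketched as follows. This is a minimal example under stated assumptions: the `<think>` and `<answer>` tags, the reward values, and the function names and signatures are hypothetical choices for illustration, not the course's templates or any library's required interface.

```python
import re

# Assumed response template: reasoning in <think>, final result in <answer>.
ANSWER_PATTERN = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)

def extract_answer(completion):
    """Pull the text inside <answer>...</answer>, or None if it is missing."""
    match = ANSWER_PATTERN.search(completion)
    return match.group(1).strip() if match else None

def correctness_reward(completions, answers):
    """Reward 1.0 when the extracted answer equals the expected one."""
    return [1.0 if extract_answer(c) == expected else 0.0
            for c, expected in zip(completions, answers)]

def format_reward(completions):
    """Reward 0.5 when the whole response follows the assumed template."""
    return [0.5 if FORMAT_PATTERN.match(c.strip()) else 0.0 for c in completions]

# Two made-up responses to the same prompt ("What is 2 + 2?").
completions = [
    "<think>2 + 2 equals 4.</think><answer>4</answer>",
    "The answer is 4",
]
correct = correctness_reward(completions, ["4", "4"])
formatted = format_reward(completions)
print(correct, formatted)
```

Splitting correctness and format into separate functions lets each reward be tuned or weighted on its own; the second response contains the right number but earns nothing because it does not follow the assumed template.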

Syllabus

Introduction
Create a Colab Notebook and Get Familiar with the Libraries
Course Content Introduction
Jupyter Notebooks

Career center

Learners who complete LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO will develop knowledge and skills that may be useful to these careers:
Generative Artificial Intelligence Developer
A Generative Artificial Intelligence Developer builds and refines models capable of creating new content, such as text, code, or images. This course is well aligned with the work of a Generative Artificial Intelligence Developer, covering Large Language Model fine-tuning and optimization end to end. You will work through Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO) for user-focused results, and Group Relative Policy Optimization (GRPO) for reward-based reinforcement learning. The practical curriculum covers data preparation, LoRA and quantization for efficiency, and uploading models to Hugging Face, enabling you to confidently develop modern, competitive LLM solutions that excel in performance and user satisfaction, including complex "Chain of Thought" reasoning.
Natural Language Processing Engineer
A Natural Language Processing Engineer focuses on building systems that understand, process, and generate human language. This course is well suited to an NLP Engineer, covering Large Language Model optimization end to end. You will learn to prepare data, use tokenizers, and apply advanced fine-tuning methods such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO) for learning from preference feedback, and Group Relative Policy Optimization (GRPO) for reward-based reinforcement learning. Practical skills in LoRA and quantization for model efficiency, along with uploading models to Hugging Face, equip you to develop modern, competitive, and user-centric NLP solutions with confidence.
Deep Learning Engineer
A Deep Learning Engineer specializes in designing, training, and deploying neural network architectures, of which Large Language Models are a prominent example. This course offers comprehensive expertise for a Deep Learning Engineer, delving into the intricacies of LLM optimization. You will learn practical techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Mastering LoRA and quantization for model efficiency, alongside detailed knowledge of data collators, hyperparameters, and model deployment to Hugging Face, ensures you are fully equipped to develop and fine-tune state-of-the-art deep learning solutions that prioritize both performance and user satisfaction.
Machine Learning Engineer
A Machine Learning Engineer designs, builds, and deploys algorithms and models that power various intelligent applications. This course directly prepares you to excel as a Machine Learning Engineer by providing comprehensive skills in developing, optimizing, and deploying Large Language Models. You will master techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO) for shaping model behavior. Practical experience with LoRA and quantization for efficiency, data preparation, tokenizers, and uploading models to platforms like Hugging Face ensures you can manage every stage of LLM development. This deep, practical knowledge is invaluable for creating competitive and high-performing AI solutions.
Machine Learning Operations Engineer
A Machine Learning Operations Engineer focuses on the deployment, monitoring, and maintenance of machine learning models in production environments. For a Machine Learning Operations Engineer, this course is highly relevant, providing crucial insights into LLM production readiness. You will gain practical knowledge of making large models lighter and more efficient with LoRA and quantization techniques, which are vital for scalable deployment. Learning about data preparation, managing hyperparameters, and the process of uploading trained models to platforms like Hugging Face directly translates into skills needed to streamline the operational lifecycle of Large Language Models, ensuring reliable and performant AI systems.
Artificial Intelligence Engineer
An Artificial Intelligence Engineer develops and implements AI-driven solutions across various domains. This course benefits an Artificial Intelligence Engineer by providing a deep, practical understanding of Large Language Models, a cornerstone of modern AI. You will learn fundamental and advanced optimization methods, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). The practical focus on data preparation, efficient model tuning with LoRA and quantization, managing hyperparameters, and uploading models to platforms like Hugging Face empowers you to create robust, high-performance AI systems that satisfy users and serve diverse purposes.
Data Scientist Machine Learning Focus
A Data Scientist with a Machine Learning Focus applies advanced analytical and machine learning techniques to extract insights and build predictive models. This course is highly beneficial for a Data Scientist with a Machine Learning Focus, equipping you with specialized skills in Large Language Model optimization. You will learn to prepare and transform datasets using tokenizers and data collators, apply Supervised Fine-Tuning (SFT), and leverage advanced methods like Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) for user-focused outcomes. This practical, end-to-end experience in fine-tuning and deploying LLMs is crucial for sophisticated data-driven projects that leverage generative AI.
Artificial Intelligence Research Scientist
An Artificial Intelligence Research Scientist explores new AI techniques and advances the state of the art in machine learning. This course provides an excellent foundation for an Artificial Intelligence Research Scientist, especially in the domain of Large Language Models. You will gain an in-depth understanding of cutting-edge optimization methods like Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO), including their fundamental principles and application to "Chain of Thought" reasoning. The hands-on experience with reward mechanisms and fine-tuning offers practical knowledge essential for conducting innovative research and developing novel LLM architectures and training paradigms. This role typically requires an advanced degree.
Artificial Intelligence Technical Lead
An Artificial Intelligence Technical Lead guides development teams in building robust AI solutions and oversees technical architectural decisions. This course greatly benefits an Artificial Intelligence Technical Lead by providing comprehensive, hands-on mastery of Large Language Model optimization. You will delve into advanced techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). The practical curriculum, including LoRA, quantization for efficient models, data preparation, and deployment to Hugging Face, equips you to mentor teams, make informed technical choices, and lead the development of modern, high-performance, and user-centric LLM systems. This role often requires significant experience.
Machine Learning Consultant
A Machine Learning Consultant advises businesses on implementing and optimizing machine learning solutions to solve complex problems. This course provides comprehensive, end-to-end expertise for a Machine Learning Consultant, particularly in the realm of Large Language Models. You will master advanced fine-tuning techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Practical knowledge of LoRA and quantization for efficiency, data preparation, and deploying models to Hugging Face enables you to offer strategic guidance and practical solutions for clients seeking to leverage cutting-edge LLM technology for performance and user satisfaction.
Solutions Architect Artificial Intelligence
A Solutions Architect Artificial Intelligence designs end-to-end AI solutions, translating business requirements into technical architectures. This course is highly beneficial for a Solutions Architect Artificial Intelligence, providing a deep understanding of Large Language Model capabilities and their optimization. You will learn about efficiency techniques like LoRA and quantization, as well as advanced fine-tuning methods such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). This comprehensive knowledge, including deployment to platforms like Hugging Face, empowers you to design scalable, high-performance, and user-centric LLM-powered systems that meet diverse business needs and maximize user satisfaction. This role often requires significant experience.
Data Engineer Machine Learning Focus
A Data Engineer with a Machine Learning Focus builds and maintains robust data pipelines crucial for machine learning model development and deployment. This course significantly helps a Data Engineer with a Machine Learning Focus by providing detailed insights into preparing and transforming data specifically for Large Language Models. You will gain practical experience with essential techniques like creating customized datasets, using tokenizers and data collators, and understanding specific data formats required for Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). This knowledge is vital for designing efficient and scalable data infrastructure that supports end-to-end LLM fine-tuning and operationalization.
Artificial Intelligence Product Manager
An Artificial Intelligence Product Manager defines the vision, strategy, and roadmap for AI-powered products. This course is highly valuable for an Artificial Intelligence Product Manager, offering deep technical understanding of Large Language Models. Knowing how models are optimized with methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO) for learning from preference feedback, and Group Relative Policy Optimization (GRPO) for reward-based reinforcement learning directly informs product design and feature prioritization. Understanding LoRA and quantization for performance and efficiency allows for realistic scoping, ensuring successful LLM product development focused on both technical feasibility and user satisfaction.
Prompt Engineer
A Prompt Engineer specializes in crafting, testing, and refining inputs to guide Large Language Models to generate desired outputs effectively. This course helps a Prompt Engineer by offering a foundational understanding of how LLMs are optimized and behave. Although the course focuses on model tuning, learning techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO) for user preferences, and Group Relative Policy Optimization (GRPO) for reward-based reinforcement learning provides deep insight into model internals. This knowledge helps you better anticipate model responses, understand model limitations, and craft more sophisticated and effective prompts, including those for "Chain of Thought" reasoning.
Computational Linguist
A Computational Linguist applies computational methods to analyze and process human language, frequently developing tools for tasks like machine translation or natural language generation. This course may be useful, providing a deep understanding of Large Language Models, central to modern language AI. Learning data preparation, tokenizers, and fine-tuning methods like SFT, DPO, and GRPO offers practical insight into model construction. The focus on custom datasets, defining reward functions, and achieving "Chain of Thought" reasoning directly enhances one's ability to develop and evaluate linguistically sophisticated AI applications. This detailed knowledge of LLM internals is highly relevant for advancing computational linguistics. This role typically requires an advanced degree.

© 2016 - 2025 OpenCourser