
Efficiently Serving LLMs

Travis Addair

Join Efficiently Serving Large Language Models, a new short course taught by Travis Addair, CTO at Predibase, to build a ground-up understanding of how to serve LLM applications. Whether you’re ready to launch your own application or just getting started building it, the topics you’ll explore in this course will deepen your foundational knowledge of how LLMs work and help you better understand the performance trade-offs you must weigh when building LLM applications that serve large numbers of users.


You’ll walk through the most important optimizations that allow LLM vendors to efficiently serve models to many customers, including strategies for working with multiple fine-tuned models at once. In this course, you will:

1. Learn how auto-regressive large language models generate text one token at a time (see the first sketch after this list, which pairs token-by-token decoding with KV caching).

2. Implement the foundational elements of a modern LLM inference stack in code, including KV caching, continuous batching, and model quantization, and benchmark their impact on inference throughput and latency (a toy quantization sketch also follows this list).

3. Explore the details of how LoRA adapters work, and learn how batching techniques allow different LoRA adapters to be served to multiple customers simultaneously (see the multi-adapter sketch below).

4. Get hands-on with Predibase’s LoRAX inference server to see these optimization techniques implemented in a real-world serving stack (a client-side sketch appears after the closing paragraph below).
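To make item 1 concrete, here is a minimal sketch of token-by-token generation with a KV cache, assuming the Hugging Face transformers and torch packages are installed; GPT-2 and the prompt are illustrative choices, not the course’s exact setup.

```python
# Minimal auto-regressive (token-by-token) decoding loop with a KV cache.
# Assumptions: transformers + torch installed; GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The key to efficient LLM serving is", return_tensors="pt").input_ids
past_key_values = None  # the KV cache: attention keys/values from earlier steps

with torch.no_grad():
    for _ in range(20):
        # With a cache, only the newest token is fed; without one, the whole
        # sequence would be re-encoded on every step.
        step_input = input_ids if past_key_values is None else input_ids[:, -1:]
        out = model(step_input, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy
        input_ids = torch.cat([input_ids, next_token], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Setting use_cache to False and feeding the full sequence on every step is an easy way to observe the latency gap the course benchmarks.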
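For the quantization piece of item 2, here is a toy absmax int8 weight-quantization sketch. It shows the generic idea of trading precision for memory, not the specific scheme the course benchmarks.

```python
# Toy absmax int8 quantization: store weights as int8 plus one float scale.
import torch

def quantize_absmax(w: torch.Tensor):
    scale = w.abs().max() / 127.0  # map the largest magnitude to int8's range
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)          # a float32 weight matrix (~64 MB)
q, scale = quantize_absmax(w)        # the int8 version is a quarter the size
w_hat = dequantize(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())  # small but nonzero
```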
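And for item 3, a small PyTorch sketch of the multi-adapter batching idea: every request in the batch shares one base weight matrix, while each row applies its own low-rank LoRA update. The shapes and gather-based dispatch are simplifying assumptions, not LoRAX’s actual kernels.

```python
# Batched multi-LoRA: one shared base matmul plus per-request low-rank updates.
import torch

d, r, n_adapters, batch = 64, 8, 3, 5
W = torch.randn(d, d)                        # shared base weight
A = torch.randn(n_adapters, d, r)            # per-adapter down-projections
B = torch.randn(n_adapters, r, d)            # per-adapter up-projections

x = torch.randn(batch, d)                    # one token per request
adapter_ids = torch.tensor([0, 2, 1, 0, 2])  # which adapter each request uses

base = x @ W                                              # shared compute
low = torch.einsum("bd,bdr->br", x, A[adapter_ids])       # x @ A_i
delta = torch.einsum("br,brd->bd", low, B[adapter_ids])   # (x @ A_i) @ B_i
y = base + delta
print(y.shape)  # (5, 64): different adapters served in one batched pass
```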

Knowing more about how LLM servers operate under the hood will greatly enhance your understanding of the options you have to increase the performance and efficiency of your LLM-powered applications.
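For a taste of what that looks like in practice, here is a hypothetical client-side sketch against a LoRAX server, assuming a server is already running locally and the lorax-client Python package is installed; the endpoint and adapter name are placeholders.

```python
# Hypothetical LoRAX client usage; the endpoint and adapter_id are placeholders.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # assumes a locally running server

# Requests targeting different fine-tuned adapters can hit the same server;
# LoRAX batches them together against the shared base model.
resp = client.generate(
    "Summarize the main LLM serving optimizations.",
    adapter_id="some-org/some-lora-adapter",  # placeholder adapter name
    max_new_tokens=64,
)
print(resp.generated_text)
```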



Good to know

Know what's good, what to watch for, and possible dealbreakers:
  • Taught by a CTO with industry experience
  • Teaches hands-on use of the LoRAX framework, an industry-standard inference server
  • Develops knowledge and understanding of the modern LLM inference stack, including caching, batching, and quantization
  • Provides a solid foundation for LLM application development


Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Efficiently Serving LLMs with these activities:
Form study groups to engage in collaborative learning
Engage with peers in study groups to share perspectives, clarify concepts, and reinforce learning, fostering a deeper understanding of the course material.
  • Join or form a study group with other classmates.
  • Establish regular meeting times and decide on topics to discuss.
Review 'Speech and Language Processing'
Review the foundational concepts of natural language processing and speech processing to bolster your understanding of the course materials.
  • Read the first two chapters to understand the core concepts of natural language processing and speech processing.
  • Review the glossary of key terms to familiarize yourself with the technical vocabulary.
Practice text generation with autoregressive models
Solidify your understanding of how autoregressive models generate text and practice writing your own text generation code to enhance your practical skills; a minimal sketch follows these steps.
  • Select a text generation library, such as Hugging Face's Transformers.
  • Load a pre-trained text generation model, such as GPT-2 or T5.
  • Generate text using the model and explore the different parameters that affect text quality.
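A minimal sketch of these steps, assuming the transformers package is installed; GPT-2 is just one convenient model choice.

```python
# Generate text with a pre-trained model and experiment with sampling
# parameters that affect output quality and diversity.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

outputs = generator(
    "Large language models generate text",
    max_new_tokens=40,
    do_sample=True,     # sample instead of greedy decoding
    temperature=0.8,    # lower = more deterministic
    top_k=50,           # restrict sampling to the 50 most likely tokens
)
print(outputs[0]["generated_text"])
```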
Compile a collection of resources on LLM performance optimization
Organize and consolidate your knowledge by compiling a collection of resources on LLM performance optimization techniques, enhancing your understanding and ability to apply these techniques effectively.
  • Conduct research to identify relevant resources, such as research papers, tutorials, and blog posts.
  • Organize the resources into categories or sections.
  • Provide brief summaries or annotations for each resource.
Follow guided tutorials to implement LLM inference stacks
Follow guided tutorials and hands-on exercises to gain practical experience implementing and optimizing LLM inference stacks; a simple benchmarking sketch follows these steps.
  • Find tutorials that provide step-by-step instructions on building LLM inference stacks.
  • Implement the inference stack using the techniques covered in the tutorials.
  • Benchmark the performance of your inference stack to identify potential bottlenecks and areas for improvement.
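A rough sketch of the benchmarking step, assuming a transformers model stands in for the inference stack; throughput here is simply generated tokens divided by wall-clock time, a crude but useful first measurement.

```python
# Crude latency/throughput benchmark for a generate() call.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = tokenizer("Benchmarking LLM inference", return_tensors="pt")

start = time.perf_counter()
out = model.generate(**prompt, max_new_tokens=50, do_sample=False)
elapsed = time.perf_counter() - start

n_generated = out.shape[1] - prompt["input_ids"].shape[1]
print(f"latency: {elapsed:.2f} s, throughput: {n_generated / elapsed:.1f} tokens/s")
```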
Volunteer at an organization leveraging LLMs
Apply your knowledge and skills in a practical setting by volunteering at an organization that utilizes LLMs, gaining valuable hands-on experience and contributing to the community.
  • Research organizations that leverage LLMs in their work.
  • Identify volunteer opportunities that align with your interests and skills.
  • Reach out to the organization and express your interest in volunteering.
Design and implement a prototype for an LLM-powered application
Put your knowledge into practice by designing and implementing a prototype for an LLM-powered application, fostering a deeper understanding of how LLMs can be utilized in real-world scenarios.
  • Identify a specific problem or task that an LLM-powered application could address.
  • Design the user interface and functionality of the application.
  • Integrate the LLM into your application and explore various use cases.


Similar courses

Here are nine courses similar to Efficiently Serving LLMs:
  • LLMOps & ML Deployment: Bring LLMs and GenAI to Production (most relevant)
  • Large Language Models: Foundation Models from the Ground... (most relevant)
  • Learn Everything about Full-Stack Generative AI, LLM... (most relevant)
  • Rust for Large Language Model Operations (LLMOps) (most relevant)
  • Llama for Python Programmers
  • Introduction to Large Language Models
  • Introduction to Large Language Models with Google Cloud
  • End to End LLMs with Azure
  • Introduction to Generative AI for Software Development