
Efficiently Serving LLMs

Travis Addair

Join Efficiently Serving Large Language Models, a new short course taught by Travis Addair, CTO at Predibase, to build a ground-up understanding of how to serve LLM applications. Whether you’re ready to launch your own application or just getting started building it, the topics you’ll explore in this course will deepen your foundational knowledge of how LLMs work and help you better understand the performance trade-offs you must weigh when building LLM applications that serve large numbers of users.


You’ll walk through the most important optimizations that allow LLM vendors to efficiently serve models to many customers, including strategies for working with multiple fine-tuned models at once. In this course, you will:

1. Learn how auto-regressive large language models generate text one token at a time (see the first sketch after this list, which pairs token-by-token decoding with KV caching).

2. Implement the foundational elements of a modern LLM inference stack in code, including KV caching, continuous batching, and model quantization, and benchmark their impact on inference throughput and latency (a toy quantization sketch also follows this list).

3. Explore the details of how LoRA adapters work, and learn how batching techniques allow different LoRA adapters to be served to multiple customers simultaneously (see the multi-adapter sketch below).

4. Get hands-on with Predibase’s LoRAX inference server to see these optimization techniques implemented in a real-world serving stack (a client-side sketch appears after the closing paragraph below).
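To make item 1 concrete, here is a minimal sketch of token-by-token generation with a KV cache, assuming the Hugging Face transformers and torch packages are installed; GPT-2 and the prompt are illustrative choices, not the course’s exact setup.

```python
# Minimal auto-regressive (token-by-token) decoding loop with a KV cache.
# Assumptions: transformers + torch installed; GPT-2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The key to efficient LLM serving is", return_tensors="pt").input_ids
past_key_values = None  # the KV cache: attention keys/values from earlier steps

with torch.no_grad():
    for _ in range(20):
        # With a cache, only the newest token is fed; without one, the whole
        # sequence would be re-encoded on every step.
        step_input = input_ids if past_key_values is None else input_ids[:, -1:]
        out = model(step_input, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy
        input_ids = torch.cat([input_ids, next_token], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Setting use_cache to False and feeding the full sequence on every step is an easy way to observe the latency gap the course benchmarks.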
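For the quantization piece of item 2, here is a toy absmax int8 weight-quantization sketch. It shows the generic idea of trading precision for memory, not the specific scheme the course benchmarks.

```python
# Toy absmax int8 quantization: store weights as int8 plus one float scale.
import torch

def quantize_absmax(w: torch.Tensor):
    scale = w.abs().max() / 127.0  # map the largest magnitude to int8's range
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)          # a float32 weight matrix (~64 MB)
q, scale = quantize_absmax(w)        # the int8 version is a quarter the size
w_hat = dequantize(q, scale)
print("max abs error:", (w - w_hat).abs().max().item())  # small but nonzero
```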
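And for item 3, a small PyTorch sketch of the multi-adapter batching idea: every request in the batch shares one base weight matrix, while each row applies its own low-rank LoRA update. The shapes and gather-based dispatch are simplifying assumptions, not LoRAX’s actual kernels.

```python
# Batched multi-LoRA: one shared base matmul plus per-request low-rank updates.
import torch

d, r, n_adapters, batch = 64, 8, 3, 5
W = torch.randn(d, d)                        # shared base weight
A = torch.randn(n_adapters, d, r)            # per-adapter down-projections
B = torch.randn(n_adapters, r, d)            # per-adapter up-projections

x = torch.randn(batch, d)                    # one token per request
adapter_ids = torch.tensor([0, 2, 1, 0, 2])  # which adapter each request uses

base = x @ W                                              # shared compute
low = torch.einsum("bd,bdr->br", x, A[adapter_ids])       # x @ A_i
delta = torch.einsum("br,brd->bd", low, B[adapter_ids])   # (x @ A_i) @ B_i
y = base + delta
print(y.shape)  # (5, 64): different adapters served in one batched pass
```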

Knowing more about how LLM servers operate under the hood will greatly enhance your understanding of the options you have to increase the performance and efficiency of your LLM-powered applications.
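For a taste of what that looks like in practice, here is a hypothetical client-side sketch against a LoRAX server, assuming a server is already running locally and the lorax-client Python package is installed; the endpoint and adapter name are placeholders.

```python
# Hypothetical LoRAX client usage; the endpoint and adapter_id are placeholders.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # assumes a locally running server

# Requests targeting different fine-tuned adapters can hit the same server;
# LoRAX batches them together against the shared base model.
resp = client.generate(
    "Summarize the main LLM serving optimizations.",
    adapter_id="some-org/some-lora-adapter",  # placeholder adapter name
    max_new_tokens=64,
)
print(resp.generated_text)
```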



Good to know

Know what's good, what to watch for, and possible dealbreakers:
  • Taught by a CTO with industry experience
  • Teaches hands-on use of the LoRAX framework, an industry-standard inference server
  • Develops knowledge and understanding of the modern LLM inference stack, including caching, batching, and quantization
  • Provides a solid foundation for LLM application development


Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Efficiently Serving LLMs with these activities:
Form study groups to engage in collaborative learning
Engage with peers in study groups to share perspectives, clarify concepts, and reinforce learning, fostering a deeper understanding of the course material.
  • Join or form a study group with other classmates.
  • Establish regular meeting times and decide on topics to discuss.
Review 'Speech and Language Processing'
Review the foundational concepts of natural language processing and speech processing to bolster your understanding of the course materials.
  • Read the first two chapters to understand the core concepts of natural language processing and speech processing.
  • Review the glossary of key terms to familiarize yourself with the technical vocabulary.
Practice text generation with autoregressive models
Solidify your understanding of how autoregressive models generate text and practice writing your own text generation code to enhance your practical skills; a minimal sketch follows these steps.
  • Select a text generation library, such as Hugging Face's Transformers.
  • Load a pre-trained text generation model, such as GPT-2 or T5.
  • Generate text using the model and explore the different parameters that affect text quality.
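A minimal sketch of these steps, assuming the transformers package is installed; GPT-2 is just one convenient model choice.

```python
# Generate text with a pre-trained model and experiment with sampling
# parameters that affect output quality and diversity.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

outputs = generator(
    "Large language models generate text",
    max_new_tokens=40,
    do_sample=True,     # sample instead of greedy decoding
    temperature=0.8,    # lower = more deterministic
    top_k=50,           # restrict sampling to the 50 most likely tokens
)
print(outputs[0]["generated_text"])
```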
Compile a collection of resources on LLM performance optimization
Organize and consolidate your knowledge by compiling a collection of resources on LLM performance optimization techniques, enhancing your understanding and ability to apply these techniques effectively.
  • Conduct research to identify relevant resources, such as research papers, tutorials, and blog posts.
  • Organize the resources into categories or sections.
  • Provide brief summaries or annotations for each resource.
Follow guided tutorials to implement LLM inference stacks
Follow guided tutorials and hands-on exercises to gain practical experience implementing and optimizing LLM inference stacks; a simple benchmarking sketch follows these steps.
  • Find tutorials that provide step-by-step instructions on building LLM inference stacks.
  • Implement the inference stack using the techniques covered in the tutorials.
  • Benchmark the performance of your inference stack to identify potential bottlenecks and areas for improvement.
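A rough sketch of the benchmarking step, assuming a transformers model stands in for the inference stack; throughput here is simply generated tokens divided by wall-clock time, a crude but useful first measurement.

```python
# Crude latency/throughput benchmark for a generate() call.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = tokenizer("Benchmarking LLM inference", return_tensors="pt")

start = time.perf_counter()
out = model.generate(**prompt, max_new_tokens=50, do_sample=False)
elapsed = time.perf_counter() - start

n_generated = out.shape[1] - prompt["input_ids"].shape[1]
print(f"latency: {elapsed:.2f} s, throughput: {n_generated / elapsed:.1f} tokens/s")
```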
Volunteer at an organization leveraging LLMs
Apply your knowledge and skills in a practical setting by volunteering at an organization that utilizes LLMs, gaining valuable hands-on experience and contributing to the community.
  • Research organizations that leverage LLMs in their work.
  • Identify volunteer opportunities that align with your interests and skills.
  • Reach out to the organization and express your interest in volunteering.
Design and implement a prototype for an LLM-powered application
Put your knowledge into practice by designing and implementing a prototype for an LLM-powered application, fostering a deeper understanding of how LLMs can be utilized in real-world scenarios.
  • Identify a specific problem or task that an LLM-powered application could address.
  • Design the user interface and functionality of the application.
  • Integrate the LLM into your application and explore various use cases.


Similar courses

Here are nine courses similar to Efficiently Serving LLMs:
  • LLMOps & ML Deployment: Bring LLMs and GenAI to Production (most relevant)
  • Large Language Models: Foundation Models from the Ground... (most relevant)
  • Learn Everything about Full-Stack Generative AI, LLM... (most relevant)
  • Rust for Large Language Model Operations (LLMOps) (most relevant)
  • Llama for Python Programmers
  • Introduction to Large Language Models
  • Introduction to Large Language Models with Google Cloud
  • End to End LLMs with Azure
  • Introduction to Generative AI for Software Development