LlamaIndex for LLM applications with the RAG paradigm, letting you train ChatGPT and other models on custom data.
Learn how to train ChatGPT on custom data and build powerful query engines, chat engines, and AI data agents, with engaging lectures and 4.5 hours of insightful content.
This course offers a mix of theoretical foundations and hands-on projects, ensuring you gain practical experience while grasping the core concepts.
By the end of this journey, you'll be proficient in creating advanced LLM apps, querying complex databases, employing AI agents, and designing your own chatbot interfaces.
Unlock the transformative power of LlamaIndex with our comprehensive course, "Unlocking LlamaIndex: Train ChatGPT on Custom Data and Beyond." With engaging lectures and 4.5 hours of rich, detailed content, this course is your ticket to mastering LlamaIndex and creating the custom LLM applications of the future.
Core Concepts:
Custom ChatGPT Training: Dive deep into the intricacies of training ChatGPT on custom data sets, empowering you to create language models tailored to specific industry needs.
RAG (retrieval-augmented generation): the cutting-edge AI framework. Imagine pulling the latest and greatest facts directly from external knowledge bases, all to supercharge your Large Language Models (LLMs). Not only does RAG ensure that your LLM is operating on the most up-to-date, accurate information, but it also gives you a thrilling behind-the-scenes look at how these language models generate their responses.
AI agents: Create smart AI data agents with LlamaIndex agents. Automate data tasks with LlamaIndex, optimize workflows, and create astonishing LLM applications. AI agents are the apps of the future.
Query and Chat Engines: Get hands-on experience in building stateful query and chat engines. Learn how to maintain context in your conversations, offering a more enriched user experience.
Streamlit Interfaces: Elevate your project by learning how to create intuitive and interactive interfaces using Streamlit. Turn your data scripts into fully functional web apps with just a few lines of Python.
Hands-on Projects:
Engage in real-world projects that allow you to apply your newfound knowledge. Create complex query engines, build chatbots with memory, design web apps to interact with your ChatGPT models, and create custom AI data agents as you go.
Who Is This Course For?
Whether you're a data scientist looking to broaden your skill set, a developer eager to dive into the world of large language models, or a curious individual wanting to understand the nuts and bolts of ChatGPT and LlamaIndex, this course is for you.
Last update of the course: 3 November 2023
100% 30-Day Money-Back Guarantee
We're so confident in the value this course will provide we offer a 100% money-back guarantee within 30 days of purchase. No questions asked.
By the end of this course, you'll be fully equipped to train ChatGPT on custom data, create LLM applications, and build intelligent query engines and AI data agents. You'll be an undisputed expert in LlamaIndex, ready to tackle any challenge that comes your way. Don't miss this opportunity. Enroll today.
You'll get an in-depth understanding of LlamaIndex: we'll break down complex concepts into enjoyable lessons.
You'll work on hands-on projects: you'll create query and chat engines and experiment with various data sources.
How to set up an environment for easy development of LLM apps with LlamaIndex
Hi,
Throughout the course, you'll be working with podcast notes from Andrew Huberman.
Download the resources
You can also get the GitHub repository of the code used in this course.
Other resources:
- Discord server for LlamaIndex
- LlamaIndex documentation
Feel free to contact me when you need help with anything.
Enjoy the ride :)
Jana
In the video, you'll learn about:
- LLMs like ChatGPT
- Types of large language models
- Basic concepts like tokenization, context windows, hallucinations, properties like temperature, model fine-tuning, and RAG
We'll kick things off with a comprehensive look at applications built on or utilizing LLMs. Think chatbots, smart agents, and even intelligent workflows.
In the video, we explore the current state of LLM applications.
We're witnessing a surge in LLM adoption across industries, supported by hard data.
Did you know 65% of companies have LLM applications in production? Or that 94% are using a foundation model API? The numbers don't lie; LLMs are the future.
Curious about the technologies that make these applications tick? We'll dissect the tech stack, explaining where tools like LlamaIndex and LangChain come into play.
Unveiling the Magic of RAG in LLMs!
In this video, you will dive into Retrieval Augmented Generation, or RAG, and its significance in Large Language Models (LLMs).
Inside the Video:
1. What is RAG?: Discover what Retrieval Augmented Generation (RAG) means and how it allows us to integrate custom data into LLMs.
2. Why Use RAG?: Ever wondered how to make your LLM aware of your business-specific data or recent updates? Learn why RAG could be your go-to solution.
3. How Does RAG Work?: Step-by-step, we'll break down the process of RAG, from data chunking to querying. Find out how LlamaIndex plays a key role here.
4. Key Benefits of RAG: Learn how RAG enhances relevance, adaptiveness, and performance in your LLM applications.
Why You Should Watch:
Up-to-date Knowledge: LLMs trained on outdated data can be a thing of the past.
Cost-Effectiveness: RAG offers a more economical alternative to fine-tuning your LLM.
Customized Experience: Tailor your LLM to your specific business needs with ease.
Key Takeaways:
RAG allows for real-time, on-the-spot learning by the LLM.
LlamaIndex and LangChain are leading the way in this RAG-enabled future.
Expect more meaningful and accurate interactions when using RAG-enhanced LLMs.
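The retrieve-then-generate flow described above can be sketched in a few lines of plain Python. This is only a toy illustration: real frameworks like LlamaIndex retrieve by vector similarity, and the `retrieve` and `build_prompt` helpers here are hypothetical stand-ins for that machinery.

```python
# Toy RAG pipeline: retrieve the most relevant chunk, then augment the prompt.
# Word overlap stands in for the vector similarity a real system would use.

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))

def build_prompt(query: str, context: str) -> str:
    """Combine retrieved context with the user question (the augmentation step)."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = [
    "Morning sunlight exposure helps anchor the circadian rhythm.",
    "Caffeine blocks adenosine receptors and delays sleep pressure.",
]
query = "How does caffeine affect sleep?"
context = retrieve(query, chunks)
prompt = build_prompt(query, context)  # this prompt would be sent to the LLM
print(context)
```

The point is the shape of the pipeline: retrieve first, then generate with the retrieved facts in the prompt, so the LLM answers from your data instead of its frozen training set.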
Welcome to the LlamaIndex vs. LangChain Showdown!
Hi and welcome to this insightful video where we'll explore the nuances between LlamaIndex and LangChain, two power-packed tools in the Large Language Models (LLMs) application stack.
What You'll Learn:
Commonalities: Discover what LlamaIndex and LangChain have in common, from question answering and chatbot creation to text summarization and agent functionalities.
Unique Strengths: Learn what sets each tool apart. While LlamaIndex excels in vector embedding and data retrieval, LangChain shines in agent functionalities and multi-step chains.
Integration Capabilities: Find out how these tools can work together to amplify your LLM applications. LlamaIndex offers deep integrations with LangChain, and vice versa.
Main Focus Areas:
LlamaIndex: Your go-to tool for interfacing between LLMs and external data sources. It specializes in loading and querying documents, offering a robust toolkit for extracting and retrieving information.
LangChain: The broader framework for building and managing LLM-powered applications. LangChain offers abstractions for creating chatbot agents and incorporates a memory module called ConversationBufferMemory.
Key Takeaways:
LlamaIndex: Often described as a "simple, flexible data framework," LlamaIndex allows you to set up a query interface around your custom data without the need for fine-tuning or long prompts.
LangChain: This agent framework provides wrappers around LlamaIndex and enables you to insert conversation history, making it a comprehensive solution for chatbot development.
Summary:
LlamaIndex is a specialized tool for data querying and retrieval and data agents, while LangChain serves as an expansive framework for building LLM-powered applications. Despite their differences, they complement each other well and can be integrated seamlessly.
Exploring Data Privacy in Large Language Models (LLMs)
Welcome to this comprehensive video where we address a pressing concern: Is it safe to implement Large Language Models (LLMs) in your business operations? We'll guide you through various considerations on data privacy, focusing on popular LLMs like ChatGPT from OpenAI, Anthropic's Claude 2, HuggingFace's Transformers, and Llama 2 by Meta.
What You'll Learn
The Privacy Dilemma: Understand the risks and questions surrounding the use of LLMs in various business sectors like customer support, predictive maintenance, and sales.
Proprietary vs. Open Source: Discover the pros and cons of using proprietary LLMs like ChatGPT or Anthropic's Claude 2 versus open-source options like HuggingFace's Transformers and Meta's Llama 2.
Data Handling Policies: Get insights into how different companies handle the data you provide. We'll discuss OpenAI's API data privacy policies, Anthropic's data retention and deletion policies, and HuggingFace's security measures.
Key Considerations
ChatGPT by OpenAI: Learn about OpenAI's commitment to data privacy and the specific terms and conditions surrounding the use of your data.
Anthropic's Claude 2: Explore the data policies of this emerging player in the LLM market, with an emphasis on their 30-day data retention policy.
HuggingFace Transformers: Understand how HuggingFace provides a secure and flexible environment for implementing LLMs, along with its community-driven approach.
Meta's Llama 2: Learn about this open-source model's performance metrics and how it gives you control over data hosting and training.
Important Takeaways
Data Preprocessing: Regardless of your choice of LLM, learn why it's crucial to preprocess your data to remove any sensitive or personally identifiable information before implementation.
Security Concerns: Get a sneak peek into our next video, where we'll delve deeper into the security aspects of implementing LLMs in your operations.
In this video, you'll get a super simple explanation of how LlamaIndex works.
Understanding the basics will simplify the learning process.
What You'll Learn:
Introduction to LlamaIndex: Uncover what makes LlamaIndex a pivotal tool for connecting an array of data sources and formats with large language models.
Core Components: Discover the crucial building blocks of LlamaIndex such as knowledge base indices, Nodes, and Documents, and understand their roles in building Query Engines and Chat Engines.
Data Loaders: Explore the wide range of data loaders in LlamaHub that allow you to pull data from diverse sources like SQL databases, Google Docs, YouTube, and more.
Hands-on Project: Follow along as we create a real-world project by querying transcripts from Andrew Huberman's podcasts about sleep. Learn how to configure logs, set up API keys, and understand token usage.
Under the Hood: Get a peek into the engine room of LlamaIndex as we talk about data chunking, vector embedding, and in-memory vector stores.
Querying & Retrieval: Learn how to query your data effectively and understand the role of retrievers, routers, and node postprocessors in fetching and filtering data.
Response Synthesis: Familiarize yourself with how LlamaIndex generates knowledge-augmented responses based on user queries.
Comparison with LangChain: Understand the nuanced differences between LlamaIndex and LangChain agents, and how they can be used for specific use cases.
Customization: Get a roadmap for customizing various low-level components to build your own query pipeline.
In this video, you will learn:
The Basics of LLMs: What are Large Language Models and why should you care? We'll discuss various implementations like OpenAI, Anthropic, and more.
API Endpoints & Methods: Get the lowdown on different ways to interact with LLMs. We'll explore chat and completion methods, streaming vs. non-streaming, and synchronous vs. asynchronous options.
Deep Dive into LlamaIndex: Learn how LlamaIndex simplifies LLM integration by providing a comprehensive interface, eliminating the need for boilerplate code.
Hands-On Examples: Watch as we run live code examples using Visual Studio Code. We'll use OpenAI's API and dive into how to set it up, from obtaining API keys to making the first call.
Model Properties & Parameters: Unveil the magic behind properties like "temperature" and token limitations, and how they affect the model's behavior.
Multi-Persona Responses: Ever wondered what AI would sound like if it were a hippie? We'll explore that too!
Best Practices: Tips on choosing the right LLM and setting up different models for your specific needs.
Troubleshooting: Real-time debugging and understanding API responses, token limitations, and more.
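Temperature, mentioned above, rescales the model's output probabilities before a token is sampled. A small self-contained illustration with toy logits (not a real model) shows the effect:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the choice,
    higher temperature flattens it toward random sampling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract max for numeric stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                          # scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)      # near-deterministic
hot = softmax_with_temperature(logits, 2.0)       # flatter, more "creative"
print(cold[0], hot[0])
```

At low temperature the top token dominates (predictable answers); at high temperature the probabilities even out, which is why high-temperature outputs feel more varied and more error-prone.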
Who Should Watch?
Beginners who are curious about LLMs and want a comprehensive introduction.
Intermediate users looking to understand how to better utilize LLMs in their projects.
Advanced users interested in the nitty-gritty details of LLM implementations and integration.
Tools & Setup
Visual Studio Code
Python Environment
Jupyter Notebooks
OpenAI & LlamaIndex Packages
Pre-requisites
Basic understanding of Python
A zest for learning about AI and LLMs!
Discover the fascinating universe of Large Language Models (LLMs) in this comprehensive video. Navigate through key benchmarks for model evaluation and explore the nuances between open-source and proprietary LLMs. Gain insights into specialized, domain-specific models that cater to industries like healthcare, finance, and legal sectors. Whether you're curious about OpenAI's GPT-4, Microsoft's Orca, or the latest in cybersecurity models, this video has got you covered. We dive deep into the metrics for assessing model performance and the leaderboards that track them. Perfect for tech enthusiasts, developers, and businesses seeking to leverage the power of LLMs.
Welcome to an in-depth guide to LlamaIndex, where we peel back the layers of this robust tool that's revolutionizing the way you interact with data. Whether you're a beginner or an expert, this comprehensive video is designed to give you a firm grasp of LlamaIndex’s building blocks, from indexing and querying to the creation of agents. Grab a cup of coffee and get ready to dive in!
What You'll Learn:
The RAG Principle: Understand Retrieval Augmented Generation, and how it works in tandem with Large Language Models and your custom data.
Indexing & Querying: Learn the nuts and bolts of the indexing and querying pipeline, and why they're essential in making LlamaIndex tick.
Data Loaders: Discover the power of data connectors like the SimpleDirectoryReader and explore the open-source repository LlamaHub.
Documents & Nodes: Uncover what makes up a document in LlamaIndex, how it’s chunked into nodes, and why metadata matters.
VectorStoreIndex and Other Indexes: Dive into the most commonly used index and explore other types of indexes tailored to different use cases.
Embeddings: Get to know how text data is converted into numerical vectors, and how you can switch between different embedding models.
ServiceContext: Learn about this essential class that bundles commonly used resources in LlamaIndex.
Query & Chat Engines: Understand how to build conversational agents and how to query your data effectively.
Retrievers and Postprocessors: Find out how these components fetch relevant data and transform it before generating a final response.
Routers: Learn about the decision-making power of routers in guiding your queries to the right place.
Agents: Understand what sets agents apart as automated decision-makers in LlamaIndex.
Welcome back to another hands-on video where we'll go beyond the basics with the Chat with Andrew example! We'll explore advanced settings, tweaks, and code-level modifications to enhance your Large Language Model (LLM) projects.
What You'll Learn:
How to work with transcripts of Andrew Huberman's podcasts on sleep.
Setting up OpenAI keys and loading Vector Stores.
Using SimpleDirectoryReader to read and save folder content.
Creating a LlamaIndex index to transform nodes into queryable embeddings.
Setting up and customizing Query Engines.
Fine-tuning node retrievers and response synthesizers.
Debugging tips to visualize node content and relationships.
Video covers:
OpenAI Key
Vector Store
SimpleDirectoryReader
LlamaIndex
Query Engine
Vector Index Retriever
Node Postprocessors
Response Synthesizers
Key Features:
Quick Set-Up: We start with a brief recap of setting up OpenAI keys and loading directories so you're all caught up.
Under the Hood: Ever wondered what's happening behind the scenes?
Custom Queries: Learn how to manipulate the retriever to fetch more or fewer nodes based on similarity scores.
Fine-Grained Control: Use node postprocessors like Similarity Postprocessor and Keyword Node Postprocessor to refine query results.
Debugging and Validation: Get hands-on with debugging tools that allow you to preview node content and validate your query before sending it to an LLM.
Response Synthesis: Modify the response mode to fit your needs, whether you're debugging or seeking a more compact output.
Advanced Tweaks: Want to exclude certain keywords or set a similarity threshold? We've got you covered.
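The retriever and postprocessor behavior described above boils down to "take the top-k nodes by similarity, then drop anything below a cutoff." A toy version of that logic in plain Python (standing in for `VectorIndexRetriever`'s `similarity_top_k` and the `SimilarityPostprocessor`; the node tuples are illustrative):

```python
def retrieve_top_k(scored_nodes, k):
    """Mimic a retriever: keep the k highest-similarity nodes."""
    return sorted(scored_nodes, key=lambda n: n[1], reverse=True)[:k]

def similarity_cutoff(nodes, cutoff):
    """Mimic a similarity postprocessor: drop nodes below the threshold."""
    return [n for n in nodes if n[1] >= cutoff]

scored = [("node about REM sleep", 0.82),
          ("node about caffeine", 0.74),
          ("node about finance", 0.31)]

top = retrieve_top_k(scored, k=2)    # like similarity_top_k=2
kept = similarity_cutoff(top, 0.8)   # like similarity_cutoff=0.8
print(kept)
```

Only the surviving nodes are handed to the response synthesizer, which is why tuning these two knobs changes both answer quality and token cost.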
Efficiently Persisting LLM Indices: Save Time and Resources by Reusing Your Data
Welcome back to our advanced tutorial series! You've already learned how to create an index and configure retrievers, node postprocessors, and response synthesizers. But are you tired of recreating the index every time you need to query your data? This video will show you how to save the index to disk for future use, making your work more efficient and your LLM projects more dynamic.
What You'll Learn:
How to organize your storage folders for index persistence.
Utilizing LlamaIndex's storage context and persist methods.
Reading and understanding the Docstore, Index Store, and Vector Store.
Implementing simple checks to avoid unnecessary index rebuilds.
Video covers:
LlamaIndex
Storage Context
Persist Method
Docstore, Index Store, Vector Store
Key Features:
Storage Setup: I'll guide you through setting up a structured directory to save your index files.
Easy Index Persistence: Learn how to use LlamaIndex's interface to save your index to disk effortlessly.
Understanding Stored Data: Explore how nodes, metadata, and other data are stored and what they mean for your LLM project.
Metadata Exclusions: Learn the difference between 'excluded embed metadata keys' and 'excluded llm metadata keys' and their impact on your index.
Conditional Index Rebuilding: Implement a simple check to determine if an index rebuild is necessary, saving you time and computational resources.
Token Management in LLMs: Mastering tiktoken for Efficient Queries and Savings
Learn all about counting tokens while creating and querying LlamaIndex. Whether you're a beginner or an advanced user, this video will give you the essential knowledge to manage tokens efficiently, saving you time and money in the long run.
What You'll Learn:
The fundamentals of Tokens and Tokenization in language models.
An introduction to Byte Pair Encoding (BPE) and how it impacts token count.
A hands-on guide to using tiktoken, OpenAI's open-source tokenizer, for token management.
How to count tokens in real-time using Service Context and Callback Manager.
Strategies for managing token count during both index creation and querying.
Tools and Libraries Covered:
tiktoken
LlamaIndex
Service Context
Callback Manager
Token Counting Handler
Key Features:
BPE and tiktoken: Learn how Byte Pair Encoding works in LLMs and how tiktoken can help you manage tokens efficiently.
Token Counting in Action: Watch as we dive into the code, integrating tiktoken into a real LlamaIndex project.
Real-Time Token Monitoring: Understand how to use Service Context and Callback Manager for real-time token counting.
Query Token Management: Learn the nuances of managing tokens during querying, including counting for embeddings, prompts, and completions.
Resetting Token Count: Discover how to use token_counter.reset_counts() to manage your ongoing token usage effectively.
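Byte Pair Encoding, the idea behind tiktoken, repeatedly merges the most frequent adjacent pair of symbols into a new vocabulary entry. A miniature illustration of one merge step, on toy data rather than the real cl100k vocabulary:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("banana")             # ['b', 'a', 'n', 'a', 'n', 'a']
pair = most_frequent_pair(tokens)   # ('a', 'n') occurs twice
tokens = merge_pair(tokens, pair)
print(tokens)
```

Real tokenizers apply thousands of such learned merges, which is why token count, not character count, is what you are billed for and what fills the context window.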
Prerequisites: A basic understanding of LLMs, LlamaIndex, and Python programming is recommended.
Don't let token usage sneak up on you and inflate your costs. Equip yourself with the skills to manage it efficiently. Hit the play button now and take control of your token management!
Unlock the full potential of your Large Language Models (LLMs) by customizing query prompts in LlamaIndex. This comprehensive tutorial takes a deep dive into the intricacies of system and user prompts, showing you how to tailor them to your specific needs for more accurate and relevant responses.
What You'll Learn:
How the default System and User Prompts work in LlamaIndex.
The art of crafting Custom Prompts to feed into your LLM queries.
The role of Prompt Templates and how to implement them in your code.
How to Debug and Monitor prompts during runtime.
Real-world applications using the Andrew Huberman Sleep index as an example.
Tools and Libraries Covered:
LlamaIndex
Custom Prompt Templates
Debugging tools
Key Features:
Understanding Default Prompts: Gain a thorough understanding of the prompts LlamaIndex uses by default and how they influence query results.
Customizing Prompts: Learn step-by-step how to create your own prompt templates and integrate them into the LlamaIndex query engine.
Debugging and Monitoring: Utilize debugging tools to monitor the prompts being sent to OpenAI, ensuring your customizations are taking effect.
Real-World Example: See all these concepts in action using the Andrew Huberman Sleep index, making the tutorial highly relatable and practical.
Troubleshooting and Tweaks: Tips and tricks on how to adjust prompts and when you might need to consider changing the model for better results.
Fine-tuning your LLM queries has never been easier. Take control of your prompts and make your LLM work precisely the way you want. Hit play and start mastering custom prompts in LlamaIndex today.
A short introduction to creating indexes in LlamaIndex to train ChatGPT (or another LLM) on custom data.
Embark on a journey to master the art of data indexing using LlamaIndex! Whether you're a beginner eager to get started or a seasoned developer looking to optimize your search capabilities, this comprehensive guide has got you covered.
What You'll Learn:
Data Loading Simplified: Learn how to load your data effortlessly using SimpleDirectoryReader and explore the plethora of options available in LlamaHub.
Node Parsing & Tokenization: Understand the importance of chunking data into nodes using TokenTextSplitter, and how to set overlap for context preservation.
Index Types & Vector Stores: Dive into the variety of index types LlamaIndex offers, with a focus on the popular VectorStoreIndex.
Embedding Models: Get insights into turning your data into vectors for similarity search and how to switch between different embedding models.
Service Context: Discover how to use ServiceContext for global and local configurations.
Storage Solutions: Uncover the magic behind storing documents, indices, and vectors using LlamaIndex’s high-level storage interfaces.
Key Tools & Features:
SimpleDirectoryReader
TokenTextSplitter
VectorStoreIndex & SimpleVectorStore
ServiceContext
Storage Context
Deep Dives:
Custom Data Loaders: Can't find what you need? Create your custom data loader.
Vector Store Options: An overview of Pinecone, Weaviate, and Chroma vector store options.
Local vs Global Configurations: Learn when to use global or local configurations for fine-tuning your indexing process.
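Chunking with overlap, as `TokenTextSplitter` does, can be illustrated with a tiny word-based splitter. This is a toy: the real splitter counts tokens, not words, and handles sentence boundaries more carefully.

```python
def split_with_overlap(words, chunk_size, overlap):
    """Yield fixed-size chunks whose tail repeats at the head of the next
    chunk, preserving context across chunk boundaries."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break
    return chunks

words = "deep sleep repairs tissue and consolidates memory each night".split()
chunks = split_with_overlap(words, chunk_size=4, overlap=2)
for c in chunks:
    print(c)
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing (and embedding) some words twice.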
Get ready to up your indexing game with LlamaIndex! Press play and dive into this all-in-one guide to data indexing. See you on the other side!
Welcome back to our in-depth course! In this video, we dive deeper into the heart of LlamaIndex: Documents and Nodes. We're building a chatbot informed by Andrew Huberman's podcast notes, but we're also laying the groundwork for handling complex data structures, such as financial reports, code snippets, tables, and more.
What You'll Learn:
Documents vs Nodes: Discover the nuances between documents and nodes in LlamaIndex and why they matter for retrieval and Large Language Models (LLMs).
Custom Metadata: Learn how to add custom attributes to nodes for a richer, more contextual interaction with LLMs.
Node Parsing Techniques: Get hands-on experience with various node extractors like the Title Extractor and QuestionAnswer Extractor to give more context to your LLMs.
Efficient Indexing with Service Context: Understand how to configure node parsers during the indexing process through Service Context.
Live Coding Session: Follow along as we import a text file using SimpleDirectoryReader, manually extract nodes, and integrate them into an index.
Key Tools & Concepts:
Documents & Nodes
SimpleDirectoryReader
Node Parsers
Service Context
Metadata & Title Extractor
Text Splitter
Vector Store Index
Deep Dives:
Complex Data Handling: See how custom splitters can help index complex documents with tables, graphs, markdown, etc.
Query Examples: Run real-world queries to test how well your nodes and metadata perform.
Cost vs Context: Learn to strike a balance between providing rich context and managing computational cost.
Grab your coding gloves, and let's dive into the intricate world of documents and nodes in LlamaIndex. This is a must-watch for anyone looking to up their game in contextual understanding and intelligent search. Click play, and let's get started.
In this video, we're focusing on a crucial aspect of working with LlamaIndex: managing dynamic data. As data evolves over time—new files are added, existing files are modified, or even deleted—your index needs to be just as dynamic to maintain its effectiveness.
What You'll Learn:
Dynamic Data: Understand the importance of managing an ever-changing dataset in real-time with LlamaIndex.
Node IDs: Learn how using filenames as node IDs can significantly improve your index's smart updating capabilities.
Refreshing Index: Dive into the mechanics of refreshing your index to sync it with the current document set.
Tracking Changes: See how LlamaIndex can smartly track new, updated, or deleted documents.
Key Tools & Concepts:
Simple Node Parser
Text Splitter
Filename as ID Parameter
Index Refreshing
Deep Dives:
File Addition Example: Watch as we add a new file to our dataset and see how LlamaIndex reacts.
File Modification Example: Observe how LlamaIndex smartly identifies when a file within the index has been modified.
Optional Parameters: Learn about additional parameters like delete_from_docstore for further control over your document store.
We'll go through a live coding session where we set up an index, delete it, and show you how to refresh it based on new or modified data. So if you're looking to master the fine art of managing a dynamic index, you won't want to miss this video. Hit that play button and let's get started.
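The change-tracking idea behind refreshing an index — keep a stored hash per filename and re-process only the files whose content hash changed — can be sketched with the standard library. This is a toy stand-in for LlamaIndex's docstore bookkeeping, with hypothetical file contents:

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def refresh(stored: dict, current_files: dict) -> list:
    """Return filenames that are new or modified, updating `stored` the way
    an index refresh would re-embed only the changed documents."""
    changed = []
    for name, text in current_files.items():
        h = content_hash(text)
        if stored.get(name) != h:
            stored[name] = h
            changed.append(name)
    return changed

stored = {}
files = {"ep1.txt": "notes on sleep", "ep2.txt": "notes on focus"}
print(refresh(stored, files))    # first run: every file counts as new
files["ep2.txt"] = "updated notes on focus"
print(refresh(stored, files))    # second run: only the modified file
```

Using the filename as the stable ID is what lets the second pass recognize ep2.txt as a modification rather than a brand-new document — the same reason the lesson sets the filename-as-ID parameter.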
Welcome back to another illuminating video in our course series! Today, we're delving into the fascinating world of sentence embeddings, but with a twist! We'll explore how to use Sentence Transformers from Hugging Face, specifically focusing on the all-mpnet-base-v2 model, and integrate them into your LlamaIndex projects.
What You'll Learn:
Sentence Transformers: Understand what they are and why they're so powerful in capturing the semantic meaning of text.
Comparison of Models: Get insights into how the all-mpnet-base-v2 model differs from OpenAI's text-embedding-ada-002 in terms of dimensional space.
Hands-On Implementation: Step-by-step guide to incorporating Hugging Face Sentence Transformers into your LlamaIndex workflow.
Reindexing: Know why and how to reindex your data when changing your embedding model.
Key Tools & Concepts:
Sentence Transformers
Hugging Face
LangChain
Service Context in LlamaIndex
Deep Dives:
Model Installation: Learn how to install and integrate Sentence Transformers and LangChain.
Model Application: Watch a live demonstration of generating embeddings using the all-mpnet-base-v2 model.
LlamaIndex Integration: Discover how to replace the default embeddings in LlamaIndex with Sentence Transformer models.
Your Assignment: Your task is to change the embedding model to the "instructor model," which is known for its efficiency. Check out the embedding model leaderboard to make an informed choice.
In this video, you're diving deep into the fascinating world of text embeddings and how they work in LlamaIndex. We cover a variety of embedding types, how to implement them, and how to choose the one that's best suited for your needs.
What You'll Learn
The Role of Vector Models: Discover how all data is transformed into vectors for more efficient search.
Types of Text Embeddings: From Word Embeddings to specialized Instructor Embeddings, understand the different ways you can represent text as numbers.
Choosing the Right Embedding: Get insights into how to select the most effective embedding model for your project.
Practical Tips: Learn how to implement these embeddings in LlamaIndex, with support for popular platforms like OpenAI and Hugging Face.
Key Takeaways
Text embeddings are numerical representations of words or phrases, essential for tasks like text classification, sentiment analysis, and more.
Consistency is key! Use the same vector model for both your data and search queries to get accurate results.
You can benchmark embedding models to find the best fit for your project. The top models usually differ by only 2% in effectiveness.
Tools and Technologies Mentioned
OpenAI Embeddings
LangChain Embeddings
Hugging Face
Cosine Similarity
Code Highlights
How to download models from Hugging Face locally.
Implementing your own embedding class in LlamaIndex.
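Cosine similarity, the measure listed above for comparing embedding vectors, takes only a few lines of plain Python. The 3-dimensional vectors here are toys; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query_vec = [0.9, 0.1, 0.0]
sleep_vec = [0.8, 0.2, 0.1]   # imagined embedding of a sleep-related chunk
tax_vec = [0.0, 0.1, 0.9]     # imagined embedding of an unrelated chunk

print(cosine_similarity(query_vec, sleep_vec))
print(cosine_similarity(query_vec, tax_vec))
```

Because it measures direction rather than magnitude, cosine similarity is insensitive to how "long" a vector is — which is why it is the default choice for comparing text embeddings.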
Final Thoughts
Whether you're dealing with multilingual content, legal or medical data, or simply looking to understand text at a deeper level, this video provides valuable insights.
In this video, we take a deep dive into integrating Chroma Vector Database with LlamaIndex. As part of our ongoing course, this video focuses on enhancing your storage capabilities, particularly for handling large volumes of vector data.
What You'll Learn:
Chroma Vector Database: An introduction to the open-source embedding database tailored for LLM applications.
Storage Context in LlamaIndex: Understand its pivotal role in customizing storage preferences, including vector storage.
Hands-on Coding: Step-by-step guide on implementing Chroma Vector Database into your LlamaIndex setup.
Persistent Storage: Learn how to save your vectors to disk for quicker access in future sessions.
Key Tools & Concepts:
Chroma Vector Database
LlamaIndex
Storage Context
SimpleDirectoryReader
OpenAI Ada Embeddings
Deep Dives:
Chroma Installation & Setup: Get a walkthrough on installing Chroma and setting up a collection.
Vector Store Configuration: Learn how to initialize and utilize Chroma Vector Store within the LlamaIndex framework.
Querying Chroma Collections: Discover how to perform queries using embeddings to retrieve similar nodes from Chroma.
Persistent Database: See how to create a persistent Chroma database and save it to disk.
Your Assignment: Your challenge is to switch from using the default OpenAI embeddings to Hugging Face embeddings and store them in Chroma DB. Think it over and create a solution!
A short introduction to querying in LlamaIndex
In this lesson, we're stepping up the game by exploring how to search over multiple specialized indexes efficiently. Imagine wanting to sift through podcast summaries and also needing specific topic-based details—well, LlamaIndex has got you covered.
What You'll Learn:
Router Query Engine: Discover how to create a router query engine that intelligently decides between vector and list summary engines based on the query.
Query Engine Tools: Learn how to use this essential tool for translating natural language queries into specific query engines.
Selectors in LlamaIndex: Get an in-depth look at various types of selectors like LLMSingleSelector, PydanticSingleSelector, and more to make informed choices on query routing.
Key Tools & Concepts:
Router Query Engine
LlamaIndex's Vector and List Indexes
Query Engine Tools
Pydantic Single Selector
Service Context
Deep Dives:
Building a List Index: Step-by-step guide on how to construct a list index that's best suited for summaries.
Customizing Node Sizes: Learn how to manipulate node sizes easily through the Service Context.
Debugging and Logs: See what goes on behind the scenes in LlamaIndex when you perform a query.
Selector Choices: Understand why Pydantic selectors are often more reliable for use with OpenAI.
Your Assignment: Your challenge after this video is to set up a multi-index search in your own LlamaIndex environment. Pay special attention to the descriptions you give in the Query Engine Tools to help the AI model make informed decisions.
Ready to take your search capabilities in LlamaIndex to the next level? Hit play, and let's dive in!
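As a warm-up for the lesson, here is a toy sketch of the routing idea in plain Python. The class names and the keyword-overlap heuristic are invented for illustration; the real RouterQueryEngine delegates the selection step to an LLM selector, which is exactly why the tool descriptions you write matter so much.

```python
class ToyQueryEngineTool:
    """Pairs an engine (here, just a function) with a description."""

    def __init__(self, name, description, engine_fn):
        self.name = name
        self.description = description
        self.engine_fn = engine_fn

class ToyRouter:
    """Picks the tool whose description shares the most words with the
    query. A real router asks an LLM selector to make this choice."""

    def __init__(self, tools):
        self.tools = tools

    def select(self, query):
        words = set(query.lower().split())
        return max(
            self.tools,
            key=lambda t: len(words & set(t.description.lower().split())),
        )

    def query(self, query):
        return self.select(query).engine_fn(query)

summary_tool = ToyQueryEngineTool(
    "list_engine",
    "useful for summarization questions about the podcast",
    lambda q: "summary answer",
)
vector_tool = ToyQueryEngineTool(
    "vector_engine",
    "useful for retrieving specific context about a topic",
    lambda q: "vector answer",
)
router = ToyRouter([summary_tool, vector_tool])
```

Notice that a vague description would make even this crude router misfire; the same failure mode applies when the selector is an LLM.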
Welcome back to our LlamaIndex course series! In today’s episode, we’re diving deep into a fascinating feature you might not have explored yet—Chat Engines. Unlike Query Engines, Chat Engines let you have a back-and-forth conversation with your data in a stateful manner. Sound intriguing? Let's dig in!
What You'll Learn:
Understanding Chat Engine: Get a comprehensive understanding of how Chat Engines differ from Query Engines and why they're essential for stateful conversations with your data.
Chat Modes Explained: Become proficient in various chat modes like 'best', 'context', 'condense_question', 'simple', 'react', and 'openai'.
Working Example: Follow along as we create a Chat Engine using the 'condense_question' mode and explore its capability to remember chat history.
Key Tools & Concepts:
Chat Engine vs Query Engine
Stateful Conversations
Available Chat Modes in LlamaIndex
Utilizing Context in Chats
Deep Dives:
Condense_Question Mode: Understand how this mode analyzes chat history to rephrase user messages into useful queries.
Resetting Chat History: Learn how to use the reset method to clear the chat history and its implications.
Logs & Responses: A look into the info log to understand how LlamaIndex is interpreting your questions.
Your Assignment: After this lesson, try setting up your own Chat Engine in LlamaIndex. Experiment with different chat modes and observe how each mode affects the conversation.
Ready to make your data interaction more engaging and stateful? Hit the play button, and let’s get started!
Today, we're diving into the fascinating world of querying SQL databases using LlamaIndex's NLSQLTableQueryEngine. And guess what? We're doing it in the most adorable way possible—by exploring a database all about cats!
What You'll Learn:
Setting up a simple SQLite database featuring our feline friends
The essentials of the NLSQLTableQueryEngine in LlamaIndex
How to construct natural language queries that synthesize into SQL queries
Debugging and monitoring API requests for a better understanding
The importance of model validation and upgrading to GPT-4
Key Tools & Concepts:
SQLite
LlamaIndex's SQLDatabase and NLSQLTableQueryEngine
Natural Language Queries
Model Validation
GPT-4 Upgrade
The Cat-tastic Database: We've built a simple SQLite database that's all about cats—Savannah, Ragdoll, and Maine Coon to be precise. Learn how to query for the biggest cat, explore the quirks of natural language queries, and more. Meet Ben, the Ragdoll, who's on a mice-catching spree!
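A database like the lesson's can be built with nothing but the standard library. The table layout, column names, and cat data below are illustrative guesses rather than a transcript of what appears on screen; in the lesson, a connection like this gets wrapped in LlamaIndex's SQLDatabase and handed to the NLSQLTableQueryEngine, which turns a question such as "Which cat is the biggest?" into SQL like the final query here.

```python
import sqlite3

# In-memory stand-in for the lesson's cat database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cats (name TEXT, breed TEXT, weight_kg REAL, mice_caught INTEGER)"
)
conn.executemany(
    "INSERT INTO cats VALUES (?, ?, ?, ?)",
    [
        ("Luna", "Savannah", 11.0, 2),
        ("Ben", "Ragdoll", 7.5, 14),   # Ben, our mice-catching Ragdoll
        ("Milo", "Maine Coon", 8.2, 5),
    ],
)

# The SQL that "Which cat is the biggest?" should synthesize into:
biggest = conn.execute(
    "SELECT name, breed FROM cats ORDER BY weight_kg DESC LIMIT 1"
).fetchone()
print(biggest)  # ('Luna', 'Savannah')
```

Having the raw SQL next to the natural language question is also how you validate the model's answers, which matters once the engine is answering on real data.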
Deep Dives:
Natural Language to SQL: Uncover how NLSQLTableQueryEngine synthesizes your everyday language into SQL queries.
Debugging API Calls: Keep an eye on what's happening under the hood by setting logs to debug.
Model Validation: Why it's crucial to validate responses from the model, especially when dealing with sensitive or factual data.
Your Assignment: Time to get your paws dirty! Your task is to create an SQL query engine that utilizes the GPT-4 model. Once you're done, share your code with us and head on to the next video.
Tail-end Note: For those of you who are more doggedly focused on dogs or plants, don't worry! The principles taught here apply to all kinds of databases. So hit that play button, and let's get meow-ving!
Have you ever wondered how to query across different kinds of data sources and synthesize the responses into a single, comprehensive answer? Look no further! In today's episode, we'll introduce you to LlamaIndex's powerful Subquestion Query Engine and show you how it can be used to achieve just that.
What You'll Learn:
Why Subquestion Query Engine: Understand the significance of Subquestion Query Engines in querying multiple data sources.
Hands-on Example: Follow along as we query data from multiple YouTube videos about different cat breeds using the YouTube Transcript API and LlamaIndex.
Metadata & Descriptions: Learn how to use ToolMetadata and Query Engine Tool for better query routing.
Key Tools & Concepts:
LlamaHub for resource loaders
YouTube Transcript API for data fetching
ToolMetadata for engine descriptions
Subquestion Query Engine
Deep Dives:
YouTube Transcript API: See how to fetch video transcripts from YouTube and turn them into searchable data.
Tool Metadata & Descriptions: Learn the importance of writing good metadata for your query engines for efficient routing.
Working with Multiple Indices: Understand how to create and manage multiple vector indices and query engines for different cat breeds.
Your Assignment: Try creating your own Subquestion Query Engine in LlamaIndex using multiple data sources. Test different types of queries to see how the engine intelligently routes them.
Don't miss out on this fascinating journey into the world of advanced querying with LlamaIndex. Hit play now!
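The decomposition step at the heart of the lesson can be mimicked with a toy sketch. Everything below is invented for illustration: the real SubQuestionQueryEngine uses an LLM both to generate the subquestions and to route them via each tool's metadata, whereas this stand-in splits on " and " and routes by a name match.

```python
class ToySubquestionEngine:
    """Splits a compound question into subquestions, routes each to the
    engine whose name appears in it, then stitches the answers together.
    (The real SubQuestionQueryEngine uses an LLM for both steps.)"""

    def __init__(self, engines):
        self.engines = engines  # name -> callable

    def query(self, question):
        # Crude decomposition: one subquestion per " and "-joined clause.
        subquestions = [part.strip() for part in question.split(" and ")]
        answers = []
        for sub in subquestions:
            for name, engine_fn in self.engines.items():
                if name in sub.lower():
                    answers.append(engine_fn(sub))
        return "; ".join(answers)

engine = ToySubquestionEngine({
    "ragdoll": lambda q: "Ragdolls are placid",
    "savannah": lambda q: "Savannahs are energetic",
})
```

The synthesis step, combining per-source answers into one response, is the part that makes this pattern so useful for questions that span multiple videos.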
If you've been following our series, you've already dipped your paws into the world of querying SQL databases and fetching data from YouTube videos. But what if we could combine these two worlds? In this video, we're going to do just that, using the SQLJoinQueryEngine in LlamaIndex!
What You'll Learn:
An introduction to SQLJoinQueryEngine, a tool for merging structured and unstructured data.
How to set up a multi-layered query engine that decides which vector index to query.
Practical examples of semantic and table-based queries—featuring, of course, our favorite felines.
Key Tools & Concepts:
SQLJoinQueryEngine
Subquestion Engines
Query Tool Descriptors
SQL and Vector-Based Queries
Connecting the Dots: We'll start by revisiting the code you've already written for querying SQL tables and YouTube scripts. Then, we'll step it up a notch by introducing the SQLJoinQueryEngine, which will help us decide which query engine to use based on the question we ask.
Deep Dives:
Query Tool Descriptors: Learn how to write comprehensive descriptions for your query tools. This ensures that the SQLJoinQueryEngine picks the right tool for the job.
Real-World Testing: We'll run a series of questions to see how well our newly created query engine performs. From finding out which cat is the biggest to knowing their average lifespan, we'll put it all to the test.
Your Homework: Take what you've learned today and start playing with the SQLJoinQueryEngine. Try to add more layers, perhaps integrate other types of data, and see how it performs. The sky—or should we say, the scratching post—is the limit!
Note: Effective querying relies on well-described query tools. Make sure you give the LLM all the information it needs to make the right choice.
So, are you ready to make your queries even more powerful and diverse? Let's not keep Ben, our Ragdoll cat, waiting any longer. Hit that play button, and let's dig in!
Welcome back to our in-depth LlamaIndex series! This episode dives into Streamlit, an open-source Python library that makes it incredibly simple to create web apps. You'll learn how to seamlessly integrate your LlamaIndex chat engine into a user-friendly Streamlit interface. Not just that, you'll also get to improve your bot's prompt templates for better user interactions.
What You'll Learn:
Streamlit Basics: Get introduced to Streamlit and how it simplifies web app development with Python.
Creating a Chatbot Interface: Step-by-step guide to build a chat interface with Streamlit and integrate it with your LlamaIndex chat engine.
State Management: Discover the powerful session_state feature in Streamlit for maintaining chat history.
Key Tools & Concepts:
Streamlit for web development
session_state for maintaining app state
LlamaIndex's condense_question mode
Customizing bot prompts
Deep Dives:
Streamlit session_state: Learn how to maintain chat messages between reruns and interactions.
Chat Message Container: Understand how Streamlit's built-in feature simplifies the chatbot interface creation.
Integrating with LlamaIndex: Bring in your existing LlamaIndex chat engine and make it interactive.
Your Assignment: Your task is to improve the bot's prompt templates for more accurate and user-friendly responses. Share your results and let us know how it goes!
So, are you ready to build a smart, interactive chatbot with Streamlit and LlamaIndex? Hit that play button, and let's dive in!
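The one Streamlit concept that trips people up is that the whole script re-executes on every interaction, so chat history must live in session state behind a guard. The stand-in below uses a plain dict in place of Streamlit's session state and a formatted string in place of a real chat engine call, purely to show the rerun pattern; the widget calls themselves are left as comments.

```python
def run_script(session_state, user_input=None):
    """One 'rerun' of a Streamlit-style chat script. Streamlit re-executes
    the whole file per interaction, so history must be guarded like this."""
    if "messages" not in session_state:  # guard: only initialize once
        session_state["messages"] = [
            {"role": "assistant", "content": "Ask me about the podcast!"}
        ]
    if user_input:  # stand-in for the value returned by a chat input widget
        session_state["messages"].append({"role": "user", "content": user_input})
        # stand-in for chat_engine.chat(user_input)
        reply = f"(chat engine reply to: {user_input})"
        session_state["messages"].append({"role": "assistant", "content": reply})
    return session_state["messages"]

state = {}
run_script(state)                      # first load: greeting only
run_script(state, "Who is the host?")  # a rerun triggered by user input
```

Without the `"messages" not in session_state` guard, every rerun would wipe the history back to the greeting, which is the bug this pattern exists to prevent.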
In this video, we're diving deep into the fascinating world of AI agents, with a special focus on LlamaIndex data agents.
What You'll Learn:
AI Agents Basics: Understand what an AI agent is and its typical workflow, including data acquisition, analysis, and decision-making.
Real-world Applications: Discover how AI agents are making a significant impact in various sectors like healthcare, finance, and cybersecurity.
Introduction to LlamaIndex: Learn about LlamaIndex agents, specialized tools designed for automated problem-solving and decision-making.
Components of LlamaIndex Agents: Understand the key building blocks of LlamaIndex agents, such as user input handling and task planning.
Key Tools & Concepts:
ReAct Reasoning
Data Management
Task Planning
Memory Module
Deep Dives:
Subquestion Query Agents: Get familiar with agent-like tools within LlamaIndex that help break down complex queries.
Router Query Engine: Learn about how LlamaIndex can route tasks for more efficient problem-solving.
Comparison with Other Frameworks: Understand how LlamaIndex compares with other agent frameworks like LangChain.
Extras:
Interview Highlights: Catch a sneak peek into a compelling interview featuring hackathon attendees who have built incredible AI agents using LlamaIndex.
In this video, we're diving into creating a simple OpenAI agent using LlamaIndex that can intelligently choose between various QueryEngine tools.
What You'll Learn:
Understanding Agents: Get the gist of what an agent does and how it determines solutions for given jobs.
Creating a Simple OpenAI Agent: Learn step-by-step how to build a simple OpenAI agent that can decide which QueryEngine tool to use for specific tasks.
Working with PDFs: Understand how to load PDFs, such as sample podcasts, into the agent for processing.
Node Postprocessors: Learn how to utilize node postprocessors to exclude certain keywords from the search.
Vector Indexing: Dive into creating vector indexes and query engines and understand their role in the agent's decision-making process.
Key Tools & Concepts:
QueryEngine Tools
Node Postprocessors
PDF Loader
Vector Indexing
LlamaHub
Code Walkthrough:
Setting Up the Code: Learn how to prepare your code environment, including importing necessary modules and initializing variables.
Building the Agent: Follow along as the code for the OpenAI agent is written and executed, with explanations for each step.
Testing the Agent: See how the agent performs in real time.
Extras:
Debugging Tips: Get insights on how to debug common issues and validate the agent's performance.
Personal Experience: Gain additional context about how diet affects emotions, adding a personal touch to the tutorial.
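The keyword-exclusion step in this lesson boils down to filtering retrieved nodes before the LLM ever sees them. Here is a hedged stdlib sketch of that idea; the function name and the dict-based node shape are invented for illustration and are not LlamaIndex's postprocessor interface.

```python
def exclude_keywords(nodes, keywords):
    """Drop retrieved nodes whose text mentions any excluded keyword,
    mimicking what a keyword-excluding node postprocessor does between
    retrieval and response synthesis."""
    lowered = [k.lower() for k in keywords]
    return [
        node
        for node in nodes
        if not any(k in node["text"].lower() for k in lowered)
    ]

nodes = [
    {"id": 1, "text": "The host discusses sugar and mood swings."},
    {"id": 2, "text": "A segment about sleep and exercise."},
]
kept = exclude_keywords(nodes, ["sugar"])
```

Because the filter runs after retrieval, the excluded passages still exist in the index; they are simply kept out of the context window for this query.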
In this video, we dive into enhancing your OpenAI agent with LlamaIndex by implementing Recursive Retrievers and creating multiple indexes. This approach elevates your agent's decision-making capabilities, particularly for summarization queries.
What You'll Learn:
Introduction to Recursive Retrievers: Understand the theory behind using Recursive Retrievers for more complex decision-making.
Creating Multiple Indexes: Learn how to set up multiple indexes for each document to optimize query response.
Agent Customization: Step-by-step guide to tailor your OpenAI agent to make decisions based on the type of query—semantic search or summarization.
Debugging and Error Handling: Tips and tricks on how to debug your code effectively.
Key Tools & Concepts:
Summary Indexes
Vector Indexes
Recursive Retrievers
OpenAI Agents
LlamaIndex Schema
Code Walkthrough:
Modifying Existing Code: Begin by tweaking the code from the previous video to include summary indexes alongside vector indexes.
Agent Creation: How to create multiple agents for each document, focusing on semantic search and summarization.
Implementing Recursive Retrievers: The meat of the tutorial, where you learn how to implement a Recursive Retriever, which acts as a decision-making layer in front of your agents.
Deep Dive:
Node Creation: Create nodes that serve as the 'descriptions' of each agent, essential for the Recursive Retriever to make decisions.
Testing the Agent: Live examples to test the agent’s decision-making capabilities, including queries that require summarization.
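The two-layer structure from this lesson can be sketched as a toy in plain Python. Both classes below, and the keyword-overlap matching, are invented stand-ins: the real setup uses index nodes whose descriptions are matched by embedding similarity, and real per-document agents rather than string formatting. The shape is the same, though: a top layer picks the right document agent, and that agent picks a summary index or a vector index depending on the question.

```python
class ToyDocAgent:
    """Per-document 'agent' that picks a summary index for summarization
    queries and a vector index for everything else."""

    def __init__(self, doc):
        self.doc = doc

    def query(self, question):
        mode = "summary" if "summarize" in question.lower() else "vector"
        return f"[{mode} index of {self.doc}] {question}"

class ToyRecursiveRetriever:
    """Top layer: each node is a (description, agent) pair, and the
    best-matching description decides which agent handles the question."""

    def __init__(self, nodes):
        self.nodes = nodes

    def query(self, question):
        words = set(question.lower().split())
        _, agent = max(
            self.nodes,
            key=lambda n: len(words & set(n[0].lower().split())),
        )
        return agent.query(question)

retriever = ToyRecursiveRetriever([
    ("facts and summaries about the nutrition podcast", ToyDocAgent("nutrition.pdf")),
    ("facts and summaries about the sleep podcast", ToyDocAgent("sleep.pdf")),
])
```

As in the lesson, the description nodes do the heavy lifting: a weak description at the top layer sends the question to the wrong document agent no matter how good that agent is.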
Hey,
We hope you're enjoying your journey through The Ultimate Guide to Training ChatGPT on Custom Data with LlamaIndex!
Your experience and insights matter to me.
As you read this, I'm creating, filming, and improving this course. And I'd appreciate your feedback.
What did you love? What could be improved? Are there additional topics you wish were covered?
Let me know so I can fulfill your wishes.
Please take a few minutes to complete our LlamaIndex Course Satisfaction Survey. Your feedback is crucial for making this course even better and tailored to your needs.
Don't miss this opportunity to shape the future of this course and, who knows, maybe even the field of LLMs!
Click here to start the survey:
https://forms.gle/5SXLU8skhVkdAR4E7