We may earn an affiliate commission when you visit our partners.
Sebastian Witalec

Learn how to build multimodal search and RAG systems. RAG systems enhance an LLM by incorporating proprietary data into the prompt context. RAG applications typically use text documents, but what if the desired context includes multimedia such as images, audio, and video? This course covers the technical aspects of implementing RAG with multimodal data.

1. Learn how multimodal models are trained through contrastive learning and implement it on a real dataset.

2. Build any-to-any multimodal search to retrieve relevant context across different data types.

3. Learn how LLMs are trained to understand multimodal data through visual instruction tuning and use them on multiple image reasoning examples.

4. Implement an end-to-end multimodal RAG system that analyzes retrieved multimodal context to generate insightful answers.

5. Explore industry applications like visually analyzing invoices and flowcharts to output structured data.

6. Create a multi-vector recommender system that suggests relevant items by comparing their similarities across multiple modalities.
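
To make item 1 concrete, here is a minimal sketch of a contrastive objective. It assumes a CLIP-style symmetric loss over paired image and text embeddings; the course description does not say which formulation the course uses, and the function names and the temperature value below are illustrative assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_row(logits, target):
    """Cross-entropy of one row of logits against the index of the true match."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def contrastive_loss(image_vecs, text_vecs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Assumption: image_vecs[i] and text_vecs[i] describe the same item,
    and every other pairing in the batch counts as a negative.
    """
    imgs = [normalize(v) for v in image_vecs]
    txts = [normalize(v) for v in text_vecs]
    n = len(imgs)
    # sim[i][j]: temperature-scaled cosine similarity of image i and text j
    sim = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
            for j in range(n)] for i in range(n)]
    image_to_text = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    text_to_image = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2
```

The loss is small when each image embedding is closest to its own text embedding and large when the pairs are mismatched, which is the signal that pulls matching items together in the shared space.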

As AI systems increasingly need to process and reason over multiple data modalities, learning how to build such systems is an important skill for AI developers.

This course equips you with the key skills to embed, retrieve, and generate across different modalities. By gaining a strong foundation in multimodal AI, you’ll be prepared to build smarter search, RAG, and recommender systems.
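
As a rough illustration of the "embed, then retrieve" step behind any-to-any search, the sketch below ranks items from different modalities by cosine similarity in one shared vector space. The index contents, file names, and vectors are hypothetical toy values standing in for real model embeddings; the course itself may use a vector database instead of this in-memory list.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical index: toy vectors, not output from a real embedding model.
INDEX = [
    {"id": "dog.jpg", "modality": "image", "vector": [0.9, 0.1, 0.0]},
    {"id": "bark.wav", "modality": "audio", "vector": [0.8, 0.2, 0.1]},
    {"id": "invoice.pdf", "modality": "text", "vector": [0.0, 0.1, 0.9]},
]

def search(query_vector, index, top_k=2):
    """Return the ids of the top_k items most similar to the query vector."""
    scored = [(cosine(query_vector, item["vector"]), item["id"]) for item in index]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:top_k]]
```

Because every modality is embedded into the same space, the query vector can come from text, an image, or audio, and the ranking code does not change.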

What's inside

Syllabus

Building Multimodal Search and RAG

Good to know

Know what's good, what to watch for, and possible dealbreakers
Develops expertise and a strong foundation in building multimodal search, RAG systems, and recommender systems
Focuses on practical, immediately applicable skills and uses real-world applications to illustrate concepts
Taught by Sebastian Witalec, a widely recognized researcher in the field
Covers core topics like contrastive learning, representation learning, retrieval techniques, and system evaluation for multimodal systems
Provides a solid foundation for professionals working in or aspiring to work in AI system development, search technology, information retrieval, computer vision, natural language processing, and recommendation systems
Involves hands-on, practice-oriented learning through projects and assignments
May not be suitable for complete beginners in AI or deep learning

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Building Multimodal Search and RAG with these activities:
Review foundational concepts in machine learning
Strengthen the understanding of machine learning algorithms and techniques used in multimodal AI.
  • Revisit key concepts from previous courses or books.
  • Solve practice problems and exercises.
Gather resources on multimodal AI applications
Expand knowledge and stay updated with the latest advancements in multimodal AI.
  • Collect articles, research papers, and industry whitepapers.
  • Organize and categorize the resources for easy reference.
Explore open-source multimodal AI libraries and tools
Gain practical experience by working with industry-standard tools and resources.
  • Identify and install relevant libraries.
  • Follow tutorials and documentation to understand their functionality.
Practice search and retrieval tasks
Develop proficiency in retrieving relevant context from multimodal data through repeated practice.
  • Define search queries and retrieval criteria.
  • Use the provided codebase to implement search and retrieval algorithms.
  • Evaluate results and refine search strategies.
Discuss and share insights on multimodal AI
Engage with peers to exchange ideas and perspectives on the latest advancements in multimodal AI.
  • Join online forums or discussion groups.
  • Attend virtual meetups or webinars.
Implement RAG with multimodal data
Enhance understanding of how RAG systems incorporate multimodal data into prompt context.
  • Modify the provided RAG codebase to handle multimodal inputs.
  • Test and evaluate the performance of multimodal RAG.
Build a multimodal recommender system
Apply multimodal AI concepts to create a practical and useful recommender system.
  • Design the system architecture and data structures.
  • Implement the system using the provided codebase.
  • Evaluate the system's performance and user experience.
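
One possible starting point for the architecture step is sketched below. It assumes each item stores one vector per modality and that items are compared with a weighted mean of per-modality cosine similarities. The catalog, item names, vectors, and weighting scheme are all hypothetical and are not taken from the course codebase.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def item_similarity(item_a, item_b, weights):
    """Weighted mean of the cosine similarity in each modality."""
    total = sum(weights.values())
    return sum(w * cosine(item_a[m], item_b[m]) for m, w in weights.items()) / total

def recommend(target_id, catalog, weights, top_k=1):
    """Return the ids of the top_k items most similar to the target item."""
    target = catalog[target_id]
    scored = [(item_similarity(target, item, weights), item_id)
              for item_id, item in catalog.items() if item_id != target_id]
    scored.sort(reverse=True)  # highest combined similarity first
    return [item_id for _, item_id in scored[:top_k]]

# Hypothetical catalog: one toy vector per modality for each item.
CATALOG = {
    "movie_a": {"text": [1.0, 0.0], "image": [1.0, 0.0]},
    "movie_b": {"text": [0.9, 0.1], "image": [0.8, 0.2]},
    "movie_c": {"text": [0.0, 1.0], "image": [0.9, 0.1]},
}
```

With equal weights, `recommend("movie_a", CATALOG, {"text": 0.5, "image": 0.5})` favors `movie_b`, which is close in both modalities. Putting all the weight on `image` favors `movie_c` instead, which shows how the weights steer the recommendation.
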
Contribute to open-source multimodal AI projects
Gain practical experience and contribute to the development of multimodal AI technologies.
  • Identify open-source projects aligned with your interests.
  • Review code and documentation to understand the project.
  • Submit bug reports or feature requests.
  • Contribute code or documentation improvements.

Similar courses

Here are nine courses similar to Building Multimodal Search and RAG.
Vector Databases: from Embeddings to Applications
Building Applications with Vector Databases
Knowledge Graphs for RAG
Building Agentic RAG with LlamaIndex
Haystack - Build customizable LLM pipelines with AI Tools
Multimodal Retrieval Augmented Generation (RAG) using the...
Gen AI - RAG Application Development using LlamaIndex
Gen AI - RAG Application Development using LangChain
Advanced Prompt Engineering for Everyone

© 2016 - 2024 OpenCourser