Enhancing a RAG system’s performance depends on efficiently processing diverse unstructured data sources.
In this course, you’ll learn techniques for representing many kinds of unstructured data, such as text, images, and tables, from a variety of sources, and apply them to extend your LLM RAG pipeline to handle Excel, Word, PowerPoint, PDF, and EPUB files. You’ll learn:
1. How to preprocess data for LLM application development, with a focus on handling different document types.
2. How to extract and normalize various documents into a common JSON format and enrich it with metadata to improve search results.
3. Techniques for document image analysis, including layout detection and vision transformers, to extract and understand PDFs, images, and tables.
4. How to build a RAG bot that can ingest different document types, such as PDFs, PowerPoint files, and Markdown files.
Apply the skills you’ll learn in this course to real-world scenarios, enhancing your RAG application and expanding its versatility.
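To make the idea of a common JSON format concrete, here is a minimal sketch in Python using only the standard library. It is not the course's implementation: the `Element` schema, the `normalize` and `filter_by_metadata` helpers, and the metadata keys (`filename`, `filetype`, `page_number`) are all hypothetical names chosen for illustration, and the input is assumed to be a list of `(kind, text, page)` tuples already produced by some format-specific extractor.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import PurePath


@dataclass
class Element:
    """One normalized chunk of a document (hypothetical schema)."""
    type: str                       # e.g. "Title", "NarrativeText", "Table"
    text: str
    metadata: dict = field(default_factory=dict)


def normalize(filename, raw_chunks):
    """Convert extractor output into a list of JSON-serializable dicts.

    raw_chunks is assumed to be an iterable of (kind, text, page) tuples
    from any format-specific extractor (PDF, Word, PowerPoint, ...).
    """
    filetype = PurePath(filename).suffix.lstrip(".").lower()
    elements = []
    for kind, text, page in raw_chunks:
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        if not cleaned:
            continue                      # skip empty chunks
        elements.append(
            Element(
                type=kind,
                text=cleaned,
                metadata={
                    "filename": filename,
                    "filetype": filetype,
                    "page_number": page,
                },
            )
        )
    return [asdict(e) for e in elements]


def filter_by_metadata(elements, **criteria):
    """Keep only elements whose metadata matches every given key/value."""
    return [
        e for e in elements
        if all(e["metadata"].get(k) == v for k, v in criteria.items())
    ]


if __name__ == "__main__":
    docs = normalize(
        "report.pdf",
        [("Title", "  Q3   Results ", 1), ("Table", "a | b", 2)],
    )
    print(json.dumps(docs, indent=2))
    print(filter_by_metadata(docs, page_number=2))
```

Because every element carries the same fields regardless of its source format, a retrieval step can filter on metadata (for example, restricting a search to one file or one page) before or alongside similarity search.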