Welcome to "Testing AI: Foundation Models, LLMs, Chatbots & More," your comprehensive guide to understanding the fundamentals of testing advanced AI systems. Whether you're a developer, a data scientist, or simply an AI enthusiast, this course will equip you with the knowledge and skills necessary to assess and improve the reliability, performance, and safety of AI technologies.
What You Will Learn:
Welcome to "Testing AI: Foundation Models, LLMs, Chatbots & More," your comprehensive guide to understanding the fundamentals of testing advanced AI systems. Whether you're a developer, a data scientist, or simply an AI enthusiast, this course will equip you with the knowledge and skills necessary to assess and improve the reliability, performance, and safety of AI technologies.
What You Will Learn:
Introduction to AI Testing: Understand the importance of testing AI systems, including ethical considerations and the potential impacts of AI failures.
Testing Basics: Learn about different types of testing methodologies like unit testing, integration testing, and system testing as applied to AI.
Special Focus on Foundation Models and LLMs: Dive deep into the challenges and techniques for testing large language models and foundational AI systems that are reshaping numerous industries.
Chatbot Testing: Explore the unique aspects of testing conversational AI, ensuring they respond accurately and appropriately in varied scenarios.
AI System Evaluations: Learn how to design and implement effective testing regimes for different AI-based systems, using both manual and automated tools.
K - Folding of Data: Understand how to split all your data into : Training, Evaluation and testing data. Get more out of your data by splitting and training it on the same data, but with different validation subsets each time, ensuring that your model learns from multiple perspectives. This technique helps improve generalization, reduces overfitting, and provides a more reliable estimate of model performance.
Case Studies: Gain insights from real-world scenarios that highlight common pitfalls and best practices in AI testing.
Ethical AI: understand the risk with AI and the ethics behind AI. How can and should you test for this
Benchmarking: Understand how to test the AI against some common benchmarking models such as: BLUE, Hella
Testing ChatGPT / Chatbots with the help of an API and integration this into MLOPS chain.
Adversarial AI: understand how to test for robustness in AI Models
Who This Course Is For:
This course is ideal for anyone looking to gain a solid grounding in the techniques and practices essential for testing AI systems. Whether you’re starting a career in AI, looking to enhance your professional skills, or interested in the mechanisms behind AI system reliability, this course has valuable insights for you.
Course Features:
Engaging video lectures
Practical assignments and hands-on projects
Quizzes and exams to test your knowledge
Access to a community forum for discussion and collaboration
Lifetime access to course materials
Enroll now to start mastering the crucial skill of testing AI systems and ensure you’re prepared to contribute to the development of safe and reliable AI technologies.
https://code.visualstudio.com/download
https://www.python.org/downloads/
https://pip.pypa.io/en/stable/installation/ - PIP Landing page
https://bootstrap.pypa.io/get-pip.py- install script
Get a brief introduction on what are the main components of AI
How NLP actually makes the AI more human.
Understand what is machine learning and how algorithms make the core of AI
Understand the basics concepts around supervised Machine Learning
Gain basic understanding of Unsupervised ML and Clustering
In this lecture you will get a basic idea of how Reinforced Learning is working together with ML Algorithms
In this material you will understand how critical good quality training data actually is.
In this lecture we will put all the pieces together and explain what GEN AI actually is.
Understand what are the main areas in software testing
Link to code -> https://github.com/danteachqe/calculator/blob/master/src/test/LLM/perplex.py
Repo -> https://github.com/danteachqe/LLMs/tree/main/LLM/Data_Splitting
https://gluebenchmark.com
Understand the 7 benchmark frameworks such as: HellaSWAG, MMLU, CODEXGLUE, BLEU
Understand how to test for biases and falsehood with TruthfullQA
Link 1: https://arxiv.org/abs/2109.07958
Github Repo:https://github.com/sylinrl/TruthfulQA
Research Paper: https://arxiv.org/pdf/2210.09261
See in practice how to test for Falsehood with deepeval and TruthfullQA.
Link to deepeval:https://docs.confident-ai.com/docs/benchmarks-truthful-qa
Github repolink: https://github.com/danteachqe/calculator/tree/master/src/test/LLM
Understand how to test the understanding of a foundation model against the multi modal language understanding test (MMLU)
Github repo here: https://github.com/hendrycks/test
Research paper here: https://arxiv.org/pdf/2009.03300
Python scripts here: https://github.com/danteachqe/calculator/tree/master/src/test/LLM
See a way in which you can benchmark machine translated text with sacrebleu python library for the BLEU Benchmark
GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy
Paper Here- >https://arxiv.org/pdf/2311.12022
See a demo on benchmarking ChatGPT vs GPQA
Repo is here: https://github.com/idavidrein/gpqa/
And code example is here calculator/src/test/LLM at master · danteachqe/calculator (github.com)
https://arxiv.org/abs/2010.09670
https://arxiv.org/pdf/1905.07830
https://github.com/rowanz/hellaswag
Test Models against code generation benchmarks with humanEval framework:
Paper: https://arxiv.org/pdf/2107.03374
Python Code:Python scripts here: https://github.com/danteachqe/calculator/tree/master/src/test/LLM
This lecture will showcase the smoke testing approach for basic LLM Content generation.
Understand what is accuracy testing and some way to validate this criteria. Examples : BLEU, ROUGE, Perplexity, Quantitative Assessments and Domain Specific Assessments.
Understand how to test for repeatability in LLM with temperature control, seed input and coherent prompts
Validate that the model is using the most cost effective means of generating/ problem solving approaches
This assesses how the LLM's performance changes over time as it's exposed to new data or used in real-world scenarios. This helps identify potential degradation in accuracy or emergence of new biases over time.
Understand what are the differences between Pretrained Model and LLM with Augmented Retrieval
Lets see a small demo with Google Vertex AI - on how to manually train a model
Lets see a small demo with Google Vertex AI - on how to train a model with a bulk of json files
In this lecture you will learn the basics of adversial testing as well as the red and blue security teams.
Learn how prompt injection works by understanding, direct and indirect prompt injections techniques as well as some examples and how to defend against them.
Understand DOS attacks such as: Prompt flood, API Exploitation, Context Window Exploit and other
Understand what is poisoning attack, how to mitigate it and some examples.
Links to material reference:
https://www.businessinsider.com/tesla-hackers-steer-into-oncoming-traffic-with-stickers-on-the-road-2019-4
https://bair.berkeley.edu/blog/2019/08/13/memorization/
https://chatgpt.com/share/456d092b-fb4e-4979-bea1-76d8d904031f
https://www.researchgate.net/figure/Adversarial-examples-using-PGD-with-and-with-noise-constraint-of-on_fig1_350132115
OpenCourser helps millions of learners each year. People visit us to learn workspace skills, ace their exams, and nurture their curiosity.
Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.
Find this site helpful? Tell a friend about us.
We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.
Your purchases help us maintain our catalog and keep our servers humming without ads.
Thank you for supporting OpenCourser.