Masterclass Testing Machine Learning (AI) Models from Udemy

What's inside

Learning objectives

Understand how ai is working
Understand basic software testing and how to apply it to artificial intelligence models
Understand how machine learning models are tested compared to traditional software
Understand the ethics behind artificial intelligence (ai) and how to validate biases in large language models ( llms)
Understand how to test reasoning abilities of ai
Gain knowledge of reasoning types and how to validate foundation ai models against different logic and reasoning types
Gain knowledge on what is natural language processing, and what tools can be used to test for it in a machine learning model
See how to benchmark ai against models "hellaswag, mmlu, glue, bleu, humaneval"

Importance of test data and how model drifting degrades large language model performance
See how chatbots can be tested with real chatgpt examples
Demo on testing chat gpt with automated tools
Understand adversial testing techniques - gain knowhow on how to perform attacks on artificial intelligence models
Understand the common and traditional metrcs what are used in machine learning field, such as f1 score, perplexity recall and accuracy
Understand k- folding data techniques
Show more
Show less

Understand how ai is working
Understand basic software testing and how to apply it to artificial intelligence models
Understand how machine learning models are tested compared to traditional software
Understand the ethics behind artificial intelligence (ai) and how to validate biases in large language models ( llms)
Understand how to test reasoning abilities of ai
Gain knowledge of reasoning types and how to validate foundation ai models against different logic and reasoning types
Gain knowledge on what is natural language processing, and what tools can be used to test for it in a machine learning model
See how to benchmark ai against models "hellaswag, mmlu, glue, bleu, humaneval"
Importance of test data and how model drifting degrades large language model performance
See how chatbots can be tested with real chatgpt examples
Demo on testing chat gpt with automated tools
Understand adversial testing techniques - gain knowhow on how to perform attacks on artificial intelligence models
Understand the common and traditional metrcs what are used in machine learning field, such as f1 score, perplexity recall and accuracy
Understand k- folding data techniques
Show more
Show less

Syllabus

Introduction

Introduction to Material

About your instructor

5 Minute Fast AI Testing Challenge

https://code.visualstudio.com/download

https://www.python.org/downloads/

https://pip.pypa.io/en/stable/installation/ - PIP Landing page

https://bootstrap.pypa.io/get-pip.py- install script

Get a brief introduction on what are the main components of AI

How NLP actually makes the AI more human.

Understand what is machine learning and how algorithms make the core of AI

Understand the basics concepts around supervised Machine Learning

Gain basic understanding of Unsupervised ML and Clustering

In this lecture you will get a basic idea of how Reinforced Learning is working together with ML Algorithms

In this material you will understand how critical good quality training data actually is.

In this lecture we will put all the pieces together and explain what GEN AI actually is.

Understand what are the main areas in software testing

Link to code -> https://github.com/danteachqe/calculator/blob/master/src/test/LLM/perplex.py

Repo -> https://github.com/danteachqe/LLMs/tree/main/LLM/Data_Splitting

https://gluebenchmark.com

Understand the 7 benchmark frameworks such as: HellaSWAG, MMLU, CODEXGLUE, BLEU

Understand how to test for biases and falsehood with TruthfullQA

Link 1: https://arxiv.org/abs/2109.07958

Github Repo:https://github.com/sylinrl/TruthfulQA

Research Paper: https://arxiv.org/pdf/2210.09261

See in practice how to test for Falsehood with deepeval and TruthfullQA.

Link to deepeval:https://docs.confident-ai.com/docs/benchmarks-truthful-qa

Github repolink: https://github.com/danteachqe/calculator/tree/master/src/test/LLM

Understand how to test the understanding of a foundation model against the multi modal language understanding test (MMLU)

Github repo here: https://github.com/hendrycks/test

Research paper here: https://arxiv.org/pdf/2009.03300

Python scripts here: https://github.com/danteachqe/calculator/tree/master/src/test/LLM

See a way in which you can benchmark machine translated text with sacrebleu python library for the BLEU Benchmark

GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy

Paper Here- >https://arxiv.org/pdf/2311.12022

See a demo on benchmarking ChatGPT vs GPQA

Repo is here: https://github.com/idavidrein/gpqa/

And code example is here calculator/src/test/LLM at master · danteachqe/calculator (github.com)

https://arxiv.org/abs/2010.09670

https://arxiv.org/pdf/1905.07830

https://github.com/rowanz/hellaswag

Test Models against code generation benchmarks with humanEval framework:

Paper: https://arxiv.org/pdf/2107.03374

Python Code:Python scripts here: https://github.com/danteachqe/calculator/tree/master/src/test/LLM

This lecture will showcase the smoke testing approach for basic LLM Content generation.

Understand what is accuracy testing and some way to validate this criteria. Examples : BLEU, ROUGE, Perplexity, Quantitative Assessments and Domain Specific Assessments.

Understand how to test for repeatability in LLM with temperature control, seed input and coherent prompts

Validate that the model is using the most cost effective means of generating/ problem solving approaches

This assesses how the LLM's performance changes over time as it's exposed to new data or used in real-world scenarios. This helps identify potential degradation in accuracy or emergence of new biases over time.

Understand what are the differences between Pretrained Model and LLM with Augmented Retrieval

Lets see a small demo with Google Vertex AI - on how to manually train a model

Lets see a small demo with Google Vertex AI - on how to train a model with a bulk of json files

In this lecture you will learn the basics of adversial testing as well as the red and blue security teams.

Learn how prompt injection works by understanding, direct and indirect prompt injections techniques as well as some examples and how to defend against them.

Understand DOS attacks such as: Prompt flood, API Exploitation, Context Window Exploit and other

Understand what is poisoning attack, how to mitigate it and some examples.

Links to material reference:

https://www.businessinsider.com/tesla-hackers-steer-into-oncoming-traffic-with-stickers-on-the-road-2019-4

https://bair.berkeley.edu/blog/2019/08/13/memorization/

https://chatgpt.com/share/456d092b-fb4e-4979-bea1-76d8d904031f
https://www.researchgate.net/figure/Adversarial-examples-using-PGD-with-and-with-noise-constraint-of-on_fig1_350132115

Traffic lights

Read about what's good

what should give you pause

and possible dealbreakers

Explores ethical considerations and potential impacts of AI failures, which is crucial for responsible AI development and aligns with current industry trends

Covers testing methodologies like unit, integration, and system testing as applied to AI, providing a solid foundation for ensuring AI system reliability

Includes practical assignments and hands-on projects, which allows learners to apply testing techniques in real-world scenarios and build a portfolio

Requires installing modules from potentially untrusted sources, which may pose a security risk if not handled carefully and could be a barrier for some learners

Examines adversarial AI and techniques for robustness testing, which is increasingly important for securing AI models against malicious attacks and ensuring their reliability

Discusses benchmarks like BLUE and HellaSWAG, which are standard tools for evaluating AI model performance and comparing them against state-of-the-art models

Reviews summary

Practical testing for ai and llm models

According to learners, this course offers a practical approach to testing modern AI systems, especially Large Language Models. Students appreciate the hands-on demos and the coverage of various benchmarking frameworks like GLUE, BLEU, and HellaSwag. Many find the course helpful for understanding how to apply traditional software testing concepts to the unique challenges of AI. While the course provides a solid foundation, some reviews suggest that parts could benefit from more depth or clarification on complex topics. The focus is strongly on real-world testing scenarios, making it relevant for professionals.

Provides a good overview of testing AI systems.

"This course gave me a solid foundation in how testing differs for ML/AI compared to traditional software."

"It helped bridge the gap between my software testing knowledge and the specifics of testing neural networks and LLMs."

"I now have a better understanding of the types of tests needed for foundation models."

Explores important AI model benchmarks.

"Learning about benchmarks like GLUE, BLEU, and HellaSwag was very useful for understanding how models are evaluated."

"The course covered a good range of benchmarks relevant to LLMs and NLP, which I hadn't seen covered in other courses."

"I found the sections on MMLU and TruthfulQA particularly insightful for testing bias and reasoning."

Emphasizes practical labs and real-world examples.

"The hands-on coding and projects are the strongest part of the course for me, especially the demos with ChatGPT and Gemini."

"The practical assignments reinforced the concepts well. I learned how to actually run these tests myself."

"I really appreciated the demos showing how to benchmark models using Python scripts provided in the repository."

Some reviewers noted issues with provided code.

"Had some difficulty getting the provided code examples to run initially; required troubleshooting."

"The GitHub repository was helpful, but some scripts needed slight modifications to work in my environment."

"Accessing and setting up the code environment took longer than expected based on the instructions."

Some areas could use more detailed explanations.

"Could use more in-depth coverage on complex topics or optimization techniques for testing at scale."

"Some explanations felt a bit rushed, and I had to do external research to fully grasp certain concepts."

"While the breadth is good, I wished for more deep dives into specific advanced testing methodologies."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Masterclass Testing Machine Learning (AI) Models with these activities:

Review Machine Learning Fundamentals

Show steps

Strengthen your understanding of core machine learning concepts to better grasp the nuances of testing AI models.

Browse courses on Machine Learning

Show steps

Review key concepts like supervised and unsupervised learning.
Practice basic machine learning algorithms.
Familiarize yourself with common evaluation metrics.

Read 'Testing Machine Learning Applications'

Show steps

Gain a deeper understanding of machine learning testing methodologies and best practices.

View Alter Ego: A Novel on Amazon

Show steps

Obtain a copy of 'Testing Machine Learning Applications'.
Read the chapters relevant to model evaluation and benchmarking.
Take notes on key testing techniques and tools.

Implement K-Fold Cross-Validation

Show steps

Solidify your understanding of K-Fold cross-validation by implementing it in Python.

Show steps

Write a Python script to split a dataset into K folds.
Train a machine learning model on K-1 folds.
Evaluate the model on the remaining fold.
Repeat the process K times and average the results.

Four other activities

Expand to see all activities and additional details

Show all seven activities

Read 'Adversarial Machine Learning'

Show steps

Understand adversarial testing techniques and how to defend against attacks on AI models.

View Alter Ego: A Novel on Amazon

Show steps

Obtain a copy of 'Adversarial Machine Learning'.
Read the chapters on attack methods and defense mechanisms.
Experiment with implementing adversarial attacks and defenses.

Blog Post: Ethical Considerations in AI Testing

Show steps

Deepen your understanding of ethical AI by writing a blog post on the topic.

Show steps

Research ethical considerations in AI testing.
Outline the key points for your blog post.
Write a compelling and informative blog post.
Publish your blog post on a platform like Medium.

Build a Chatbot Testing Framework

Show steps

Apply your knowledge by building a framework for testing chatbots.

Show steps

Design the architecture of your testing framework.
Implement the core components of the framework.
Integrate with a chatbot API like ChatGPT.
Test your framework with various chatbot scenarios.

Presentation: Benchmarking LLMs

Show steps

Consolidate your knowledge of LLM benchmarking by creating a presentation.

Show steps

Research different LLM benchmarking frameworks.
Select a few key benchmarks to focus on.
Prepare slides summarizing the benchmarks and their results.
Practice delivering your presentation.

Career center

Learners who complete Masterclass Testing Machine Learning (AI) Models will develop knowledge and skills that may be useful to these careers:

AI Validation Engineer

An AI Validation Engineer ensures the reliability and performance of AI systems. The role involves designing and implementing testing strategies, evaluating model accuracy, and identifying potential biases. This course is directly applicable, since it helps build a foundation in testing methodologies, ethical considerations, and performance metrics relevant to AI systems. Specifically, the 'Testing AI' course arms Validation Engineers with skills in areas like evaluating machine learning models based on data split, K-fold cross validation, and understanding model overfitting and underfitting. These skills are vital for thoroughly validating AI models before deployment.

See salaries and explore the career path for AI Validation Engineer

Machine Learning Quality Assurance Engineer

A Machine Learning Quality Assurance Engineer concentrates on ensuring the quality of machine learning models throughout their lifecycle. The responsibilities include creating test plans, executing tests, and reporting defects. This course is highly beneficial because it provides a detailed understanding of different testing methodologies, like unit testing, integration testing, and system testing—all of which are essential for this role. The 'Testing AI' course addresses challenges of testing large language models and foundation AI systems. This is particularly relevant for a Machine Learning Quality Assurance Engineer aiming to improve the reliability and accuracy of AI technologies.

See salaries and explore the career path for Machine Learning Quality Assurance Engineer

AI Test Automation Engineer

An AI Test Automation Engineer develops and implements automated testing frameworks for AI applications. The work includes designing test scripts, integrating testing tools, and analyzing test results to ensure continuous quality improvement. This course provides valuable insights into how to design and implement effective testing regimes for different AI based systems using both manual and automated tools. By covering topics like testing ChatGPT and chatbots with the help of an API and integrating this into Machine Learning Operations chain, the 'Testing AI' course is suited for anyone aiming to automate the testing of advanced AI systems.

See salaries and explore the career path for AI Test Automation Engineer

AI Safety Engineer

An AI Safety Engineer focuses on ensuring that AI systems operate safely and ethically, minimizing potential risks and unintended consequences. The role requires a deep understanding of AI ethics, risk assessment, and safety protocols. This course offers a sound introduction to ethical considerations in AI, including how to validate biases in large language models. The coverage of adversarial AI and techniques to test for robustness in AI models is particularly relevant to any AI Safety Engineer. The 'Testing AI' course helps build knowledge on the potential impacts of AI failures, and the importance of testing AI systems.

See salaries and explore the career path for AI Safety Engineer

Prompt Engineer

A Prompt Engineer designs and optimizes prompts for large language models to elicit desired responses and improve model performance. The daily responsibilities of this role include crafting effective prompts, evaluating model outputs, and refining prompting techniques. A Prompt Engineer may find the 'Testing AI' course helpful, as it covers real-world scenarios and best practices in AI testing, including how to benchmark AI against existing models. The course also gives an understanding of how to test for falsehood with tools and techniques such as TruthfulQA, which is crucial when evaluating the efficacy and safety of prompts.

See salaries and explore the career path for Prompt Engineer

AI Research Scientist

An AI Research Scientist conducts research to advance the field of artificial intelligence. The work involves developing new algorithms, experimenting with different models, and publishing research papers. While this role typically requires a doctoral degree, this course helps by providing a practical understanding of AI testing methodologies and benchmarking. AI Research Scientists learn how to validate the performance, reliability, and safety of AI technologies. Furthermore, the 'Testing AI' course will be helpful in their research efforts by providing hands-on knowledge of testing large language models and foundational AI systems.

See salaries and explore the career path for AI Research Scientist

Data Scientist

A Data Scientist analyzes large datasets to extract meaningful insights, develop predictive models, and support data-driven decision-making. The daily tasks include data cleaning, feature engineering, and model evaluation. The 'Testing AI' course may be useful to Data Scientists by covering the importance of test data, how model drifting degrades large language model performance, and K-folding data techniques. These lessons are crucial for Data Scientists aiming to ensure the robustness and reliability of their models. This Machine Learning course may also provide them with knowledge of software testing that is applicable to artificial intelligence models.

See salaries and explore the career path for Data Scientist

Machine Learning Engineer

A Machine Learning Engineer focuses on building, deploying, and maintaining machine learning models in production environments. This includes tasks such as model training, optimization, and integration with existing systems. Machine Learning Engineers can use this course to sharpen their skills in testing, ensuring that models perform reliably and meet required standards. This course focuses on testing methodologies, ethical considerations, and performance metrics, which are critical for Machine Learning Engineers involved in deploying AI solutions. The 'Testing AI' course can help them understand how to fine-tune and test models effectively.

See salaries and explore the career path for Machine Learning Engineer

AI Consultant

An AI Consultant advises organizations on how to leverage artificial intelligence to solve business problems and improve efficiency. The responsibilities include assessing AI readiness, recommending AI solutions, and guiding implementation efforts. An AI Consultant may find this 'Testing AI' course may be helpful because it provides a broad understanding of AI technologies, testing methodologies, and ethical considerations. This knowledge is valuable for consultants aiming to deliver informed and reliable advice to their clients. The course focuses on testing large language models and foundational AI systems, relevant topics for AI Consultants.

See salaries and explore the career path for AI Consultant

AI Product Manager

An AI Product Manager oversees the development and launch of AI-powered products. The role involves defining product vision, prioritizing features, and coordinating cross-functional teams. This course may be useful to AI Product Managers by providing a foundational understanding of AI testing, ethical considerations, and performance metrics. The 'Testing AI' course will help AI Product Managers make informed decisions about product development, ensuring they align with industry standards and user needs. By understanding AI testing, an AI Product Manager can ensure that the AI products they are managing are reliable.

See salaries and explore the career path for AI Product Manager

Software Developer

A Software Developer designs, develops, and maintains software applications. Their routine work includes writing code, debugging, and testing software functionality. Software Developers seeking to enhance their skills in AI may find this course useful. This 'Testing AI' course helps build a solid grounding in the techniques and practices essential for testing AI systems. The material here will enhance their understanding of how to ensure software performs reliably and meets industry standards, particularly when integrating AI components.

See salaries and explore the career path for Software Developer

Data Analyst

A Data Analyst collects, processes, and analyzes data to provide insights that inform business decisions. This includes creating reports, visualizing data, and identifying trends. A Data Analyst interested in expanding their knowledge in AI may find this course useful. The 'Testing AI' course may help build a solid grounding in the techniques and practices essential for testing AI systems. The material here may enhance their understanding of how to ensure data-driven models and systems are reliable and effective.

See salaries and explore the career path for Data Analyst

Business Intelligence Analyst

A Business Intelligence Analyst uses data to analyze market trends, customer behavior, and competitive landscapes, providing insights to improve business strategies. This role involves data modeling, report generation, and performance tracking. A Business Intelligence Analyst seeking to understand the AI technologies that are increasingly integrated into business processes may find this course may be useful. The 'Testing AI' course provides an overview of AI, testing methodologies, and ethical considerations, enhancing the analyst's ability to assess and interpret AI-driven insights.

See salaries and explore the career path for Business Intelligence Analyst

Technical Writer

A Technical Writer creates documentation for software, hardware, and other technical products. They translate complex information into easily understandable guides, manuals, and articles. A Technical Writer aiming to specialize in AI may find this course may be useful. By completing the 'Testing AI' course, a Technical Writer can gain an understanding of AI technologies, testing methodologies, and ethical considerations. This knowledge will provide them with a foundation to create accurate and accessible documentation for AI products and systems.

See salaries and explore the career path for Technical Writer

Project Manager

A Project Manager plans, executes, and closes projects, ensuring they are completed on time, within budget, and to the required quality standards. Responsibilities include defining project scope, allocating resources, and managing risks. A Project Manager who wants to improve their understanding of AI may find this course may be useful. The 'Testing AI' course offers a broad overview of AI technologies, allowing them to manage AI projects effectively. This will help them navigate the unique challenges associated with AI projects.

See salaries and explore the career path for Project Manager

Reading list

We've selected one books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Masterclass Testing Machine Learning (AI) Models.

Alter Ego

Save

Provides a practical guide to testing machine learning applications, covering various testing methodologies and tools. It offers valuable insights into ensuring the reliability and performance of AI systems. This book is particularly useful for understanding the challenges and best practices in testing AI models, which aligns directly with the course objectives. It serves as a useful reference tool for the course.

Alter Ego: A Novel

Paperback

Masterclass Testing Machine Learning (AI) Models

What's inside

Learning objectives

Syllabus

Traffic lights

Save this course

Reviews summary

Practical testing for ai and llm models

Activities

Career center

Reading list

Share

Similar courses