We may earn an affiliate commission when you visit our partners.
Erwin Huizenga

You’ll learn prompt engineering techniques to guide Gemini’s behavior and optimize its performance for diverse use cases, from creative story generation to analytical report writing. And you’ll discover how to integrate Gemini with external APIs and databases using function calling, with the ability to infuse your applications with real-time data and dynamic content.

What you’ll learn, in detail:

1. Introduction to Gemini Models: Explore the Gemini model family, and understand the key differences and use cases for Gemini Nano, Pro, Flash, and Ultra. Understand how to select optimal models based on capability, latency, and cost considerations.

2. Multimodal Prompting and Parameter Control: Learn advanced techniques for structuring effective text-image-video prompts to elicit desired model behavior. Adjust key sampling parameters like temperature, top_p, and top_k to control model creativity vs determinism.

3. Best Practices for Multimodal Prompting: Get experience with prompt engineering for Gemini multimodal models, and best practices around role assignment, task decomposition, and formatting. Analyze the impact of prompt-image ordering on model performance for different objectives.

4. Creating Use Cases with Images: Build engaging multimodal applications like interior design assistants and receipt itemization tools. Leverage Gemini’s cross-modal reasoning capabilities to analyze relationships between entities across multiple images.

5. Developing Use Cases with Videos: Implement “needle in the haystack” semantic video search powered by Gemini’s large context window. Explore techniques for long-form video QA and content summarization.

6. Integrating Real-Time Data with Function Calling: Extend Gemini with external knowledge and live data via function calling and API integration. Combine Gemini’s Natural Language Understanding (NLU) capabilities with APIs for up-to-date facts and interactive services.
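
As a rough illustration of the sampling parameters named in item 2, the sketch below applies temperature, top_k, and top_p to a toy score table in plain Python. It is not Gemini's implementation: the function name `filter_distribution`, the toy logits, and the order of the filtering steps are all assumptions made for illustration, and real systems may order or renormalize these steps differently.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return the renormalized token probabilities left after filtering.

    logits: dict mapping token -> raw score (toy values, not model output).
    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")

    # Temperature: divide scores before the softmax. Lower values sharpen
    # the distribution (more deterministic); higher values flatten it.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # top_k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability
    # reaches the threshold.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


toy_logits = {"cat": 2.0, "dog": 1.0, "fish": 0.0}  # made-up scores
print(filter_distribution(toy_logits, temperature=0.5, top_k=2))
```

Lowering `temperature` or tightening `top_k` and `top_p` concentrates probability on fewer tokens, which is the "determinism" end of the trade-off; loosening them leaves more candidates in play.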

Through this course, you’ll become well-versed in Gemini’s capabilities and how to maximize them in different use cases, and you’ll build a portfolio of practical techniques for architecting advanced multimodal AI applications.
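
The function-calling pattern described in item 6 can be sketched without any model at all: the model is assumed to return a structured request naming a function and its arguments, and the application runs that function and hands the result back. Everything below is hypothetical, including the `get_exchange_rate` tool, its placeholder rate, the `dispatch` helper, and the shape of the simulated request; a real integration would follow the request format of the API in use.

```python
import json


def get_exchange_rate(base: str, target: str) -> dict:
    """Hypothetical tool; a real application would query a live API here."""
    rates = {("USD", "EUR"): 0.9}  # made-up placeholder value, not real data
    return {"base": base, "target": target, "rate": rates.get((base, target))}


# Registry of the functions the application is willing to expose to the model.
TOOLS = {"get_exchange_rate": get_exchange_rate}


def dispatch(function_call: dict) -> str:
    """Run the requested function and return its result as a JSON string."""
    name = function_call["name"]
    if name not in TOOLS:
        # Refuse anything outside the registry instead of failing hard.
        return json.dumps({"error": f"unknown function: {name}"})
    result = TOOLS[name](**function_call["args"])
    return json.dumps(result)


# Stand-in for a model response; in practice this would come from the model.
simulated_call = {
    "name": "get_exchange_rate",
    "args": {"base": "USD", "target": "EUR"},
}
print(dispatch(simulated_call))
```

Keeping an explicit registry means the model can only trigger functions the application has chosen to expose, and the JSON result can be passed back to the model so it can phrase the final answer.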

Note that due to technical requirements, this course features downloadable-only notebooks on the learning platform. You are free to download, review, and run these notebooks on your own.

Career center

Learners who complete Large Multimodal Model Prompting with Gemini will develop knowledge and skills that may be useful to these careers:
Prompt Engineer
A Prompt Engineer specializes in crafting effective textual and multimodal prompts to guide AI models, like Gemini, in achieving desired outputs and behaviors. This course provides the foundational expertise for this role by immersing you in prompt engineering techniques specifically for large multimodal models. You will learn to structure text, image, and video prompts, adjust parameters like temperature for creativity or determinism, and apply best practices such as role assignment and task decomposition. This detailed understanding of how to optimize Gemini's performance for diverse use cases, from creative generation to analytical report writing and semantic search, makes this course strong preparation for the role. It directly equips you to elicit precise and sophisticated responses from advanced AI systems.
Artificial Intelligence Application Developer
As an Artificial Intelligence Application Developer, you build intelligent systems and solutions that leverage advanced AI models. This course teaches you to construct engaging multimodal applications, directly aligning with this career path. You will gain practical experience creating systems like virtual interior designers, receipt itemization tools, and semantic video search engines, all powered by Gemini’s cross-modal reasoning. Crucially, you will learn to integrate Gemini with external APIs and databases using function calling, enabling your applications to interact with real-time data. This comprehensive approach to architecting and deploying multimodal AI solutions makes this course ideal for anyone aiming to develop practical, intelligent applications.
Machine Learning Engineer
A Machine Learning Engineer develops and deploys the intelligent systems that drive innovation. This course offers comprehensive insights into leveraging large multimodal models like Gemini, a critical skill for modern ML engineers. You will explore the Gemini model family, understanding their key differences and optimal use cases based on capability, latency, and cost. Learning advanced prompting techniques and parameter control for text, image, and video inputs will empower you to optimize model behavior and performance. The ability to integrate Gemini with real-time data through function calling further enhances your capabilities in building robust, dynamic, and sophisticated machine learning applications, preparing you for success in this evolving field.
Software Engineer (Artificial Intelligence)
A Software Engineer Artificial Intelligence develops and implements robust software solutions that integrate AI models and capabilities. This course provides direct pathways for success by teaching you how to architect advanced multimodal AI applications using Gemini. You will gain hands-on experience in building intelligent systems that seamlessly understand and reason across text, images, and videos. A critical skill for this role is learning to integrate Gemini with external APIs and databases through function calling, enabling your applications to draw on real-time data and dynamic content. This practical skillset is foundational for developing cutting-edge AI-powered software products and services.
Solutions Architect Artificial Intelligence
A Solutions Architect Artificial Intelligence designs and oversees the implementation of complex AI systems, identifying the right tools and strategies for various business challenges. This course provides a deep dive into the Gemini model family and its capabilities, enabling you to select optimal models considering capability, latency, and cost. You will learn to architect advanced multimodal AI applications, understanding how to unify traditionally siloed data modalities like text, images, and videos. Furthermore, mastering function calling for integrating external APIs and databases is essential for designing scalable, real-time AI solutions. This knowledge is invaluable for crafting robust and effective AI architectures.
Technical Consultant Artificial Intelligence
A Technical Consultant Artificial Intelligence advises organizations on leveraging AI technologies to achieve strategic objectives. This course is an excellent resource for understanding the practical applications and architectural considerations of large multimodal models like Gemini. You will gain expertise in selecting optimal Gemini models based on capability, latency, and cost, which is vital for recommending appropriate solutions. The course also equips you with the skills to demonstrate how to integrate Gemini with external APIs and databases using function calling, providing dynamic and real-time solutions. This comprehensive knowledge allows you to guide clients in architecting advanced, impactful multimodal AI applications for their specific needs.
Artificial Intelligence Product Manager
An Artificial Intelligence Product Manager defines the vision, strategy, and roadmap for AI-powered products. This course provides a strong foundation for this role by offering a detailed exploration of the Gemini model family. Understanding the key differences and use cases for Gemini Nano, Pro, Flash, and Ultra, alongside considerations for capability, latency, and cost, is essential for informed product decisions. The course illustrates diverse applications, such as virtual interior designers and smart document processing, fostering an appreciation for potential product innovations. This knowledge allows you to effectively guide the development of new classes of intelligent systems, ensuring product relevance and market success.
Machine Learning Model Evaluator
A Machine Learning Model Evaluator systematically assesses the performance, reliability, and biases of AI models. This course provides highly relevant skills, focusing on how to optimize Gemini's performance for diverse use cases. You will learn to fine-tune key parameters like temperature, top_p, and top_k to control model creativity versus determinism, which is crucial for evaluating model behavior under different conditions. The course also covers analyzing the impact of prompt-image ordering on model performance for different objectives. This detailed understanding of model control and output assessment directly prepares you for the rigorous demands of robust model evaluation in multimodal AI systems.
Computer Vision Engineer
A Computer Vision Engineer develops systems that enable computers to "see" and interpret visual information. This course directly enhances your skill set by focusing on multimodal prompting that involves images and videos. You will gain practical experience creating use cases that leverage Gemini’s cross-modal reasoning capabilities to analyze relationships between entities across multiple images, such as for interior design assistants or receipt itemization. Furthermore, developing use cases with videos, including semantic video search and long-form video question answering, will be instrumental. This specialized training allows you to integrate advanced multimodal AI into innovative computer vision applications.
Intelligent Automation Specialist
An Intelligent Automation Specialist designs and implements automated processes leveraging AI to enhance efficiency and productivity. This course may be useful by demonstrating how to build smart document processing pipelines that extract structured data and answer questions from complex PDFs, a key automation task. You will also learn to integrate Gemini with external APIs and databases using function calling, allowing for the automation of tasks that require real-time data and dynamic content. The ability to unify traditionally siloed data modalities for reasoning makes this course relevant for creating advanced, end-to-end intelligent automation solutions across various business functions.
Multimodal Content Creator
A Multimodal Content Creator leverages various media types to tell stories and engage audiences. This course may be helpful by demonstrating how large multimodal models like Gemini can assist with diverse content creation tasks. You will explore prompt engineering for creative story generation and analytical report writing, allowing you to produce text-based content efficiently. Furthermore, understanding Gemini's ability to analyze images and videos for insights, or to generate personalized design recommendations, can empower you to create more dynamic and interactive content. This course equips you with the skills to harness AI for innovative and engaging multimodal content development.
Natural Language Processing Engineer
A Natural Language Processing Engineer builds systems that understand, interpret, and generate human language. This course may be particularly helpful as it shows how to integrate Gemini’s Natural Language Understanding capabilities with other modalities and real-time data. You will learn to extract structured data from complex PDFs, answer questions based on content, and generate human-like summaries, all critical NLP tasks. The course also covers creative story generation and analytical report writing, demonstrating diverse text-based applications. By exploring multimodal prompting, you can develop more sophisticated NLP solutions that seamlessly understand and reason across text, images, and videos.
Data Analyst Artificial Intelligence
A Data Analyst Artificial Intelligence extracts meaningful insights from complex datasets to inform business decisions. This course may be useful by teaching you how large multimodal models like Gemini can analyze and reason across diverse data modalities. You will learn to extract structured data from complex PDFs, enabling automated data processing and analysis. The course also delves into understanding relationships between entities across multiple images and performing long-form video question answering, offering new avenues for data interpretation. This exposure to cross-modal reasoning helps build a foundation for analyzing data in richer, more intelligent ways for improved decision-making.
Research Scientist, Artificial Intelligence
A Research Scientist Artificial Intelligence explores novel AI concepts and advances the state of the art in machine learning. This course may be useful for those interested in the practical application aspects of large multimodal models, offering insights into Gemini's capabilities. You will learn how models unify traditionally siloed data modalities and reason across text, images, and videos, pushing the boundaries of what is possible. While typically requiring an advanced degree, this course provides a pragmatic understanding of prompt engineering techniques, parameter control, and architecting advanced applications, which can inform research directions and experimental design in multimodal AI.
User Experience Researcher Artificial Intelligence
A User Experience Researcher Artificial Intelligence investigates how users interact with AI systems to inform design and improve usability. This course may be helpful by providing a foundational understanding of how large multimodal models like Gemini function and respond to prompts. You will gain insight into how these models understand style preferences from text descriptions and analyze room images for personalized design recommendations, directly influencing user experience. Understanding multimodal prompting best practices and task decomposition will enable you to better analyze user interaction with AI-powered applications, leading to more intuitive and effective designs for intelligent systems.

Reading list

Focuses on the use of prompt engineering for natural language processing. It is written by Thomas Wolf, a leading researcher in the field of NLP.
Focuses on the use of prompt engineering for recommendation systems. It is written by Masashi Sugiyama, a leading researcher in the field of recommendation systems.
Covers the use of prompt engineering for finance. It is written by Richard Roll, a leading researcher in the field of finance.
Focuses on the use of prompt engineering for education. It is written by Salman Khan, a leading researcher in the field of education.
Provides a comprehensive guide to prompt engineering, covering techniques for crafting effective inputs to generative AI models. It's particularly useful for understanding how to obtain reliable and predictable results, which is crucial for both beginners and those looking to deepen their practical skills. This book is valuable as a current reference for anyone working with generative AI.
Offers a practical, hands-on approach to prompt engineering specifically with ChatGPT. It's an excellent resource for high school and undergraduate students getting started, providing clear examples and exercises. It serves as a useful introductory guide and additional reading to complement foundational AI courses.
While not solely focused on prompt engineering, this book provides a strong foundation in understanding how LLMs work, which is essential for effective prompting. It's suitable for undergraduate and graduate students, offering technical insights into language understanding and generation. It serves as valuable background reading for those wanting to understand the underlying mechanisms of the models they are prompting. Expected publication in September 2024.
This guide aims to make prompt engineering accessible with a step-by-step approach. It is well-suited for beginners and those new to the field, including high school students and those in introductory undergraduate programs. It provides practical tips and is useful for gaining a broad understanding of how to formulate effective AI prompts.
Focuses on the creative aspects of prompt engineering and generating diverse language outputs. It's a good fit for students and professionals looking to go beyond basic prompting and explore more advanced techniques for creative content generation. It adds breadth by covering applications in areas like creative writing and podcasting.
Explores prompt engineering within the broader context of generative AI and touches upon ethical considerations. It's relevant for all levels, providing a balanced view of the technical aspects and the societal impact of generative AI. It's useful for gaining a broader understanding and considering the responsible use of AI.
Delves into the technical underpinnings of generative models, which are the foundation of systems like ChatGPT. While not strictly about prompting, understanding these models at a deeper level is invaluable for advanced prompt engineering. It's best suited for undergraduate and graduate students with a technical background. It provides essential background knowledge for those seeking to truly master prompt engineering.
This is a classic and widely referenced textbook in the field of NLP. While it predates the latest advancements in LLMs and prompt engineering, it provides a foundational understanding of language processing, which is crucial for anyone serious about the field. It's highly recommended for undergraduate and graduate students as a comprehensive reference for core NLP concepts.
A widely used introduction to NLP using the NLTK library in Python. It's excellent for beginners and undergraduate students to gain practical skills in processing and analyzing text data, which is a fundamental prerequisite for prompt engineering. It serves as a hands-on guide for learning the basics of NLP.
A foundational text in statistical NLP, this book provides the theoretical background necessary for understanding many of the techniques used in modern LLMs. It's a valuable resource for graduate students and researchers looking to deepen their understanding of the statistical underpinnings of language models. It is more theoretical and serves as a strong reference for advanced learners.
Focuses on the practical aspects of building NLP systems. While prompt engineering is a specific technique, understanding the entire NLP pipeline is beneficial for professionals. It's a good reference for those looking to implement prompt engineering within larger NLP applications.
Covers the broader field of AI engineering with a focus on foundation models, which include the LLMs used in prompt engineering. It's relevant for professionals and graduate students interested in the engineering aspects of building AI applications. It provides context on how prompt engineering fits into larger AI systems.
A beginner-friendly guide to using ChatGPT. It is ideal for high school students and those new to generative AI who want a straightforward introduction to interacting with models. It focuses on practical usage and is a good starting point before diving into more complex prompt engineering concepts.
A comprehensive guide aimed at making prompt engineering accessible to a wide audience. It covers various techniques and applications for leveraging AI language models effectively. It is useful for high school students through working professionals seeking a thorough introduction and practical guidance.
This guide offers quick tips and tricks for prompt engineering across different generative AI models, including text and image generation models. It's a practical resource for users who want to improve their immediate results with various AI tools. It's suitable for all levels looking for actionable advice.
For those who want to understand the mechanics of LLMs deeply, this book guides you through building one from scratch. This is highly technical and suitable for advanced undergraduate students, graduate students, and researchers. A deep understanding of LLM architecture is beneficial for advanced prompt engineering techniques.
