We may earn an affiliate commission when you visit our partners.
Kasun Liyanage

This course is all about CUDA programming. We start with basic concepts, including the CUDA programming model, execution model, and memory model. Then we show you how to implement advanced algorithms using CUDA. CUDA programming is all about performance, so throughout this course you will learn multiple optimization techniques and how to use them to implement algorithms. We also discuss profiling techniques extensively, along with tools in the CUDA toolkit including nvprof, nvvp, CUDA Memcheck, and CUDA-GDB. This course contains the following sections.

Answering all of those will help you digest the concepts we discuss here.

This course is the first course of the CUDA master class series we are currently working on, so the knowledge you gain here is essential for following those courses as well.

What's inside

Learning objectives

  • All the basic knowledge about CUDA programming
  • Ability to design and implement optimized parallel algorithms
  • Basic workflow of parallel algorithm design
  • Advanced CUDA concepts

Syllabus

Introduction to CUDA programming and CUDA programming model
Very very important
Introduction to parallel programming
Parallel computing and Super computing
Let's investigate some background.
How to install CUDA toolkit and first look at CUDA program
Basic elements of CUDA program
Organization of threads in a CUDA program - threadIdx
Organization of threads in a CUDA program - blockIdx, blockDim, gridDim
Programming exercise 1
Unique index calculation using threadIdx, blockIdx, and blockDim
Unique index calculation for 2D grid 1
Unique index calculation for 2D grid 2
Memory transfer between host and device
Programming exercise 2
Sum array example with validity check
Sum array example with error handling
Sum array example with timing
Extend sum array implementation to sum up 3 arrays
Device properties
Summary
CUDA Execution model
Understand the device better
All about warps
Warp divergence
Resource partitioning and latency hiding 1
Resource partitioning and latency hiding 2
Occupancy
Profile driven optimization with nvprof
Parallel reduction as synchronization example
Parallel reduction as warp divergence example
Parallel reduction with loop unrolling
Parallel reduction as warp unrolling
Reduction with complete unrolling
Performance comparison of reduction kernels
CUDA Dynamic parallelism
Reduction with dynamic parallelism
CUDA memory model
Different memory types in CUDA
Memory management and pinned memory
Zero copy memory
Unified memory
Global memory access patterns
Global memory writes
AOS vs SOA
Matrix transpose
Matrix transpose with unrolling
Matrix transpose with diagonal coordinate system
CUDA Shared memory and constant memory
Introduction to CUDA shared memory
Shared memory access modes and memory banks
Row major and Column major access to shared memory
Static and Dynamic shared memory
Shared memory padding
Parallel reduction with shared memory
Synchronization in CUDA
Matrix transpose with shared memory
CUDA constant memory
Matrix transpose with Shared memory padding
CUDA warp shuffle instructions
Parallel reduction with warp shuffle instructions
CUDA Streams
Introduction to CUDA streams and events
How to use CUDA asynchronous functions
How to use CUDA streams
Overlapping memory transfer and kernel execution
Stream synchronization and blocking behavior of NULL stream
Explicit and implicit synchronization
CUDA events and timing with CUDA events
Creating inter stream dependencies with events
Performance Tuning with CUDA instruction level primitives
Introduction to different types of instructions in CUDA
Floating point operations
Standard and intrinsic functions
Atomic functions
Parallel Patterns and Applications
Scan algorithm introduction
Simple parallel scan
Work efficient parallel exclusive scan
Work efficient parallel inclusive scan
Parallel scan for large data sets
Parallel Compact algorithm
Bonus: Introduction to Image processing with CUDA
Introduction part 1
Introduction part 2
Digital image processing
Digital image fundamentals : Human perception
Digital image fundamentals : Image formation
OpenCV installation

Good to know

Know what's good, what to watch for, and possible dealbreakers
Develops strong programming skills, which are necessary for careers in software and engineering
Teaches advanced algorithms in CUDA, which helps learners become stronger programmers
Taught by Kasun Liyanage, who is recognized for their work in CUDA
Covers advanced concepts like resource partitioning, latency hiding, and occupancy
Examines performance tuning with nvprof, helping learners understand how to optimize their code
Bonus content on image processing with CUDA, adding value to the course

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in CUDA programming Masterclass with C++ with these activities:
Organize and Review Course Materials
Maximize your learning potential by organizing and reviewing essential course materials regularly.
  • Gather lecture notes, slides, assignments, and quizzes.
  • Create a structured system for organizing and storing these materials.
  • Review your organized materials periodically to reinforce your understanding.
Review Parallel Computing Concepts
Strengthen your foundation in parallel computing concepts to enhance your CUDA programming skills.
  • Review textbooks or online resources on parallel computing principles.
  • Practice writing simple parallel programs in C++ or Python.
Read 'CUDA by Example'
Gain a comprehensive understanding of CUDA programming concepts and techniques through this foundational book.
  • Read and understand the concepts presented in each chapter.
  • Work through the code examples and exercises provided in the book.
Practice Parallel Programming in C++ using CUDA
Sharpen your skills in parallel programming with CUDA by engaging in regular practice drills.
  • Implement basic CUDA kernels for vector addition, matrix multiplication, and reduction operations.
  • Optimize CUDA kernels for improved performance.
  • Troubleshoot common CUDA errors and performance issues.
Explore Advanced Techniques in CUDA Programming
Delve deeper into advanced CUDA techniques to enhance your understanding and proficiency.
  • Follow tutorials on topics such as shared memory optimization, warp synchronization, and atomic operations.
  • Experiment with different CUDA programming models and their applications.
  • Implement advanced algorithms like radix sort and prefix sum using CUDA.
Develop a CUDA-Based Project
Solidify your understanding by applying your CUDA skills to a real-world project.
  • Identify a problem or application that can benefit from parallel processing.
  • Design and implement a CUDA solution for the problem.
  • Optimize and evaluate the performance of your CUDA implementation.
  • Present your project findings and share your code with the community.

Career center

Learners who complete CUDA programming Masterclass with C++ will develop knowledge and skills that may be useful to these careers:
High-Performance Computing Architect
High-Performance Computing Architects design and build high-performance computing systems, including hardware, software, and networks. They work to optimize system performance, scalability, and efficiency to meet the demands of data-intensive applications. This course in CUDA programming can be highly beneficial for High-Performance Computing Architects who want to develop high-performance computing solutions. By understanding CUDA programming, Architects can create more efficient and effective systems for various applications, including scientific research, engineering simulations, and financial modeling.
Computational Scientist
Computational Scientists use advanced computational techniques to solve complex scientific problems. They develop and apply mathematical models and algorithms to simulate and analyze physical, biological, and social systems. This course in CUDA programming can be beneficial for Computational Scientists who want to develop high-performance computing applications for scientific research. By leveraging CUDA, Computational Scientists can simulate and analyze complex systems more efficiently, enabling them to make groundbreaking discoveries and advance scientific knowledge.
Computer Programmer
Computer Programmers write, test, and maintain the code that makes computers and software work. They translate designs and specifications into instructions that computers can execute. This course in CUDA programming can help Computer Programmers develop the skills needed to create high-performance computing applications, which are in high demand in various industries.
Software Engineer
Software Engineers create the applications, programs, and software that run on computers, mobile phones, and other digital devices. They design, develop, test, and maintain these products, ensuring they meet user needs and function properly. This course helps build a foundation in CUDA programming, which is essential for developing high-performance computing applications. By understanding CUDA programming, Software Engineers can create more efficient and effective software solutions.
Deep Learning Engineer
Deep Learning Engineers specialize in developing and deploying deep learning models, a type of machine learning that uses artificial neural networks. They design, implement, and optimize deep learning algorithms to solve complex problems in various domains. This course in CUDA programming can be beneficial for Deep Learning Engineers who want to develop high-performance deep learning applications. By leveraging CUDA, Deep Learning Engineers can train and deploy models more efficiently, enabling them to push the boundaries of deep learning and drive innovation.
Machine Learning Engineer
Machine Learning Engineers design, develop, and deploy machine learning models to solve real-world problems. They use various techniques, including data analysis, statistical modeling, and optimization, to create models that can learn from data and make predictions. This course in CUDA programming can help Machine Learning Engineers develop the skills needed to create high-performance computing applications for machine learning. By leveraging CUDA, Machine Learning Engineers can train and deploy models more efficiently, enabling them to solve complex problems and drive innovation.
Data Scientist
Data Scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. They develop and build models to analyze data and make predictions. This course in CUDA programming can be helpful for Data Scientists who want to develop high-performance computing applications for data analysis and modeling. By leveraging CUDA, Data Scientists can process large datasets more efficiently and uncover valuable insights.
Computer Hardware Engineer
Computer Hardware Engineers design, develop, and test computer hardware components, including processors, memory, and storage devices. They work to improve the performance, efficiency, and reliability of computer systems. This course in CUDA programming can be helpful for Computer Hardware Engineers who want to develop high-performance computing hardware. By understanding CUDA programming, Engineers can design and optimize hardware components that support high-performance computing applications.
Software Architect
Software Architects design, build, and maintain software systems. They work with stakeholders to understand business requirements and translate them into technical solutions. This course in CUDA programming can be helpful for Software Architects who want to develop high-performance computing software systems. By understanding CUDA programming, Architects can design and implement systems that can handle complex and data-intensive tasks efficiently.
Technical Lead
Technical Leads lead and manage teams of software engineers and computer programmers. They provide technical guidance, set project goals, and ensure that projects are completed on time and within budget. This course in CUDA programming can be helpful for Technical Leads who want to develop high-performance computing solutions. By understanding CUDA programming, Technical Leads can provide better guidance to their teams and ensure that projects are implemented efficiently.
Quantitative Analyst
Quantitative Analysts use mathematical and statistical techniques to analyze financial data and make investment decisions. They develop and implement quantitative models to identify trading opportunities and manage risk. This course in CUDA programming may be helpful for Quantitative Analysts who want to develop high-performance computing applications for quantitative modeling and analysis. By leveraging CUDA, Quantitative Analysts can process large financial datasets more efficiently, enabling them to make more informed investment decisions.
Data Analyst
Data Analysts collect, clean, and analyze data to extract meaningful insights. They use data analysis techniques and tools to identify trends, patterns, and relationships in data. This course in CUDA programming may be helpful for Data Analysts who want to develop high-performance computing applications for data analysis. By leveraging CUDA, Data Analysts can process large datasets more efficiently, enabling them to uncover valuable insights and drive data-driven decision-making.
Financial Analyst
Financial Analysts provide insights and recommendations on financial matters, such as investments, stocks, and bonds. They use financial data and models to evaluate the performance of companies and make investment decisions. This course in CUDA programming may be helpful for Financial Analysts who want to develop high-performance computing applications for financial modeling and analysis. By leveraging CUDA, Financial Analysts can process large financial datasets more efficiently, enabling them to make more informed investment decisions.
Business Intelligence Analyst
Business Intelligence Analysts use data analysis and visualization techniques to provide insights and recommendations to businesses. They help businesses understand their data, make informed decisions, and improve their operations. This course in CUDA programming may be helpful for Business Intelligence Analysts who want to develop high-performance computing applications for business intelligence. By leveraging CUDA, Business Intelligence Analysts can process large datasets more efficiently, enabling them to provide more valuable insights and drive business growth.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in CUDA programming Masterclass with C++.
Provides a comprehensive overview of CUDA programming and is written by Shane Cook, one of the leading experts on CUDA programming. It covers a wide range of topics, from basic concepts to advanced techniques. It is a valuable resource for anyone who wants to learn about CUDA programming.
Provides a comprehensive overview of CUDA programming with C and C++. It covers a wide range of topics, from basic concepts to advanced techniques. It is a valuable resource for anyone who wants to learn about CUDA programming with C and C++.
Provides a hands-on introduction to CUDA programming. It covers a wide range of topics, from basic concepts to advanced techniques. It is a valuable resource for anyone who wants to learn about CUDA programming.
Focuses on the low-level details of CUDA programming with C. It is a practical book that provides a hands-on introduction to CUDA programming. It includes code samples and exercises for readers to follow along.
Provides a practical introduction to CUDA programming. It covers a wide range of topics, from basic concepts to advanced techniques. It includes code samples and exercises for readers to follow along and provides a good balance of theory and practice, making it a good choice for those who want to learn CUDA programming.
Provides a comprehensive overview of parallel and distributed computation. It covers a wide range of topics, from basic concepts to advanced techniques. It is a valuable resource for anyone who wants to learn about parallel and distributed computation or use it to power their own applications.
Provides a comprehensive overview of parallel programming. It covers a wide range of topics, from basic concepts to advanced techniques. It is a valuable resource for anyone who wants to learn about parallel programming.

Similar courses

Here are nine courses similar to CUDA programming Masterclass with C++.
Modern C++ Concurrency in Depth (C++17/20)
Data Structures and Algorithms: In-Depth using Python
Decision Making and Reinforcement Learning
201: Elementary Data Structur
Advanced Bayesian Statistics Using R
Web Automation-Selenium-Ruby|E-2-E Cucumber integration...
Intro to Parallel Programming
Data Structures and Algorithms in Python
Understanding the World Through Data

© 2016 - 2024 OpenCourser