This course is an introduction to speaker recognition techniques.
Speaker recognition lies at the intersection of audio processing, biometrics, and machine learning, and has various applications. You can find it in smartphones, smart home devices, and various commercial services.
In this course, we will start with an introduction to the history of speaker recognition techniques, to see how the field evolved from simple human efforts to modern deep-learning-based intelligent systems.
We will cover the basics of acoustics, perception, audio processing, signal processing, and feature extraction, so you don't need a background in these domains. We will also introduce popular machine learning approaches, such as Gaussian mixture models, support vector machines, factor analysis, and neural networks.
We will focus on how to build speaker recognition systems based on acoustic features and machine learning models, with an emphasis on modern deep-learning-based speaker recognition, including the different options for inference logic, loss functions, and neural network topologies.
We will also talk about data processing techniques such as data cleansing, data augmentation, and data fusion.
The course includes many hands-on exercises and coding examples to help you really master the topics it covers, and a final project that guides you through building your own speaker recognition system from scratch.
If you are a college student interested in AI or signal processing, or a software engineer, system architect or product manager working with related technologies, then this course is definitely for you.
Hello fellow scholars, my name is Quan Wang. I'm currently a Staff Software Engineer at Google, leading the "Speaker, Voice and Language" team. Previously, I was a machine learning scientist on the Amazon Alexa team.
I will be your instructor for this course. I will share my knowledge and experience of speaker recognition techniques with you and help you prepare for your academic and career goals.
I have been working in the voice identity industry for more than 8 years.
At Google, my team and I have developed many successful products. We have filed many valuable patents, published many impactful papers at top conferences, and frequently made headlines in tech news. I published a textbook about voice identity techniques, which became one of the bestselling books about AI in China. This book also won me the Distinguished Author of Year 2020 Award. So, many people consider me a kind of "successful" scientist.
However, when I look back at how I started this journey, it wasn't so pleasant at the very beginning.
Most of my undergraduate and Ph.D. research focused on computer vision and image processing. When I first started working on speech and speaker recognition at Amazon, I was under huge pressure. This pressure did not come from my manager or anyone else. It came from myself, from realizing that my knowledge and expertise really did not match what my projects required.
Every time I had meetings with different teams, or reviewed people's code and documents, I didn't know what people were talking about. I had never even heard of the terminology and acronyms, while people naturally assumed I understood them. I felt like I was the dullest person in the company. That experience was really terrible.
And that was the time when I really wished someone could just teach me the basic concepts in audio processing, speech, and speaker recognition. I searched the internet, but unfortunately, there were no such online courses.
I bought lots of books, and read lots of papers, online articles, and tech blogs. However, there was really nothing that systematically introduced speaker recognition. Everything I could find was fragmented information. Besides, most of the papers were very obscure and too difficult to follow for someone new to the field. Many online articles and blogs were unprofessional, and some even contained obvious mistakes. And most technical books were already outdated by the time they were published.
That is why I decided to spend several years developing this course: to help anyone interested in speaker recognition techniques start working in this domain with as little friction as possible, and avoid all the frustrations that I experienced myself. Don't waste your time on fragmented, unprofessional, or outdated information. In this course, I will systematically walk you through the basic concepts, from acoustics, audio processing, and deep learning to speaker recognition and its various applications.
To summarize, what I'm going to teach in this course is what I wish someone had taught me many years ago: the core algorithms and engineering practice of speaker recognition.
What is the expected outcome from this course?
Well, that really depends on who you are. This course mainly targets 3 different groups of audience. Group 1, students and researchers; group 2, industry audiences; and group 3, general audiences.
The Group 1 audience includes senior college students, graduate students, as well as postdocs and technical staff members working at research institutes. Even if you know nothing about speech technology right now, by the end of this course you should be able to talk confidently about topics like audio processing, speaker recognition, deep learning, and even the very latest work.
If you haven't done any research before, by the end of this course you should be able to decide whether you want to do your thesis in speaker recognition. If you attend a top conference like ICASSP or Interspeech, you should feel comfortable chatting with people and asking them questions without fear.
In group 2, the industry audiences typically include software engineers, system architects, and product managers who work on products and services related to voice identity techniques. For these audiences, taking this course will complement your current knowledge of this domain and help you follow the latest trends in academia. This will make you more competitive in your current position and take your career to the next level.
And group 3, the general audiences. For this group, the purpose of taking this course might be different from that of the other groups. Many of the lectures in this course could be too technical for general audiences who may not have the corresponding background in mathematics or computer science. For these audiences, it is OK to skip some lectures and the exercises, and only watch the lectures on history, applications, and high-level concepts. This will help you get a clear big picture of the speaker recognition industry, expand your general knowledge, and maybe make better investment decisions. You will sound like a pro when you chat with your family and friends.