Carlos Guestrin and Emily Fox

Case Studies: Analyzing Sentiment & Loan Default Prediction

In our case study on analyzing sentiment, you will create models that predict a class (positive/negative sentiment) from input features (text of the reviews, user profile information, ...). In our second case study for this course, loan default prediction, you will tackle financial data, and predict when a loan is likely to be risky or safe for the bank. These tasks are examples of classification, one of the most widely used areas of machine learning, with a broad array of applications, including ad targeting, spam detection, medical diagnosis and image classification.

In this course, you will create classifiers that provide state-of-the-art performance on a variety of tasks. You will become familiar with the most successful techniques, which are most widely used in practice, including logistic regression, decision trees and boosting. In addition, you will be able to design and implement the underlying algorithms that can learn these models at scale, using stochastic gradient ascent. You will implement these techniques on real-world, large-scale machine learning tasks. You will also address significant tasks you will face in real-world applications of ML, including handling missing data and measuring precision and recall to evaluate a classifier. This course is hands-on, action-packed, and full of visualizations and illustrations of how these techniques will behave on real data. We've also included optional content in every module, covering advanced topics for those who want to go even deeper!

Learning Objectives: By the end of this course, you will be able to:

-Describe the input and output of a classification model.

-Tackle both binary and multiclass classification problems.

-Implement a logistic regression model for large-scale classification.

-Create a non-linear model using decision trees.

-Improve the performance of any model using boosting.

-Scale your methods with stochastic gradient ascent.

-Describe the underlying decision boundaries.

-Build a classification model to predict sentiment in a product review dataset.

-Analyze financial data to predict loan defaults.

-Use techniques for handling missing data.

-Evaluate your models using precision-recall metrics.

-Implement these techniques in Python (or in the language of your choice, though Python is highly recommended).

What's inside

Syllabus

Welcome!
Classification is one of the most widely used techniques in machine learning, with a broad array of applications, including sentiment analysis, ad targeting, spam detection, risk assessment, medical diagnosis and image classification. The core goal of classification is to predict a category or class y from some inputs x. Through this course, you will become familiar with the fundamental models and algorithms used in classification, as well as a number of core machine learning concepts. Rather than covering all aspects of classification, you will focus on a few core techniques, which are widely used in the real world to get state-of-the-art performance. By following our hands-on approach, you will implement your own algorithms on multiple real-world tasks, and deeply grasp the core techniques needed to be successful with these approaches in practice. This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.
Linear Classifiers & Logistic Regression
Linear classifiers are amongst the most practical classification methods. For example, in our sentiment analysis case-study, a linear classifier associates a coefficient with the counts of each word in the sentence. In this module, you will become proficient in this type of representation. You will focus on a particularly useful type of linear classifier called logistic regression, which, in addition to allowing you to predict a class, provides a probability associated with the prediction. These probabilities are extremely useful, since they provide a degree of confidence in the predictions. In this module, you will also be able to construct features from categorical inputs, and to tackle classification problems with more than two classes (multiclass problems). You will examine the results of these techniques on a real-world product sentiment analysis task.
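As a rough illustration of the representation described above, the sketch below scores a review as a weighted sum of word counts and turns that score into a probability with the logistic (sigmoid) function. The function names, the coefficient values and the example review are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
import math

def score(word_counts, coefficients, intercept=0.0):
    """Linear score: intercept + sum over words of coefficient * count."""
    return intercept + sum(coefficients.get(word, 0.0) * count
                           for word, count in word_counts.items())

def predict_probability(word_counts, coefficients, intercept=0.0):
    """Logistic regression: P(y = +1 | x) = 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score(word_counts, coefficients, intercept)))

def predict_class(word_counts, coefficients, intercept=0.0):
    """Predict +1 (positive) when the score is above zero, otherwise -1."""
    return 1 if score(word_counts, coefficients, intercept) > 0 else -1

# Hypothetical coefficients; words without a coefficient contribute nothing.
coefficients = {"great": 1.5, "awful": -2.0, "good": 1.0}
review = {"great": 2, "awful": 1, "the": 3}

review_score = score(review, coefficients)            # 1.5*2 + (-2.0)*1 = 1.0
review_prob = predict_probability(review, coefficients)
review_class = predict_class(review, coefficients)
print(review_score, round(review_prob, 3), review_class)
```

A score of zero corresponds to a probability of exactly 0.5, so thresholding the score at zero and thresholding the probability at 0.5 give the same class prediction.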
Learning Linear Classifiers
Once familiar with linear classifiers and logistic regression, you can now dive in and write your first learning algorithm for classification. In particular, you will use gradient ascent to learn the coefficients of your classifier from data. You will first need to define the quality metric for these tasks using an approach called maximum likelihood estimation (MLE). You will also become familiar with a simple technique for selecting the step size for gradient ascent. An optional, advanced part of this module will cover the derivation of the gradient for logistic regression. You will implement your own learning algorithm for logistic regression from scratch, and use it to learn a sentiment analysis classifier.
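The idea of maximizing the likelihood with gradient ascent can be sketched in a few lines. This is a minimal sketch, assuming labels coded as 0/1, a constant first feature that acts as the intercept, and a fixed step size; the tiny dataset and all names are hypothetical, and the course's own assignments may be organized differently.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(w_j * x_j for w_j, x_j in zip(w, x))

def log_likelihood(X, y, w):
    """Quality metric: sum of log P(y_i | x_i, w) over the data."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = sigmoid(dot(w, x_i))
        total += math.log(p) if y_i == 1 else math.log(1.0 - p)
    return total

def gradient_ascent(X, y, step_size=0.1, n_iterations=100):
    """Repeatedly step in the direction of the log-likelihood gradient."""
    w = [0.0] * len(X[0])
    for _ in range(n_iterations):
        # error_i = (indicator that y_i is 1) - P(y = 1 | x_i, w)
        errors = [y_i - sigmoid(dot(w, x_i)) for x_i, y_i in zip(X, y)]
        # partial derivative for coefficient j: sum_i x_ij * error_i
        gradient = [sum(e * x_i[j] for e, x_i in zip(errors, X))
                    for j in range(len(w))]
        w = [w_j + step_size * g_j for w_j, g_j in zip(w, gradient)]
    return w

# Hypothetical data: first column is a constant 1 (intercept feature).
X = [[1, 2.0], [1, 1.0], [1, -1.0], [1, -2.0], [1, 0.5], [1, -0.5]]
y = [1, 1, 0, 0, 0, 1]

w_learned = gradient_ascent(X, y)
ll_before = log_likelihood(X, y, [0.0, 0.0])
ll_after = log_likelihood(X, y, w_learned)
print(w_learned, ll_before, ll_after)
```

Tracking the log likelihood across iterations is a simple way to judge a step size: if the value fails to increase or oscillates, the step is probably too large.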
Overfitting & Regularization in Logistic Regression
As we saw in the regression course, overfitting is perhaps the most significant challenge you will face as you apply machine learning approaches in practice. This challenge can be particularly significant for logistic regression, as you will discover in this module, since you not only risk getting an overly complex decision boundary, but your classifier can also become overly confident about the probabilities it predicts. In this module, you will investigate overfitting in classification in significant detail, and obtain broad practical insights from some interesting visualizations of the classifiers' outputs. You will then add a regularization term to your optimization to mitigate overfitting. You will investigate both L2 regularization to penalize large coefficient values, and L1 regularization to obtain additional sparsity in the coefficients. Finally, you will modify your gradient ascent algorithm to learn regularized logistic regression classifiers. You will implement your own regularized logistic regression classifier from scratch, and investigate the impact of the L2 penalty on real-world sentiment analysis data.
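To make the effect of the L2 penalty concrete, here is a hedged sketch of gradient ascent on the penalized objective (log likelihood minus l2_penalty times the sum of squared coefficients). Leaving the intercept unpenalized is a common convention and is treated here as an assumption, as are the toy data, the parameter names and the default values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_gradient_ascent(X, y, l2_penalty, step_size=0.05, n_iterations=200):
    """Gradient ascent on: log likelihood - l2_penalty * sum_j w_j^2."""
    w = [0.0] * len(X[0])
    for _ in range(n_iterations):
        errors = [y_i - sigmoid(sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)))
                  for x_i, y_i in zip(X, y)]
        for j in range(len(w)):
            gradient_j = sum(e * x_i[j] for e, x_i in zip(errors, X))
            if j > 0:
                # Assumption: index 0 is the intercept and is not penalized.
                gradient_j -= 2.0 * l2_penalty * w[j]
            w[j] += step_size * gradient_j
    return w

# Hypothetical data: first column is a constant 1 (intercept feature).
X = [[1, 2.0], [1, 1.0], [1, -1.0], [1, -2.0], [1, 0.5], [1, -0.5]]
y = [1, 1, 0, 0, 0, 1]

w_plain = regularized_gradient_ascent(X, y, l2_penalty=0.0)
w_l2 = regularized_gradient_ascent(X, y, l2_penalty=1.0)
print(w_plain, w_l2)
```

On this toy data the penalized run ends with a smaller coefficient on the second feature, which is the shrinkage behavior the module describes.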
Decision Trees
Along with linear classifiers, decision trees are amongst the most widely used classification techniques in the real world. This method is extremely intuitive, simple to implement and provides interpretable predictions. In this module, you will become familiar with the core decision tree representation. You will then design a simple, recursive greedy algorithm to learn decision trees from data. Finally, you will extend this approach to deal with continuous inputs, a fundamental requirement for practical problems. In this module, you will investigate a brand new case-study in the financial sector: predicting the risk associated with a bank loan. You will implement your own decision tree learning algorithm on real loan data.
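The recursive greedy idea can be sketched for categorical features as follows: at each node, pick the feature whose split yields the lowest classification error, then recurse on each branch. This sketch assumes classification error as the splitting criterion and a depth limit as a stopping rule; the loan records, feature names and helper functions are hypothetical, and continuous inputs are not handled here.

```python
from collections import Counter

def majority_class(labels):
    return Counter(labels).most_common(1)[0][0]

def num_mistakes(labels):
    """Mistakes made by predicting the majority class for every point."""
    return len(labels) - Counter(labels).most_common(1)[0][1] if labels else 0

def best_feature(rows, labels, features):
    """Greedy step: choose the feature with the lowest error after splitting."""
    best, best_error = None, None
    for f in features:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[f], []).append(label)
        error = sum(num_mistakes(g) for g in groups.values()) / len(labels)
        if best_error is None or error < best_error:
            best, best_error = f, error
    return best

def build_tree(rows, labels, features, max_depth=3):
    # Stop when the node is pure, no features remain, or the depth limit is hit.
    if num_mistakes(labels) == 0 or not features or max_depth == 0:
        return {"leaf": True, "prediction": majority_class(labels)}
    f = best_feature(rows, labels, features)
    remaining = [g for g in features if g != f]
    children = {}
    for value in set(row[f] for row in rows):
        sub_rows = [r for r in rows if r[f] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[f] == value]
        children[value] = build_tree(sub_rows, sub_labels, remaining, max_depth - 1)
    return {"leaf": False, "feature": f, "children": children,
            "prediction": majority_class(labels)}

def classify(tree, row):
    while not tree["leaf"]:
        child = tree["children"].get(row[tree["feature"]])
        if child is None:  # unseen value: fall back to this node's majority
            return tree["prediction"]
        tree = child
    return tree["prediction"]

# Hypothetical loan records.
rows = [{"credit": "excellent", "term": "3yr"}, {"credit": "excellent", "term": "5yr"},
        {"credit": "fair", "term": "3yr"}, {"credit": "fair", "term": "5yr"},
        {"credit": "poor", "term": "3yr"}, {"credit": "poor", "term": "5yr"}]
labels = ["safe", "safe", "safe", "risky", "risky", "risky"]

tree = build_tree(rows, labels, ["credit", "term"])
predictions = [classify(tree, r) for r in rows]
print(tree["feature"], predictions)
```

Here splitting on "credit" leaves one mistake out of six, while splitting on "term" leaves two, so the greedy step picks "credit" at the root.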
Preventing Overfitting in Decision Trees
Out of all machine learning techniques, decision trees are amongst the most prone to overfitting. No practical implementation is possible without including approaches that mitigate this challenge. In this module, through various visualizations and investigations, you will investigate why decision trees suffer from significant overfitting problems. Using the principle of Occam's razor, you will mitigate overfitting by learning simpler trees. At first, you will design algorithms that stop the learning process before the decision trees become overly complex. In an optional segment, you will design a very practical approach that learns an overly-complex tree, and then simplifies it with pruning. Your implementation will investigate the effect of these techniques on mitigating overfitting on our real-world loan data set.
Handling Missing Data
Real-world machine learning problems are fraught with missing data. That is, very often, some of the inputs are not observed for all data points. This challenge is very significant, happens in most cases, and needs to be addressed carefully to obtain great performance. Yet this issue is rarely discussed in machine learning courses. In this module, you will tackle the missing data challenge head on. You will start with the two most basic techniques to convert a dataset with missing data into a clean dataset, namely skipping missing values and imputing missing values. In an advanced section, you will also design a modification of the decision tree learning algorithm that builds decisions about missing data right into the model. You will also explore these techniques in your real-data implementation.
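The two basic cleaning strategies can be sketched as below. This is a minimal sketch, assuming missing values are represented as None, that numeric features are filled with the mean and other features with the most common value, and that every feature has at least one observed value; the records and function names are hypothetical.

```python
from collections import Counter

def skip_missing(rows):
    """Strategy 1: drop every data point that has any missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]

def impute_missing(rows):
    """Strategy 2: fill each missing value with a guess from the observed data."""
    fill = {}
    for feature in rows[0]:
        observed = [row[feature] for row in rows if row[feature] is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            fill[feature] = sum(observed) / len(observed)            # mean
        else:
            fill[feature] = Counter(observed).most_common(1)[0][0]   # mode
    return [{f: (fill[f] if v is None else v) for f, v in row.items()}
            for row in rows]

# Hypothetical loan records with missing entries.
rows = [{"credit": "fair", "income": 40000},
        {"credit": None, "income": 60000},
        {"credit": "fair", "income": None},
        {"credit": "poor", "income": 50000}]

skipped = skip_missing(rows)
imputed = impute_missing(rows)
print(len(skipped), imputed[1]["credit"], imputed[2]["income"])
```

Skipping keeps only two of the four records here, which illustrates the main cost of that strategy: it can discard a large share of the data.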
Boosting
One of the most exciting theoretical questions that have been asked about machine learning is whether simple classifiers can be combined into a highly accurate ensemble. This question led to the development of boosting, one of the most important and practical techniques in machine learning today. This simple approach can boost the accuracy of any classifier, and is widely used in practice, e.g., it's used by more than half of the teams who win the Kaggle machine learning competitions. In this module, you will first define the ensemble classifier, where multiple models vote on the best prediction. You will then explore a boosting algorithm called AdaBoost, which provides a great approach for boosting classifiers. Through visualizations, you will become familiar with many of the practical aspects of this technique. You will create your very own implementation of AdaBoost, from scratch, and use it to boost the performance of your loan risk predictor on real data.
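A compact way to see how an ensemble of weak classifiers votes is the following AdaBoost sketch with one-feature threshold classifiers (decision stumps) as the weak learners. It assumes labels coded as -1/+1 and uses the standard AdaBoost weighting, alpha = 0.5 * ln((1 - error) / error); the one-dimensional dataset and all function names are hypothetical.

```python
import math

def stump_predict(stump, x):
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign

def fit_stump(X, y, weights):
    """Weak learner: the threshold rule with the lowest weighted error."""
    best, best_error = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted(set(x[feature] for x in X)):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                error = sum(w for x, t, w in zip(X, y, weights)
                            if stump_predict(stump, x) != t)
                if error < best_error:
                    best, best_error = stump, error
    return best, best_error

def adaboost(X, y, n_rounds):
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, error = fit_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Increase the weight of mistakes, decrease the weight of correct points.
        weights = [w * math.exp(-alpha * t * stump_predict(stump, x))
                   for x, t, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all weak classifiers."""
    vote = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if vote > 0 else -1

# Hypothetical data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]

single = adaboost(X, y, n_rounds=1)
boosted = adaboost(X, y, n_rounds=3)
mistakes_single = sum(predict(single, x) != t for x, t in zip(X, y))
mistakes_boosted = sum(predict(boosted, x) != t for x, t in zip(X, y))
print(mistakes_single, mistakes_boosted)
```

On this toy data a single stump misclassifies two of the six points, while the weighted vote of three stumps classifies all of them correctly.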
Precision-Recall
In many real-world settings, accuracy or error are not the best quality metrics for classification. You will explore a case-study that significantly highlights this issue: using sentiment analysis to display positive reviews on a restaurant website. Instead of accuracy, you will define two metrics: precision and recall, which are widely used in real-world applications to measure the quality of classifiers. You will explore how the probabilities output by your classifier can be used to trade off precision against recall, and dive into this spectrum, using precision-recall curves. In your hands-on implementation, you will compute these metrics with your learned classifier on real-world sentiment analysis data.
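The trade-off between the two metrics can be illustrated by sweeping the probability threshold above which a review is predicted positive. In this sketch, precision is the fraction of predicted positives that are truly positive and recall is the fraction of true positives that are found; the probabilities and labels are hypothetical, and returning 1.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(probabilities, labels, threshold=0.5):
    """Precision and recall when predicting +1 for probability >= threshold."""
    predictions = [1 if p >= threshold else -1 for p in probabilities]
    tp = sum(1 for pred, y in zip(predictions, labels) if pred == 1 and y == 1)
    fp = sum(1 for pred, y in zip(predictions, labels) if pred == 1 and y == -1)
    fn = sum(1 for pred, y in zip(predictions, labels) if pred == -1 and y == 1)
    # Assumed convention: report 1.0 when there is nothing to be wrong about.
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn) if tp + fn > 0 else 1.0
    return precision, recall

# Hypothetical classifier outputs and true labels (+1 positive, -1 negative).
probabilities = [0.95, 0.85, 0.70, 0.60, 0.40, 0.20]
labels = [1, 1, -1, 1, -1, 1]

default_pr = precision_recall(probabilities, labels, threshold=0.5)
strict_pr = precision_recall(probabilities, labels, threshold=0.8)
print(default_pr, strict_pr)  # (0.75, 0.75) (1.0, 0.5)

# Points along a precision-recall curve, one per threshold.
curve = [precision_recall(probabilities, labels, t / 10) for t in range(1, 10)]
```

Raising the threshold from 0.5 to 0.8 removes the one false positive, so precision rises to 1.0, but two true positives are now missed, so recall drops to 0.5.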
Scaling to Huge Datasets & Online Learning
With the advent of the internet, the growth of social media, and the embedding of sensors in the world, the magnitudes of data that our machine learning algorithms must handle have grown tremendously over the last decade. This effect is sometimes called "Big Data". Thus, our learning algorithms must scale to bigger and bigger datasets. In this module, you will develop a small modification of gradient ascent called stochastic gradient, which provides significant speedups in the running time of our algorithms. This simple change can drastically improve scaling, but makes the algorithm less stable and harder to use in practice. In this module, you will investigate the practical techniques needed to make stochastic gradient viable, and thus to obtain learning algorithms that scale to huge datasets. You will also address a new kind of machine learning problem, online learning, where the data streams in over time, and we must learn the coefficients as the data arrives. This task can also be solved with stochastic gradient. You will implement your very own stochastic gradient ascent algorithm for logistic regression from scratch, and evaluate it on sentiment analysis data.
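The small modification described above amounts to updating the coefficients from one data point at a time instead of summing the gradient over the whole dataset. The sketch below assumes 0/1 labels, a constant first feature as the intercept, a fixed step size, and shuffling the data before each pass (a commonly used stabilizing practice); the dataset, the seed and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_update(w, x_i, y_i, step_size):
    """One stochastic step: use the gradient contribution of a single point."""
    error = y_i - sigmoid(sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)))
    return [w_j + step_size * error * x_ij for w_j, x_ij in zip(w, x_i)]

def stochastic_gradient_ascent(X, y, step_size=0.1, n_passes=50, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    indices = list(range(len(X)))
    for _ in range(n_passes):
        rng.shuffle(indices)  # visit the data in a fresh random order each pass
        for i in indices:
            w = sgd_update(w, X[i], y[i], step_size)
    return w

# Hypothetical data: first column is a constant 1 (intercept feature).
X = [[1, 2.0], [1, 1.0], [1, -1.0], [1, -2.0], [1, 0.5], [1, -0.5]]
y = [1, 1, 0, 0, 0, 1]

w_sgd = stochastic_gradient_ascent(X, y)
print(w_sgd)
```

Because sgd_update needs only the current coefficients and a single example, the same function can be called on each data point as it arrives, which is one way to read the online learning setting mentioned above.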

Good to know

Know what's good, what to watch for, and possible dealbreakers
Brings together classification techniques, which are highly relevant across a broad array of industries, including ad targeting and medical diagnosis
Enables learners to become familiar with the most successful techniques, which are widely used in practice, such as logistic regression, decision trees, and boosting
Includes optional advanced topics, suitable for learners wanting to develop deeper knowledge
Hands-on, action-packed, and full of visualizations and illustrations of how these techniques behave on real data
Recommended for those with a background in linear algebra and probability
May be challenging for beginners without this background

Reviews summary

Machine learning: a solid introduction to classification

According to students, Machine Learning: Classification is a well-structured course that makes complicated concepts easy to understand. With short videos, quizzes, and assignments, this course gradually teaches you different classification approaches. Students say that the engaging assignments and clear explanations make them excited to continue with the specialization.
Course is well-paced and easy to understand.
"The course is great, as the others in this specializations, they really make it simple, but they are not going deeper in the algorithms, however it is really great!"
Learners enjoy the assignments.
"This course can really teach you about classification approaches and problems you might encounter."
"The quiz felt like challenges, but they were all doable and made me feel good about completing them."
"Not too easy and not too difficult."
Instructors explain concepts well.
"Loved the way our tutor (Carlos) explained the concepts to us. Things are getting clearer with each course in ML :) Many thanks :)"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Machine Learning: Classification with these activities:
Gather and review course materials
Reviewing course materials before the start of the course helps you become familiar with the course structure and topics.
  • Obtain syllabus and class schedule
  • Review course objectives and learning outcomes
  • Locate textbooks, online resources, and any required software
  • Set up a dedicated workspace for studying
Review foundational math concepts
Linear Algebra is a cornerstone of machine learning and will be used extensively in this course for representing features and learning models. Make sure that you are comfortable with this foundational skill before entering the course.
  • Review the basics of linear algebra, including vector operations, matrix multiplication, and eigenvalues.
  • Practice solving linear algebra problems, such as finding the determinant of a matrix or finding the eigenvectors of a matrix.
Review Linear Regression Concepts
Strengthen your understanding of linear regression, a prerequisite for this course.
  • Review the mathematical concepts behind linear regression.
  • Solve practice problems involving linear regression.
Review Python
Recall basic Python concepts and syntax before the course starts.
  • Review Python data types, operators, and control flow.
  • Practice writing simple Python functions and scripts.
Brush up on mathematics
A refresher on the key mathematics that underlies machine learning will set you up for success.
  • Review linear algebra concepts such as vectors, matrices, and eigenvalues
  • Revise probability theory, including Bayes' theorem and conditional probability
  • Go over calculus, focusing on derivatives and integrals
  • Study stochastic processes, including Markov chains and hidden Markov models
Connect with experts in the field
Seeking guidance from experienced professionals expands your knowledge and provides valuable insights into the field.
  • Identify potential mentors through industry events, online platforms, or personal connections
  • Reach out to mentors via email or LinkedIn and express your interest in their expertise
  • Schedule regular meetings or discussions to ask questions, receive feedback, and gain industry knowledge
Practice implementing Logistic Regression
Solidify your understanding of logistic regression by implementing it on a dataset.
  • Choose a dataset with binary labels.
  • Split the dataset into training and testing sets.
  • Implement the logistic regression algorithm from scratch.
  • Train the model on the training set.
  • Evaluate the model on the testing set.
Complete a Logistic Regression Tutorial
Learn the basics of logistic regression through a guided tutorial.
  • Find a tutorial on logistic regression.
  • Follow the tutorial step-by-step, implementing the algorithm from scratch.
  • Test your understanding by solving practice problems.
Complete introductory tutorials on logistic regression and decision trees
Guided tutorials provide a structured approach to gaining foundational knowledge in logistic regression and decision trees.
  • Identify online tutorials or video courses on logistic regression and decision trees
  • Follow the tutorials step-by-step to understand the concepts and algorithms
  • Practice implementing logistic regression and decision tree models using the provided code samples
Tutorial on Gradient Descent
Gain a deeper understanding of gradient descent, a fundamental algorithm in machine learning.
  • Find a tutorial on gradient descent.
  • Follow the tutorial and implement gradient descent in Python.
  • Experiment with different learning rates and see how they affect the convergence of the algorithm.
Build a Decision Tree Classifier
Apply your knowledge of decision trees by building one to classify data.
  • Choose a dataset with categorical features.
  • Split the dataset into training and testing sets.
  • Implement the decision tree algorithm from scratch.
  • Train the model on the training set.
  • Evaluate the model on the testing set.
Solve Decision Tree Practice Problems
Strengthen your understanding of decision trees by solving practice problems.
  • Find a set of decision tree practice problems.
  • Solve the problems using the decision tree algorithm.
  • Analyze your results and identify areas for improvement.
Practice Overfitting Mitigation Techniques
Develop your skills in mitigating overfitting, a common challenge in machine learning.
  • Choose a dataset that is prone to overfitting.
  • Split the dataset into training and testing sets.
  • Implement a machine learning model that is prone to overfitting.
  • Apply overfitting mitigation techniques such as regularization.
  • Evaluate the model on the testing set and compare the results with and without overfitting mitigation.
Solve practice exercises on classification problems
Regular practice with classification problems consolidates your understanding of the techniques and strengthens your problem-solving skills.
  • Find practice problems on platforms like LeetCode, Kaggle, or Coursera
  • Attempt to solve the problems using logistic regression or decision tree models
  • Compare your solutions with provided answers or discuss them in online forums
Join a Study Group for Logistic Regression
Collaborate with peers to enhance your understanding of logistic regression.
  • Find or create a study group focused on logistic regression.
  • Meet regularly to discuss concepts, solve problems, and share insights.
  • Provide feedback and support to your fellow group members.
Write a Summary of Boosting Techniques
Enhance your understanding of boosting techniques by summarizing them in writing.
  • Research different boosting techniques.
  • Write a summary of the techniques, including their strengths and weaknesses.
  • Include examples of how boosting techniques have been used in practice.
Contribute to an Open-Source Machine Learning Library
Gain practical experience and contribute to the field of machine learning.
  • Find an open-source machine learning library that interests you.
  • Identify an area where you can contribute.
  • Make a pull request to the library.
Write a Blog Post on Boosting Techniques
Solidify your knowledge of boosting techniques by explaining them in a blog post.
  • Research and gather information on boosting techniques.
  • Write a clear and concise blog post explaining the concepts and applications of boosting.
  • Share your blog post with others for feedback and discussion.
Build a classification model for a real-world dataset
Applying your skills to a real-world problem reinforces your learning and provides a tangible demonstration of your abilities.
  • Choose a relevant dataset that aligns with the course topics
  • Preprocess and explore the dataset to understand its characteristics
  • Train and evaluate logistic regression and decision tree models on the dataset
  • Interpret the results, analyze model performance, and draw insights
  • Present your findings in a report or presentation
Mentor a Junior Machine Learning Enthusiast
Deepen your understanding of machine learning concepts by teaching them to others.
  • Find a junior machine learning enthusiast who wants to learn.
  • Establish regular meetings to discuss machine learning topics.
  • Provide guidance and support to the mentee as they learn and grow.

Career center

Learners who complete Machine Learning: Classification will develop knowledge and skills that may be useful to these careers:
Data Scientist
Data Scientists seek to transform raw data into usable data, which is then transformed into valuable insights for a given business. In order to do so, Data Scientists leverage a variety of techniques, including machine learning and predictive analytics, to make sense of vast quantities of complex data. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Data Scientist, since the course will provide a foundational understanding of the techniques that are used in the field.
Software Engineer
Software Engineers apply engineering principles to design and build software applications, leveraging a variety of tools and techniques to do so. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Software Engineer, since the course will provide a practical understanding of machine learning principles and techniques that many companies are leveraging to develop new products and services.
Market Researcher
Market Researchers study consumer trends and demographics to analyze market conditions and forecast future trends. This course in Machine Learning: Classification provides the necessary foundational understanding of machine learning techniques and how they may be implemented to interpret and analyze market data.
Data Analyst
Data Analysts use various techniques to collect, analyze, interpret, and present data, working with stakeholders in a given organization to apply data-driven insights to decision making. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Data Analyst, as the course will provide the foundational understanding of machine learning techniques that are often used to analyze data.
Statistician
Statisticians use mathematical and statistical techniques to collect, analyze, interpret, and present data, working with stakeholders in a given organization to apply data-driven insights to decision making. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Statistician, as the course will provide the foundational understanding of machine learning techniques that are often used to analyze data.
Quantitative Analyst
Quantitative Analysts create mathematical models and apply statistical techniques to evaluate financial data, helping make informed investment decisions. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Quantitative Analyst, as the course will provide the foundational understanding of machine learning techniques that are often used to analyze financial data.
Marketing Analyst
Marketing Analysts use data analysis techniques to measure the effectiveness of marketing campaigns, leveraging their findings to optimize campaigns for improved results. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Marketing Analyst, as the course will provide the foundational understanding of machine learning techniques that can be used to analyze marketing data.
Operations Research Analyst
Operations Research Analysts use mathematical and analytical techniques to solve complex business problems, often leveraging optimization techniques to find the best solution to problems.
Business Analyst
Business Analysts use analytical techniques to understand business needs and develop solutions to meet those needs, often working with stakeholders to identify and define requirements. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Business Analyst, as the course will provide the foundational understanding of machine learning techniques that are used to analyze data and make informed decisions.
Management Consultant
Management Consultants provide advice to organizations on how to improve their performance, leveraging analytical techniques to identify areas for improvement and develop solutions. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Management Consultant, as the course will provide the foundational understanding of machine learning techniques that are used to analyze data and make informed decisions.
Financial Analyst
Financial Analysts use financial data to evaluate the performance of companies and make investment recommendations, leveraging analytical techniques to identify trends and make predictions. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Financial Analyst, as the course will provide the foundational understanding of machine learning techniques that are used to analyze financial data.
Risk Analyst
Risk Analysts use statistical and analytical techniques to identify and assess risks, working with stakeholders to develop and implement risk management strategies.
Data Architect
Data Architects design and build data systems, working with stakeholders to identify and define data requirements. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Data Architect, as the course will provide the foundational understanding of machine learning techniques that can be used to analyze data and make informed decisions.
Machine Learning Engineer
Machine Learning Engineers design, build, and maintain machine learning models, working with stakeholders to identify and define requirements. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Machine Learning Engineer, as the course will provide the foundational understanding of machine learning techniques and algorithms.
Software Developer
Software Developers design, build, and maintain software applications, working with stakeholders to identify and define requirements. This course in Machine Learning: Classification may be useful to an individual who wants to work as a Software Developer, as the course will provide the foundational understanding of machine learning techniques that are often used in software applications.

Reading list

We've selected 13 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Machine Learning: Classification.
Provides a comprehensive overview of statistical learning, covering fundamental concepts, algorithms, and applications. It is a great resource for learners who want to gain a deep understanding of statistical learning.
Provides a comprehensive overview of machine learning, covering fundamental concepts, algorithms, and applications. It is a great resource for learners who want to gain a deep understanding of machine learning.
Provides a comprehensive overview of deep learning, covering fundamental concepts, algorithms, and applications. It is a great resource for learners who want to gain a deep understanding of deep learning.
Provides a mathematical introduction to pattern recognition and machine learning. It covers a wide range of topics, including Bayesian inference, decision theory, and neural networks. It is a great resource for learners who want to gain a theoretical understanding of machine learning.
Provides a comprehensive overview of machine learning, covering fundamental concepts, algorithms, and applications. It is a great resource for learners who want to gain a deep understanding of machine learning.
Provides a probabilistic introduction to machine learning. It covers a wide range of topics, including Bayesian inference, graphical models, and reinforcement learning. It is a great resource for learners who want to gain a theoretical understanding of machine learning.
Provides a practical introduction to machine learning using Python. It covers a wide range of topics, including data preprocessing, feature engineering, model selection, and evaluation. It is a great resource for learners who want to get started with machine learning using Python.
Provides a practical introduction to machine learning using Python. It covers a wide range of topics, including data preprocessing, feature engineering, model selection, and evaluation. It is a great resource for learners who want to get started with machine learning using Python.
Provides a practical introduction to machine learning using Python. It covers a wide range of topics, including data preprocessing, feature engineering, model selection, and evaluation. It is a great resource for learners who want to get started with machine learning using Python.
Provides a conceptual introduction to machine learning. It covers a wide range of topics, including supervised learning, unsupervised learning, and reinforcement learning. It is a great resource for learners who want to get a basic understanding of machine learning.
Provides a non-technical introduction to machine learning. It covers a wide range of topics, including supervised learning, unsupervised learning, and reinforcement learning. It is a great resource for learners who want to get a basic understanding of machine learning.

Similar courses

Here are nine courses similar to Machine Learning: Classification.
  • Machine Learning Foundations: A Case Study Approach (most relevant)
  • Classification Models (most relevant)
  • Sentiment Analysis with Deep Learning using BERT (most relevant)
  • Natural Language Processing for Stocks News Analysis (most relevant)
  • TensorFlow Developer Certificate - Natural Language... (most relevant)
  • Data Analytics in Accounting Capstone (most relevant)
  • Amazon Echo Reviews Sentiment Analysis Using NLP (most relevant)
  • Sentiment Analysis with Recurrent Neural Networks in...
  • Basic Sentiment Analysis with TensorFlow

© 2016 - 2024 OpenCourser