Course Title: Machine Learning Basics with Minitab
Course Description:
This comprehensive course is designed to provide a detailed understanding of the basics of machine learning using Minitab, with a focus on supervised learning. The course covers the fundamental concepts of regression analysis and binary logistic classification, including how to evaluate models and interpret results. The course also covers tree-based models for binary and multinomial classification.
The course begins with an introduction to machine learning, where students will gain an understanding of what machine learning is, the different types of machine learning, and the difference between supervised and unsupervised learning. This is followed by an overview of the basics of supervised learning, including how a model learns from data, the different types of regression, and the conditions that must be met to use regression models in machine learning versus classical statistics.
The course then delves into regression analysis in detail, covering the different types of regression models and how to use Minitab to evaluate them. This includes a thorough explanation of statistically significant predictors, multicollinearity, and how to handle regression models that include categorical predictors, including additive and interaction effects. Students will also learn how to make predictions for new observations using confidence intervals and prediction intervals.
Next, the course moves on to model building, where students will learn how to handle regression equations with "wrong" predictors and use stepwise regression to find optimal models in Minitab. This includes an overview of how to evaluate models and interpret results.
The course then shifts to binary logistic regression, which is used for binary classification. Students will learn how to evaluate binary classification models, including goodness-of-fit measures such as the ROC curve and AUC. They will also use Minitab to analyze a heart failure dataset using binary logistic regression.
The course then covers classification trees, including an overview of node splitting methods such as splitting by misclassification rate, Gini impurity, and entropy. Students will learn how to predict class for a node and evaluate the goodness of the model using misclassification costs, ROC curve, Gain chart, and Lift chart for both binary and multinomial classification.
Finally, the course covers the concept and use of predefined prior probabilities and input misclassification costs, and how to build a tree using Minitab. Throughout the course, students will gain hands-on experience applying the concepts learned in real-world scenarios.
Overall, this course provides a thorough understanding of machine learning basics using Minitab, with a focus on supervised learning, regression analysis, and classification. Upon completion of this course, students will have the knowledge and skills to apply supervised machine learning techniques to real-world data problems.
Predicting the response value for a new observation is the ultimate goal of setting up regression models. If we have built a good model and checked its goodness of fit on a subset of the available data that was not used to train the model, we can reasonably expect the prediction to be "accurate" for a new, unseen observation. This lesson explains how such predictions are made.
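The idea of training on one part of the data, checking the model on a held-out part, and then predicting a new observation can be sketched as follows. The course itself works in Minitab; this Python sketch is only an illustration, and the data values and function names are hypothetical.

```python
import math

def fit_simple_linear(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """Point prediction for a single new predictor value."""
    return intercept + slope * x_new

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data: the model never sees the test rows during fitting.
train_x, train_y = [1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8]
test_x, test_y = [5, 6], [11.0, 13.1]

intercept, slope = fit_simple_linear(train_x, train_y)
test_pred = [predict(intercept, slope, x) for x in test_x]
test_rmse = rmse(test_y, test_pred)

# If the held-out error is acceptable, predict a new, unseen observation.
new_prediction = predict(intercept, slope, 7)
print(f"test RMSE = {test_rmse:.3f}, prediction at x=7: {new_prediction:.2f}")
```

A small held-out error suggests, but does not guarantee, that the point prediction for the new observation is reasonable; the confidence and prediction intervals covered in the course quantify that uncertainty.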
When setting up machine learning models, we want the model to be neither underfitted nor overfitted, but optimal, or in other words "just right". The Stepwise Regression procedure is one of the so-called predictor selection algorithms that can be used to arrive at the optimal model.
This introductory video details the workflow of machine learning methods.
In this lesson we will learn more about the common linear and polynomial models including how they work and how they're trained.
This lesson explains how to compare several candidate models and choose the most appropriate one.
Classical statistics imposes some strict conditions for constructing a reliable model. In this lesson, we discuss which of these conditions can be relaxed when building machine learning models, and why.
Often a researcher has a large set of candidate predictor variables from which to try to identify the most appropriate predictors to include in the regression model.
In this lesson, we summarize the consequences of choosing one of the many possible models that "in some sense" has the wrong predictors.
When building a good regression model, it is important to know which predictors are included in the final model, which ones are not important and which ones might even degrade the performance of the model. One means of selecting important and necessary predictors is to test whether a predictor is statistically significant. This concept is discussed here.
This lesson explains the method for including qualitative (in other words, categorical) variables in a regression model.
In the previous lesson, we clarified how to set up and interpret a regression model that includes a categorical variable, but where there is no interaction between the numeric and categorical variables. In this lesson, we consider the interaction case.
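One common way to represent the interaction case is an indicator (dummy) variable plus a product term. The sketch below assumes a categorical variable with only two levels, "A" (reference) and "B"; the coefficients are hypothetical values chosen for illustration, not results from the course.

```python
def design_row(x, category, reference="A"):
    """Build [1, x, d, x*d] where d is 0 for the reference level and 1 otherwise."""
    d = 0.0 if category == reference else 1.0
    return [1.0, float(x), d, float(x) * d]

def predict(coefs, row):
    """Linear predictor: sum of coefficient times design value."""
    return sum(c * v for c, v in zip(coefs, row))

# Hypothetical coefficients: intercept, slope, level shift, slope change.
coefs = [2.0, 0.5, 1.0, 0.25]

# Level A follows y = 2 + 0.5x; level B follows y = (2 + 1) + (0.5 + 0.25)x.
pred_a = predict(coefs, design_row(4, "A"))
pred_b = predict(coefs, design_row(4, "B"))
print(pred_a, pred_b)
```

Without the product term, the two levels would share one slope and differ only by a constant shift (the additive case); the product term lets the slope itself depend on the category.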
Multicollinearity exists when two or more predictors in a regression model are moderately or strongly correlated. In this lesson we discuss how to handle this situation when building a model.
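A widely used diagnostic for multicollinearity is the variance inflation factor (VIF). As a minimal sketch, assuming a model with exactly two predictors, the VIF of each predictor reduces to 1 / (1 - r^2), where r is the correlation between them. The data below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has only these two."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor values.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

r = pearson_r(x1, x2)
vif = vif_two_predictors(x1, x2)
print(f"r = {r:.2f}, VIF = {vif:.2f}")
```

With more than two predictors, the r^2 term is replaced by the R^2 from regressing each predictor on all the others. A VIF near 1 indicates little collinearity, and larger values indicate that the coefficient's variance is inflated.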
The Auto-mpg worksheet, a data file often used in the statistical literature, contains characteristic data for nearly 400 cars. The aim of the example is to set up and evaluate different models to predict fuel consumption. This example consists of two parts; this video contains the first part.
The Auto-mpg worksheet, a data file often used in the statistical literature, contains characteristic data for nearly 400 cars. The aim of the example is to set up and evaluate different models to predict fuel consumption. This example consists of two parts; this video contains the second part.
In this lesson we discuss the basic idea and applications of regression trees.
This example presents a predictive model using a regression tree to predict demand for bike sharing. This video is the first part of the example.
This example presents a predictive model using a regression tree to predict demand for bike sharing. This video is the second part of the example.
In data analysis, when individual sample items can be classified into different categories based on their properties, so-called classification models can be used.
This lesson will discuss binary logistic regression as one of the most popular classification methods.
In the previous video, we introduced the concept of the Confusion Matrix, which is the starting point for judging the goodness of fit of binary classification models, not only for logistic regression, but for all binary classification methods. Several measures and graphs can be constructed using the Confusion Matrix. These measures and graphs are discussed here.
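Several of the standard measures derived from a Confusion Matrix can be computed directly from its four cells. The sketch below uses hypothetical counts, and the function name is illustrative.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Common binary classification measures from confusion matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # true positive rate (recall)
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # positive predictive value
        "false_positive_rate": fp / (fp + tn),
    }

# Hypothetical counts: 50 actual events and 50 actual non-events.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Each point on a ROC curve is one (false positive rate, sensitivity) pair obtained at a particular classification threshold, so repeating this calculation over many thresholds traces out the curve.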
In this lesson, we analyze a dataset of 299 patients with heart failure. This video is the first part of the example.
In this lesson, we analyze a dataset of 299 patients with heart failure. This video is the second part of the example.
The basic idea of constructing classification trees is similar to the one already presented for regression trees. The difference compared to regression trees is that here the possible values of the response variable are not numbers but categorical values, i.e., labels for different classes. This concept is introduced here.
This lesson is about the splitting criteria of a tree node and some basic concepts of node splitting.
The Gini Impurity and the Entropy measures are discussed here.
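For a node with class proportions p_1, ..., p_k, the Gini impurity is 1 minus the sum of the squared proportions, and the entropy is the negative sum of p_i * log2(p_i). A minimal sketch with hypothetical class counts shows both measures and how a candidate split is scored by the reduction in impurity:

```python
import math

def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy (in bits) of a node; empty classes contribute nothing."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def weighted_impurity(children, measure):
    """Impurity of a split: child impurities weighted by child size."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * measure(child) for child in children)

# Hypothetical node with 5 observations per class, split into two children.
parent = [5, 5]
children = [[4, 1], [1, 4]]

gini_improvement = gini(parent) - weighted_impurity(children, gini)
entropy_improvement = entropy(parent) - weighted_impurity(children, entropy)
print(gini(parent), entropy(parent), gini_improvement, entropy_improvement)
```

Both measures are zero for a pure node and largest when the classes are evenly mixed; the split with the greatest improvement is preferred.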
A special question is which class to assign to each Terminal node after the best partitioning of the nodes. This class is called the predicted class for that node. After completing this lesson, students will understand why a certain class is assigned to a given node.
There are several characteristic figures and metrics that help us to judge and quantify the goodness of a model. The Model Misclassification Cost is the most typical metric. The most used charts are the ROC curve, the Gain chart, and the Lift chart.
Here students become familiar with the concept of the Model Misclassification Cost.
Here students become familiar with the ROC curve, the Gain chart, and the Lift chart in the case of binary classification.
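The points of a Gain chart and a Lift chart can be computed by ranking observations from the highest to the lowest predicted event probability. The sketch below is one common formulation, not necessarily the exact calculation Minitab performs; the scores and labels are hypothetical.

```python
def gain_and_lift(scores, labels):
    """Return (depth, gain, lift) after each ranked observation.

    depth: fraction of the sample examined so far
    gain:  fraction of all events captured within that depth
    lift:  gain divided by depth (1.0 means no better than random)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(labels)
    total_events = sum(labels)
    captured = 0
    rows = []
    for k, i in enumerate(order, start=1):
        captured += labels[i]
        depth = k / n
        gain = captured / total_events
        rows.append((depth, gain, gain / depth))
    return rows

# Hypothetical predicted probabilities and true labels (1 = event).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

rows = gain_and_lift(scores, labels)
for depth, gain, lift in rows:
    print(f"depth={depth:.1f}  gain={gain:.2f}  lift={lift:.2f}")
```

Plotting gain against depth gives the Gain chart, and plotting lift against depth gives the Lift chart; both end at 1.0 once the whole sample has been examined.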
Here students become familiar with the ROC curve, the Gain chart, and the Lift chart in the case of multinomial classification.
There are cases when our sampling is deliberately not random. In such cases, we do not estimate the population probabilities from the sample; instead, we provide the population probability of each class as input data.
Here students learn how to interpret these prior probabilities and how to use them in model building.
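To illustrate why priors matter, the sketch below applies a common CART-style adjustment: each class count in a node is rescaled by the ratio of the class's prior probability to its share of the sample before the node's class probabilities are computed. This is an assumption about the general technique, not a description of Minitab's exact calculation, and all numbers are hypothetical.

```python
def node_class_probabilities(node_counts, sample_counts, priors):
    """Class probabilities within a node, adjusted for specified priors.

    node_counts:   observations of each class in the node
    sample_counts: observations of each class in the whole training sample
    priors:        assumed population probability of each class
    """
    joint = {c: priors[c] * node_counts[c] / sample_counts[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

# Hypothetical case: classes were sampled 50/50 on purpose,
# although class A makes up 90% of the population.
sample_counts = {"A": 100, "B": 100}
priors = {"A": 0.9, "B": 0.1}
node_counts = {"A": 10, "B": 30}

adjusted = node_class_probabilities(node_counts, sample_counts, priors)
raw = {c: node_counts[c] / sum(node_counts.values()) for c in node_counts}
print("raw proportions:", raw)
print("prior-adjusted: ", adjusted)
```

In this example the raw node proportions favor class B, but once the population priors are taken into account class A is the more probable class, which can change the predicted class for the node.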
Having clarified the details of model building, this lesson summarizes how the process of growing a classification tree is done, and how to find the optimal tree.
At the end of this section, students will be able to use Minitab to build classification trees for practical applications such as maintenance of machines. This lesson is the first part of the example.
At the end of this section, students will be able to use Minitab to build classification trees for practical applications such as maintenance of machines. This lesson is the second part of the example.
A data analysis project starts with getting to know the data file you want to use and the variables stored in it. Once we have familiarized ourselves with the data file, we start cleaning the data. This 1-million-row real-world data file is a rather large and complex collection of different types of variables, so this process will continue in the next lessons.
In this lesson we continue cleaning the data table. For some trips, the original values of the variables look plausible, but when we examine a new variable derived from them, we find that the original data are inconsistent, so we must also flag these trips as invalid.
In this lesson, we explore whether new features can be derived from the existing variables in the data file to help us build our machine learning model.
In this lesson we learn how to build a model that includes higher-power terms of the predictors.
There is still room to improve the predictive capability of a polynomial regression model that has only numerical predictor variables. In this lesson we try using the product of some predictor terms in the model as a new predictor variable.
In this lesson, we will look at how to include variables in the model whose values are not numbers but indicate membership of a category.
In this lesson we set up our final model for both the Duration of the trip and the Total Charge to be paid. Here, "final model" means the model before validation.
In this lesson we examine the problem of underfitting and overfitting for a regression model.
Here we use Minitab's Stepwise procedure, specifically its Forward selection with validation method, to select the predictors of the "Just Right" model.
A more detailed error analysis is taught here.
In this video, we are looking at the Total Charge response variable prediction model.
A more detailed analysis of the magnitude of the errors associated with the forecasts requires further detailed calculations.
In this video we will see if regression tree models can be used to build better models than the multivariate linear models presented earlier.
In this project, for an undergraduate Statistics course currently running, we want to predict at an intermediate point in the course which students will successfully complete the course, and which are at risk of failure.
Here we set up a classification tree model to predict student success.