Module 1: Introduction to Machine Learning
This module lays the groundwork for the rest of the course by introducing the core concepts behind machine learning and, specifically, how to perform machine learning by using Python and the scikit-learn module. First, you will learn about the basic types of machine learning. Next, you will learn about an important step that precedes the application of machine learning algorithms: data pre-processing. Finally, you will learn how to leverage different types of machine learning algorithms in a Python script.
Module 2: Fundamental Algorithms I
This module introduces three machine learning algorithms. First, you will learn how linear regression can be considered a machine learning problem with parameters that must be determined computationally by minimizing a cost function. Next, you will learn about logistic regression, which, despite its name, is a classification algorithm. Lastly, you will learn about decision trees, a popular machine learning algorithm that can be used for both classification and regression. This module will dive deeper into the concept of machine classification, where algorithms learn from existing, labeled data to classify new, unseen data into specific categories, and the concept of machine regression, where algorithms learn a model from data to predict continuous values for new, unseen data. While these algorithms all differ in their mathematical underpinnings, they are often used for classifying numerical, text, and image data or performing regression in a variety of domains.
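As a minimal sketch of what "minimizing a cost function" can look like for linear regression, the following plain-Python example fits a line by gradient descent on the mean squared error. The function name `fit_line`, the learning rate, the step count, and the toy data are all illustrative assumptions, not course material; in practice scikit-learn's `LinearRegression` would normally be used instead.

```python
def fit_line(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y ~ slope * x + intercept by gradient descent on mean squared error.

    Hypothetical helper for illustration only.
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Residual of the current line at every training point.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradient of the cost (1/n) * sum(error^2) for each parameter.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # Step downhill on the cost surface.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

On this noise-free data the fitted parameters should land very close to a slope of 2 and an intercept of 1.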
Module 3: Fundamental Algorithms II
This module introduces three more machine learning algorithms: k-nearest neighbors, support vector machines, and random forests. All of them can be used for either classification or regression tasks.
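To make the first of these algorithms concrete, here is a small sketch of k-nearest neighbors classification in plain Python. The function `knn_predict`, the Euclidean distance choice, and the toy points are assumptions made for illustration; scikit-learn's `KNeighborsClassifier` is the usual tool for real work.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper; uses Euclidean distance.
    """
    # Pair each training label with its distance to the query, closest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Two well-separated toy groups (illustrative data only).
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))  # "a"
print(knn_predict(train_points, train_labels, (8.5, 8.5)))  # "b"
```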
Module 4: Model Evaluation
Model evaluation is an integral component of any data analytics project. It helps to estimate how well the model will perform when predicting future (out-of-sample) data. This module introduces basic model evaluation metrics for machine learning algorithms. First, the evaluation metrics for regression are presented. Next, the metrics and techniques used to evaluate classification are introduced.
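The description above does not name specific metrics, so the following sketch picks a few common ones as an assumption: mean squared error for regression, and accuracy, precision, and recall for binary classification. The function names and toy values are hypothetical; scikit-learn's `sklearn.metrics` module offers maintained versions of these.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classifier (illustrative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy values, chosen only to exercise the functions.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
report = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(mse, report)
```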
Module 5: Model Optimization
This module introduces techniques for model optimization. First, the basic techniques of feature selection are presented. Next, the technique of cross-validation is introduced, which can provide a more reliable evaluation of models. Finally, model selection, or hyperparameter tuning, which uses cross-validation, is introduced.
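As a rough sketch of the cross-validation idea, the code below splits sample indices into k folds, holds each fold out in turn, and averages the held-out error. The helpers `k_fold_indices` and `cross_val_mse`, the unshuffled split, and the "predict the training mean" baseline model are all assumptions made for this illustration; scikit-learn provides `KFold` and `cross_val_score` for real use.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_val_mse(values, n_folds):
    """Average held-out squared error of a 'predict the training mean' baseline."""
    scores = []
    for train, test in k_fold_indices(len(values), n_folds):
        train_mean = sum(values[i] for i in train) / len(train)
        fold_mse = sum((values[i] - train_mean) ** 2 for i in test) / len(test)
        scores.append(fold_mse)
    return sum(scores) / len(scores)


folds = list(k_fold_indices(5, 2))
print(folds)
print(cross_val_mse([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3))
```

Hyperparameter tuning can then be viewed as repeating this procedure for each candidate setting and keeping the one with the best averaged score.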
Module 6: Introduction to Text Analysis
In this module, you will start applying your new machine learning skills to an exciting data analytics topic: text analysis. First, we will review the process by which textual data are converted into numerical data that can be processed by a computer. This process brings with it a number of new concepts that focus on manipulating these data to generate improved machine learning predictions. Second, we will apply machine learning algorithms, specifically classification, to text data. Finally, we will explore more advanced concepts in text analysis and introduce a special kind of text classification: sentiment analysis.
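One simple way to convert text into numbers is a bag-of-words count matrix, sketched below in plain Python. This is an assumed example of the conversion step, not necessarily the representation the course uses; the `tokenize` and `bag_of_words` helpers and the sample sentences are hypothetical. scikit-learn's `CountVectorizer` performs a similar job.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return a sorted vocabulary and one count vector per document."""
    vocabulary = sorted({token for doc in documents for token in tokenize(doc)})
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        # One column per vocabulary word; a missing word counts as 0.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Illustrative documents only.
docs = ["The movie was great", "The movie was not great, not at all"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)  # ['all', 'at', 'great', 'movie', 'not', 'the', 'was']
print(vectors)     # [[0, 0, 1, 1, 0, 1, 1], [1, 1, 1, 1, 2, 1, 1]]
```

Each row is a numerical feature vector that a classifier could consume, which is the link between text conversion and the classification step described above.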
Module 7: Introduction to Clustering
This module introduces clustering, where data points are assigned to subgroups of points based on some specific properties, such as spatial distance or the local density of points. While humans often find clusters visually with ease in a given data set, computationally the problem is more challenging. This module starts by exploring the basic ideas behind this unsupervised learning technique. One of the most popular clustering techniques, K-means, is introduced. Next, a K-means case study is provided. Finally, the density-based DBSCAN technique is introduced.
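The K-means idea can be sketched as two alternating steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The plain-Python version below is a minimal illustration under assumptions (a hypothetical `k_means` helper, fixed initial centers, a fixed iteration count, toy 2-D points); scikit-learn's `KMeans` and `DBSCAN` classes are the practical choices.

```python
import math


def k_means(points, initial_centers, n_iterations=10):
    """Cluster points by alternating assignment and center-update steps."""
    centers = [tuple(center) for center in initial_centers]

    def nearest_center(point):
        # Index of the center closest to this point (Euclidean distance).
        return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

    for _ in range(n_iterations):
        # Assignment step: group points by their nearest center.
        clusters = [[] for _ in centers]
        for point in points:
            clusters[nearest_center(point)].append(point)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]

    labels = [nearest_center(point) for point in points]
    return centers, labels


# Two visually obvious groups (illustrative data only).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers, labels = k_means(points, initial_centers=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```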
Module 8: Introduction to Time Series Data
This module introduces time and date data, which provide unique learning opportunities and challenges. First, we will discuss how to properly handle time and date features within a Python program. Next, we will extend this discussion to handle data indexed by time and date information, which is known as time series data.
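As a small sketch of both ideas, the example below parses timestamp strings with the standard-library `datetime` module and then treats the parsed values as a time-indexed series. The choice of `datetime` (rather than, say, pandas), the timestamp format, the helper names, and the readings are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def parse_timestamp(text):
    """Parse a 'YYYY-MM-DD HH:MM' string into a datetime object."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def daily_mean(series):
    """Average the values of a (timestamp, value) series per calendar day."""
    by_day = {}
    for timestamp, value in series:
        by_day.setdefault(timestamp.date(), []).append(value)
    return {day: sum(values) / len(values) for day, values in by_day.items()}


# Hypothetical readings, deliberately out of order.
readings = [
    ("2023-01-02 09:00", 12.0),
    ("2023-01-01 09:00", 10.0),
    ("2023-01-01 15:00", 14.0),
]

# A time series is data indexed by time, so sort by the parsed timestamp.
series = sorted((parse_timestamp(text), value) for text, value in readings)

# Date arithmetic: the gap between the first two observations.
gap = series[1][0] - series[0][0]
print(gap == timedelta(hours=6))  # True
print(daily_mean(series))
```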