This course exhaustively covers data preprocessing techniques and transforms available in scikit-learn, allowing the construction of highly optimized features that are scaled, normalized and transformed in mathematically sound ways to fully harness the power of machine learning techniques.
This course exhaustively covers data preprocessing techniques and transforms available in scikit-learn, allowing the construction of highly optimized features that are scaled, normalized and transformed in mathematically sound ways to fully harness the power of machine learning techniques.
The quality of preprocessing that numeric data is subjected to is an important determinant of the results of machine learning models built using that data. With smart, optimized data pre-processing, you can significantly speed up model training and validation, saving both time and money, as well as greatly improve model performance in prediction.
In this course, Building Features from Numeric Data, you will gain the ability to design and implement effective, mathematically sound data pre-processing pipelines.
First, you will learn the importance of normalization, standardization and scaling, and understand the intuition and mechanics of tweaking the central tendency as well as dispersion of a data feature.
Next, you will discover how to identify and deal with outliers and possibly anomalous data. You will then learn important techniques for scaling and normalization. Such techniques, notably normalization using the L1-norm, L2-norm and Max norm, seek to transform feature vectors to have uniform magnitude. Such techniques find wide usage in ML model building - for instance in computing the cosine similarity of document vectors, and in transforming images before techniques such as convolutional neural networks are applied to them.
You will then move from normalization and standardization to scaling and transforming data. Such transformations include quantization as well as the construction of custom transformers for bespoke use cases. Finally, you will explore how to implement log and power transformations. You will round out the course by comparing the results of three important transformations - the Yeo-Johnson transform, the Box-Cox transform and the quantile transformation - in converting data with non-normal characteristics, such as chi-squared or lognormal data into the familiar bell curve shape that many models work best with.
When you’re finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.
OpenCourser helps millions of learners each year. People visit us to learn workspace skills, ace their exams, and nurture their curiosity.
Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.
Find this site helpful? Tell a friend about us.
We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.
Your purchases help us maintain our catalog and keep our servers humming without ads.
Thank you for supporting OpenCourser.