Building Features from Numeric Data (Pluralsight)

This course exhaustively covers data preprocessing techniques and transforms available in scikit-learn, allowing the construction of highly optimized features that are scaled, normalized and transformed in mathematically sound ways to fully harness the power of machine learning techniques.

How well numeric data is preprocessed strongly influences the results of machine learning models built on that data. With smart, optimized data preprocessing, you can significantly speed up model training and validation, saving both time and money, and greatly improve the model's predictive performance.

In this course, Building Features from Numeric Data, you will gain the ability to design and implement effective, mathematically sound data preprocessing pipelines.

First, you will learn the importance of normalization, standardization and scaling, and understand the intuition and mechanics of tweaking the central tendency as well as dispersion of a data feature.
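As a rough illustration of shifting a feature's central tendency and rescaling its dispersion, the sketch below applies two scikit-learn scalers to a small made-up feature. The data values and the choice of `StandardScaler` and `MinMaxScaler` are assumptions for illustration; the course may use different classes or examples.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical single feature (one column) with made-up values.
X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])

# Standardization: subtract the mean, then divide by the standard deviation,
# so the feature ends up with mean 0 and unit variance.
standardized = StandardScaler().fit_transform(X)

# Min-max scaling: map the observed minimum to 0 and the maximum to 1.
rescaled = MinMaxScaler().fit_transform(X)

print("standardized:", standardized.ravel())
print("min-max scaled:", rescaled.ravel())
```

Standardization changes both the center and the spread of the feature, while min-max scaling only maps the observed range onto a fixed interval, so a single extreme value (10.0 here) compresses the remaining values toward 0.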

Next, you will discover how to identify and deal with outliers and possibly anomalous data. You will then learn important techniques for scaling and normalization. Such techniques, notably normalization using the L1 norm, L2 norm and max norm, seek to transform feature vectors to have uniform magnitude. They are widely used in ML model building, for instance in computing the cosine similarity of document vectors and in transforming images before techniques such as convolutional neural networks are applied to them.
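The three norms mentioned above can be sketched with scikit-learn's `Normalizer`, which rescales each row (feature vector) independently. The two vectors are made-up values standing in for document term counts, and the example is an illustration under those assumptions rather than the course's own code.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two hypothetical feature vectors (rows), e.g. term counts for two documents.
X = np.array([[3.0, 4.0, 0.0],
              [1.0, 2.0, 2.0]])

l1 = Normalizer(norm="l1").fit_transform(X)   # absolute values in each row sum to 1
l2 = Normalizer(norm="l2").fit_transform(X)   # each row has Euclidean length 1
mx = Normalizer(norm="max").fit_transform(X)  # largest absolute value in each row is 1

# Once rows are L2-normalized, cosine similarity reduces to a dot product.
cosine = float(l2[0] @ l2[1])

print(l1, l2, mx, sep="\n")
print("cosine similarity:", cosine)
```

Because every row is scaled to the same magnitude, comparisons between vectors depend only on their direction, which is why L2 normalization pairs naturally with cosine similarity.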

You will then move from normalization and standardization to scaling and transforming data. Such transformations include quantization as well as the construction of custom transformers for bespoke use cases. Finally, you will explore how to implement log and power transformations. You will round out the course by comparing the results of three important transformations (the Yeo-Johnson transform, the Box-Cox transform and the quantile transformation) in converting data with non-normal characteristics, such as chi-squared or lognormal data, into the familiar bell curve shape that many models work best with.
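A comparison along these lines might look like the following sketch, which draws synthetic lognormal data and measures how much each transformation reduces skewness. The sample size, random seed, `n_quantiles` value, and the use of skewness as the yardstick are all assumptions chosen for illustration; the log transform is wrapped in `FunctionTransformer` as one simple way to build a custom transformer.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import (
    FunctionTransformer,
    PowerTransformer,
    QuantileTransformer,
)

# Synthetic, strictly positive, right-skewed data (lognormal), one feature.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

transformers = {
    # Custom transformer built from a plain function (requires positive data).
    "log": FunctionTransformer(np.log),
    # Box-Cox also requires strictly positive data.
    "Box-Cox": PowerTransformer(method="box-cox"),
    # Yeo-Johnson also accepts zero and negative values.
    "Yeo-Johnson": PowerTransformer(method="yeo-johnson"),
    # Maps empirical quantiles onto those of a normal distribution.
    "quantile": QuantileTransformer(
        n_quantiles=100, output_distribution="normal", random_state=0
    ),
}

# Skewness near 0 suggests a more symmetric, bell-like shape.
skewness = {"raw": float(skew(X.ravel()))}
for name, transformer in transformers.items():
    skewness[name] = float(skew(transformer.fit_transform(X).ravel()))

for name, value in skewness.items():
    print(f"{name}: skewness = {value:.3f}")
```

Skewness is only one symptom of non-normality; a histogram or Q-Q plot of each transformed feature gives a fuller picture when choosing between the transformations.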

When you’re finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.

What's inside

Syllabus

Course Overview

Using Numeric Data in Machine Learning Algorithms

Building Features Using Normalization

Building Features Using Scaling and Transformations

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.

Having the prerequisite knowledge may help learners get the most out of this course

This course may be useful for beginners who want to learn the fundamentals of this topic

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Building Features from Numeric Data with these activities:

Review statistical distributions

Provides an opportunity to strengthen your grasp of the underlying math needed for this course

Review normal distribution
Review binomial distribution
Review Poisson distribution
Review exponential distribution
Review chi-square distribution

Follow Excel PCA tutorial

Provides hands-on experience with applying a core technique in this course

Find a PCA tutorial for Excel
Follow the tutorial

Practice data cleaning in Python

Reinforces data cleaning concepts that support better preprocessing

Find a data cleaning exercise or dataset
Clean, normalize, and encode the data
Submit results for feedback if possible

Build a data cleaning cheat sheet

Creates a handy reference for quick and easy application of key data cleaning methods

Gather common data cleaning methods and best practices
Organize and format the information into a cheat sheet
Use the cheat sheet during hands-on practice

Host a study session on feature scaling

Facilitates a collaborative environment for deeper understanding and knowledge exchange with classmates

Gather a few peers
Choose a topic within feature scaling to focus on
Share resources, insights, and perspectives on the chosen topic
Work through problems and examples together

Seek mentorship from a data scientist

Provides personalized guidance, advice, and support to enhance understanding and career development

Network with industry professionals or reach out to alumni
Identify a data scientist who is willing to mentor
Set up regular meetings or communication channels
Seek guidance on course-related topics and career aspirations

Participate in a Kaggle competition

Provides a practical and challenging environment to apply skills and test knowledge

Find a relevant Kaggle competition
Read the problem statement and familiarize yourself with the data
Develop a solution and train a model
Submit your solution and track your progress

Career center

Learners who complete Building Features from Numeric Data will develop knowledge and skills that may be useful to these careers:

Data Scientist

Data Scientists use data analysis techniques to extract insights from data and develop predictive models. They work with numeric data to identify patterns and trends. This course will provide you with the skills and knowledge to excel in this role.

Machine Learning Engineer

Machine Learning Engineers apply machine learning techniques to develop and deploy machine learning models. They work with numeric data to train and evaluate models, and apply data preprocessing techniques to improve model performance. This course will provide you with the skills and knowledge to excel in this role.

Business Analyst

Business Analysts use data analysis techniques to identify and solve business problems. They work with stakeholders to gather requirements, analyze data, and develop solutions. This course will provide you with the skills and knowledge to excel in this role.

Data Engineer

Data Engineers design, build, and maintain data pipelines and infrastructure. They work with data in various formats, including numeric data, and apply data transformation techniques to ensure data quality and usability. This course will provide you with the knowledge and skills to succeed in this role.

Data Architect

Data Architects design and manage data systems and infrastructure. They work with data in various formats, including numeric data, and apply data transformation techniques to ensure data quality and usability. This course will provide you with the knowledge and skills to succeed in this role.

Statistician

Statisticians collect, analyze, interpret, and present data. They design and conduct statistical studies to provide insights into a variety of topics. This course will provide you with a solid foundation in statistical methods and techniques, which are essential for success in this role.

Data Analyst

Data Analysts examine and interpret data to extract meaningful insights and identify trends. These professionals apply statistical and analytical techniques to data in order to support decision-making. This course will provide you with the skills and knowledge to perform these tasks effectively.

Market Research Analyst

Market Research Analysts conduct research to understand consumer behavior and market trends. They use data analysis techniques to extract insights from research data. This course will provide you with the skills and knowledge to succeed in this role.

Database Administrator

Database Administrators manage and maintain databases. They work with numeric data in various formats, and apply data transformation techniques to ensure data quality and usability. This course will provide you with the knowledge and skills to succeed in this role.

Research Scientist

As a Research Scientist, you will develop and conduct scientific research projects. This often involves collecting, analyzing, and interpreting data. Completing this course will help build a foundation for the data analysis techniques you will need in this role.

Financial Analyst

Financial Analysts use data analysis techniques to analyze financial data and make investment decisions. The role typically requires an advanced degree in a field such as finance or economics. However, this course will help you build a foundation for the data analysis techniques used in this role.

Actuary

Actuaries use mathematical and statistical techniques to assess risk and uncertainty. They work in a variety of industries, including insurance, finance, and consulting. This course will help you build a foundation for the data analysis techniques used in this role.

Quantitative Analyst

Quantitative Analysts use mathematical and statistical models to analyze financial data and make investment decisions. The role typically requires an advanced degree in a field such as mathematics, statistics, or finance. However, this course will help you build a foundation for the data analysis techniques used in this role.

Risk Analyst

Risk Analysts use data analysis techniques to identify and assess risks. They work in a variety of industries, including finance, insurance, and healthcare. This course will help you build a foundation for the data analysis techniques used in this role.

Software Engineer

Software Engineers design, develop, and maintain software applications. While not directly related to data analysis, this course may be useful for Software Engineers who work with numeric data in their projects.
