We may earn an affiliate commission when you visit our partners.
Janani Ravi

This course exhaustively covers data preprocessing techniques and transforms available in scikit-learn, allowing the construction of highly optimized features that are scaled, normalized and transformed in mathematically sound ways to fully harness the power of machine learning techniques.

The quality of the preprocessing applied to numeric data is an important determinant of the results of machine learning models built on that data. With smart, optimized data preprocessing, you can significantly speed up model training and validation, saving both time and money, and also substantially improve the predictive performance of your models.

In this course, Building Features from Numeric Data, you will gain the ability to design and implement effective, mathematically sound data preprocessing pipelines.

First, you will learn the importance of normalization, standardization, and scaling, and understand the intuition and mechanics of adjusting the central tendency and the dispersion of a data feature.
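A minimal sketch of standardization using scikit-learn's `StandardScaler`, assuming a small made-up two-column array (the values are illustrative, not taken from the course). It shows each feature's mean (central tendency) being shifted to 0 and its standard deviation (dispersion) being scaled to 1, and checks the result against the by-hand formula z = (x - mean) / std.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: two columns on very different scales
X = np.array([
    [1.0, 200.0],
    [2.0, 300.0],
    [3.0, 400.0],
    [4.0, 500.0],
])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# The fitted scaler stores each column's learned mean and standard deviation
print("learned means:", scaler.mean_)
print("learned std devs:", scaler.scale_)

# Same result by hand: subtract the column mean, divide by the
# population standard deviation (NumPy's default ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
assert np.allclose(X_std, X_manual)

# Each standardized column now has mean ~0 and standard deviation ~1
print("new means:", X_std.mean(axis=0).round(6))
print("new std devs:", X_std.std(axis=0).round(6))
```

In practice the scaler is fit on training data only and then applied to validation and test data with `transform`, so that statistics from held-out data do not leak into training.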

Next, you will discover how to identify and deal with outliers and possibly anomalous data. You will then learn important techniques for scaling and normalization. Normalization using the L1 norm, L2 norm, or max norm rescales feature vectors to a uniform magnitude. These techniques are widely used in building ML models: for instance, in computing the cosine similarity of document vectors, and in transforming images before techniques such as convolutional neural networks are applied to them.
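A minimal sketch of two ideas from this paragraph, assuming small made-up arrays: flagging outliers with the 1.5 * IQR rule (one common heuristic, not necessarily the one the course uses), and scikit-learn's `Normalizer` with its `"l1"`, `"l2"`, and `"max"` options. Unlike column-wise scalers, `Normalizer` rescales each row (sample vector) independently; the last lines check that the dot product of L2-normalized document vectors equals their cosine similarity.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import Normalizer

# --- Outliers: flag values outside 1.5 * IQR (a common rule of thumb) ---
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0])
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
is_outlier = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
print("flagged as outliers:", x[is_outlier])

# --- Norm-based normalization: each ROW (sample vector) is rescaled ---
# Made-up "document vectors", e.g. term counts over a 3-word vocabulary
docs = np.array([
    [3.0, 0.0, 4.0],
    [1.0, 2.0, 2.0],
])

l1 = Normalizer(norm="l1").fit_transform(docs)   # |values| in each row sum to 1
l2 = Normalizer(norm="l2").fit_transform(docs)   # each row has Euclidean length 1
mx = Normalizer(norm="max").fit_transform(docs)  # largest |value| in each row is 1

print("L1 row sums:", np.abs(l1).sum(axis=1))
print("L2 row lengths:", np.linalg.norm(l2, axis=1))
print("max-norm row maxima:", np.abs(mx).max(axis=1))

# For L2-normalized rows, the dot product equals the cosine similarity
cos_via_dot = l2 @ l2.T
assert np.allclose(cos_via_dot, cosine_similarity(docs))
print("cosine similarity of doc 0 and doc 1:", round(cos_via_dot[0, 1], 4))
```

Because cosine similarity ignores vector length, L2-normalizing documents once up front lets later similarity computations reduce to plain dot products.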

You will then move from normalization and standardization to scaling and transforming data. Such transformations include quantization as well as the construction of custom transformers for bespoke use cases. Finally, you will explore how to implement log and power transformations. You will round out the course by comparing the results of three important transformations (the Yeo-Johnson transform, the Box-Cox transform, and the quantile transformation) in converting data with non-normal characteristics, such as chi-squared or lognormal data, into the familiar bell-curve shape that many models work best with.
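A minimal sketch comparing these transforms with scikit-learn, assuming synthetic lognormal and chi-squared samples generated with NumPy (the sample size, seed, and `n_quantiles` value are arbitrary illustrative choices). It uses skewness from `scipy.stats.skew` as a rough gauge of how close each result is to a symmetric bell curve, wraps `np.log1p` in a `FunctionTransformer` as an example of a custom transformer, and quantizes one feature with `KBinsDiscretizer`.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import (
    FunctionTransformer,
    KBinsDiscretizer,
    PowerTransformer,
    QuantileTransformer,
)

rng = np.random.default_rng(0)

# Synthetic right-skewed data: column 0 is lognormal, column 1 is
# chi-squared with 3 degrees of freedom
X = np.column_stack([
    rng.lognormal(mean=0.0, sigma=1.0, size=1000),
    rng.chisquare(df=3, size=1000),
])

transformers = {
    # Custom transformer: log(1 + x), which also tolerates zeros
    "log1p": FunctionTransformer(np.log1p),
    # Box-Cox power transform: inputs must be strictly positive
    "box-cox": PowerTransformer(method="box-cox"),
    # Yeo-Johnson power transform: also accepts zero and negative values
    "yeo-johnson": PowerTransformer(method="yeo-johnson"),
    # Quantile transform: maps each feature through its empirical CDF onto a normal
    "quantile": QuantileTransformer(
        n_quantiles=500, output_distribution="normal", random_state=0
    ),
}

print(f"{'raw':12s} skewness: {skew(X, axis=0).round(3)}")
results = {}
for name, transformer in transformers.items():
    X_t = transformer.fit_transform(X)
    results[name] = skew(X_t, axis=0)  # closer to 0 means more symmetric
    print(f"{name:12s} skewness: {results[name].round(3)}")

# Quantization: bucket column 0 into 4 equal-frequency bins (ordinal codes 0..3)
binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
bins = binner.fit_transform(X[:, [0]])
print("samples per bin:", np.bincount(bins.ravel().astype(int)))
```

Box-Cox only accepts strictly positive inputs, which is why Yeo-Johnson is the usual choice when a feature can be zero or negative; the quantile transform makes no parametric assumption about the input, but it is non-linear and can distort distances between values.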

When you’re finished with this course, you will have the skills and knowledge of data preprocessing and transformation needed to get the best out of your machine learning models.

What's inside

Syllabus

Course Overview
Using Numeric Data in Machine Learning Algorithms
Building Features Using Normalization
Building Features Using Scaling and Transformations

Good to know

Know what's good, what to watch for, and possible dealbreakers
Learners who already have the prerequisite knowledge are likely to get the most out of this course
This course may be useful for beginners who want to learn the fundamentals of this topic

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Building Features from Numeric Data with these activities:
Review statistical distributions
Provides an opportunity to strengthen your grasp of the underlying math this course relies on
  • Review normal distribution
  • Review binomial distribution
  • Review Poisson distribution
  • Review exponential distribution
  • Review chi-square distribution
Follow Excel PCA tutorial
Provides hands-on experience with applying a core technique in this course
  • Find a PCA tutorial for Excel
  • Follow the tutorial
Practice data cleaning in Python
Gives opportunities to reinforce data cleaning concepts for better preprocessing
  • Find a data cleaning exercise or dataset
  • Clean, normalize, and encode the data
  • Submit results for feedback if possible
Build a data cleaning cheat sheet
Creates a handy reference for quick and easy application of key data cleaning methods
  • Gather common data cleaning methods and best practices
  • Organize and format the information into a cheat sheet
  • Use the cheat sheet during hands-on practice
Host a study session on feature scaling
Facilitates a collaborative environment for deeper understanding and knowledge exchange with classmates
  • Gather a few peers
  • Choose a topic within feature scaling to focus on
  • Share resources, insights, and perspectives on the chosen topic
  • Work through problems and examples together
Seek mentorship from a data scientist
Provides personalized guidance, advice, and support to enhance understanding and career development
  • Network with industry professionals or reach out to alumni
  • Identify a data scientist who is willing to mentor
  • Set up regular meetings or communication channels
  • Seek guidance on course-related topics and career aspirations
Participate in a Kaggle competition
Provides a practical and challenging environment to apply skills and test knowledge
  • Find a relevant Kaggle competition
  • Read the problem statement and familiarize yourself with the data
  • Develop a solution and train a model
  • Submit your solution and track your progress
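As a starting point for the "Practice data cleaning in Python" activity, here is a minimal sketch using pandas and scikit-learn. The column names (`age`, `income`, `city`) and values are hypothetical, and median imputation, standardization, and one-hot encoding are just one reasonable set of choices: the code drops duplicate rows, fills missing values, scales the numeric columns, and encodes the categorical one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset with a duplicate row, missing values, and a categorical column
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 32],
    "income": [40_000, 52_000, 61_000, np.nan, 52_000],
    "city": ["Pune", "Delhi", "Pune", np.nan, "Delhi"],
})

# Cleaning: drop exact duplicate rows
df = df.drop_duplicates().reset_index(drop=True)

numeric = ["age", "income"]
categorical = ["city"]

preprocess = ColumnTransformer(
    [
        # Numeric columns: fill gaps with the column median, then standardize
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
        # Categorical column: fill gaps with the most frequent value, then one-hot encode
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X_clean = preprocess.fit_transform(df)
print(X_clean.shape)
print(X_clean.round(3))
```

Wrapping the steps in a `ColumnTransformer` means the same fitted preprocessing can later be applied to new data with `preprocess.transform(...)`.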

Career center

Learners who complete Building Features from Numeric Data will develop knowledge and skills that may be useful to these careers:
Data Scientist
Data Scientists use data analysis techniques to extract insights from data and develop predictive models. They work with numeric data to identify patterns and trends. This course will provide you with the skills and knowledge to excel in this role.
Machine Learning Engineer
Machine Learning Engineers apply machine learning techniques to develop and deploy machine learning models. They work with numeric data to train and evaluate models, and apply data preprocessing techniques to improve model performance. This course will provide you with the skills and knowledge to excel in this role.
Data Engineer
Data Engineers design, build, and maintain data pipelines and infrastructure. They work with data in various formats, including numeric data, and apply data transformation techniques to ensure data quality and usability. This course will provide you with the knowledge and skills to succeed in this role.
Business Analyst
Business Analysts use data analysis techniques to identify and solve business problems. They work with stakeholders to gather requirements, analyze data, and develop solutions. This course will help you build a foundation in the data preparation techniques used in this role.
Data Architect
Data Architects design and manage data systems and infrastructure. They work with data in various formats, including numeric data, and apply data transformation techniques to ensure data quality and usability. This course will provide you with the knowledge and skills to succeed in this role.
Database Administrator
Database Administrators manage and maintain databases. They work with numeric data in various formats and apply data transformation techniques to ensure data quality and usability. This course will help you build a foundation in the data transformation techniques relevant to this role.
Statistician
Statisticians collect, analyze, interpret, and present data. They design and conduct statistical studies to provide insights into a variety of topics. This course will provide you with a foundation in statistically grounded data transformation techniques, such as standardization and power transforms, that are useful in this role.
Data Analyst
Data Analysts examine and interpret data to extract meaningful insights and identify trends. These professionals apply statistical and analytical techniques to data in order to support decision-making. This course will provide you with the skills and knowledge to perform these tasks effectively.
Market Research Analyst
Market Research Analysts conduct research to understand consumer behavior and market trends. They use data analysis techniques to extract insights from research data. This course will provide you with the skills and knowledge to succeed in this role.
Research Scientist
As a Research Scientist, you will develop and conduct scientific research projects. This often involves collecting, analyzing, and interpreting data. Completing this course will help build a foundation for the data analysis techniques you will need in this role.
Financial Analyst
Financial Analysts use data analysis techniques to analyze financial data and guide investment decisions. The role typically requires a degree in a field such as finance or economics. However, this course will help you build a foundation for the data analysis techniques used in this role.
Risk Analyst
Risk Analysts use data analysis techniques to identify and assess risks. They work in a variety of industries, including finance, insurance, and healthcare. This course will help you build a foundation for the data analysis techniques used in this role.
Quantitative Analyst
Quantitative Analysts use mathematical and statistical models to analyze financial data and make investment decisions. The role typically requires an advanced degree in a field such as mathematics, statistics, or finance. However, this course will help you build a foundation for the data analysis techniques used in this role.
Actuary
Actuaries use mathematical and statistical techniques to assess risk and uncertainty. They work in a variety of industries, including insurance, finance, and consulting. This course will help you build a foundation for the data analysis techniques used in this role.
Software Engineer
Software Engineers design, develop, and maintain software applications. While not directly related to data analysis, this course may be useful for Software Engineers who work with numeric data in their projects.

Reading list

We've selected eight books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Building Features from Numeric Data.
Murphy’s book provides a probabilistic perspective on machine learning and deep learning, covering various techniques, including data preprocessing and feature engineering. It is essential reading for learners seeking a strong theoretical foundation in machine learning.
Goodfellow, Bengio, and Courville cover a broad range of topics in deep learning, with an overview of data preprocessing and feature engineering techniques. Their book serves as a valuable resource for learners interested in deep learning and its applications.
Alpaydin’s book focuses on machine learning techniques for network data, including topics such as link prediction, community detection, and network visualization. It is a valuable resource for learners interested in social network analysis and network science.
Géron provides a practical guide to feature engineering in Python, covering topics such as data cleaning, feature selection, and feature transformation. It is a valuable resource for those looking to apply feature engineering techniques in their own projects.
Mitchell's book provides a thorough introduction to machine learning algorithms and their applications to image data. It covers topics such as image preprocessing, feature extraction, and image classification, making it suitable for learners interested in computer vision and image-related applications.
Zheng and Casari’s guide provides conceptual depth and practical advice on feature engineering. It covers a range of topics relevant to this course: variable selection, feature extraction, feature transformation, and feature generation. It also includes a reference section with code samples in Python, R, and SQL.
Provost and Fawcett’s book covers a wide range of topics in data science, including data preprocessing and feature engineering. It provides a solid foundation for those looking to apply data science techniques in a business context.
Ellis provides a comprehensive overview of machine learning techniques specifically designed for audio data. It covers topics such as audio preprocessing, feature extraction, and audio classification, making it a valuable resource for learners interested in audio processing and music informatics.

Similar courses

Here are nine courses similar to Building Features from Numeric Data.
Preparing Data for Modeling with scikit-learn
Most relevant
Data Preprocessing for Data Science
Most relevant
Normalize Data to Make It Appropriate for an Analysis...
Most relevant
Regression & Forecasting for Data Scientists using Python
Most relevant
Java SE 8: Building Your First JavaFX Application
Most relevant
Power Query Fundamentals
Deep Learning: Convolutional Neural Networks in Python
Architecting Serverless Big Data Solutions Using Google...
Traffic Sign Classification Using Deep Learning in...

Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser