Preprocessing Data with NumPy from Udemy

The problem

Most data analyst, data science, and coding courses miss a crucial practical step. They don’t teach you how to work with raw data, how to clean and preprocess it. This creates a sizeable gap between the skills you need on the job and the abilities you have acquired in training. Truth be told, real-world data is messy, so you need to know how to overcome this obstacle to become an independent data professional.

The bootcamps we have seen online, and even live classes, neglect this aspect and only show you how to work with ‘clean’ data. This isn’t doing you any favors. In reality, it will set you back both when you are applying for jobs and when you’re on the job.

The solution

Our goal is to provide you with complete preparation using the NumPy package. This course will turn you into a capable data analyst with a solid understanding of one of the most prominent computing packages in the world. To take you there, we will cover the following topics extensively.

· The ndarray class and why we use it

· The type of data arrays usually contain

· Slicing and squeezing datasets

· Dimensions of arrays, and how to reduce them

· Generating pseudo-random data

· Importing data from external text files

· Saving/Exporting data to external files

· Computing the statistics of the dataset (max, min, mean, variance, etc.)

· Data cleaning

· Data preprocessing

· Final practical example

Each of these subjects builds on the previous ones. And this is precisely what makes our curriculum so valuable. Everything is shown in the right order and we guarantee that you are not going to get lost along the way, as we have provided all necessary steps in video (not a single one skipped). In other words, we are not going to teach you how to concatenate datasets before you know how to index or slice them.

So, to prepare you for the long journey towards a data science position, we created a course that will show you all the tools for the job: The Preprocessing Data with NumPy course.

We believe that this resource will significantly boost your chances of landing a job, as it will prepare you for practical tasks and concepts that are frequently included in interviews.

NumPy is Python’s fundamental package for scientific computing. It has established itself as the go-to tool when you need to compute mathematical and statistical operations.

Why learn it?

A large portion of a data analyst’s work is dedicated to preprocessing datasets. Unquestionably, this involves tons of mathematical and statistical techniques that NumPy is renowned for. What’s more, the package introduces multi-dimensional array structures and provides a plethora of built-in functions and methods to use while working with them. In other words, NumPy can be described as a computationally stable state-of-the-art Python instrument that provides great flexibility and can take your analysis to the next level.

Some of the topics we will cover:

1. Fundamentals of NumPy

2. Random Generators

3. Working with text files

4. Statistics with NumPy

5. Data preprocessing

6. Final practical example

1. Fundamentals of NumPy

To fully grasp the capabilities of NumPy, we need to start from the fundamentals. In this part of the course, we’ll examine the ndarray class, discuss why it’s so popular and get familiar with terms like “indexing”, “slicing”, “dimensions” and “reducing”.

Why learn it?

As stated above, NumPy is the quintessential package for scientific computing, and to understand its true value, we need to start from its very core – the ndarray class. The better we comprehend the basics, the easier it’s going to be to grasp the more difficult concepts. That’s why it’s fundamental to lay a good foundation on which to build our NumPy skills.
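To make the terms above concrete, here is a minimal sketch of how an ndarray differs from a plain Python list. The variable names and values are made up for illustration and are not taken from the course materials.

```python
import numpy as np

# A plain Python list and the equivalent one-dimensional ndarray.
values_list = [1, 2, 3, 4]
values_array = np.array(values_list)

# Multiplying a list repeats it; multiplying an ndarray acts on every element.
doubled_list = values_list * 2      # [1, 2, 3, 4, 1, 2, 3, 4]
doubled_array = values_array * 2    # array([2, 4, 6, 8])

# A two-dimensional ndarray knows its shape and its number of dimensions.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
shape = matrix.shape    # (2, 3)
ndim = matrix.ndim      # 2

# Indexing takes one position per axis: row 1, column 2.
element = matrix[1, 2]  # 6

# Reducing along an axis: summing over the rows leaves one value per column.
column_sums = matrix.sum(axis=0)    # array([5, 7, 9])
```

The elementwise behavior is the main reason arrays are preferred over lists for numerical work: the operation is written once and applied to every value.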

2. Random Generators

After we’ve learned the basics, we’ll move on to pseudo-random data and random generators. These generators will help construct a set of arbitrary variables from a given probability distribution, or a fixed set of viable options.

Why learn it?

Working in a data-driven field, we sometimes need to construct partially arbitrary tests to see if our code works as intended. And here lies the value of random generators, as they allow us to construct datasets of pseudo-random data. The added benefit of random generators is that we can set a seed if we wish to replicate a particular randomization, but we’ll go into all the details in the course itself.
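As a rough illustration of seeding, the sketch below uses `np.random.default_rng`, which is one of several ways NumPy offers to create a generator; the course itself may construct its generators differently. The seed value and the option labels are arbitrary assumptions.

```python
import numpy as np

# Two generators created with the same seed (the value 365 is arbitrary).
rng_a = np.random.default_rng(seed=365)
rng_b = np.random.default_rng(seed=365)

# Random integers in [0, 10), arranged in a 2x3 array.
ints_a = rng_a.integers(0, 10, size=(2, 3))
ints_b = rng_b.integers(0, 10, size=(2, 3))

# Same seed, same sequence: the two draws are identical.
same_draw = np.array_equal(ints_a, ints_b)

# Samples from a probability distribution (normal, mean 0, standard deviation 1).
normal_sample = rng_a.normal(loc=0.0, scale=1.0, size=5)

# Picks from a fixed set of options with chosen probabilities.
choices = rng_a.choice(["A", "B", "C"], size=4, p=[0.5, 0.3, 0.2])
```

Re-running the script with the same seed reproduces the same "random" test data, which is what makes a failing test repeatable.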

3. Working with text files

Text files remain one of the most common ways to exchange data. In this part of the course, we will use the Python, pandas, and NumPy tools covered earlier to give you the essentials you need when importing or saving data.

Why learn it?

In many courses, you are just given a dataset to practice your analytical and programming skills. However, we don’t want to close our eyes to reality, where converting a raw dataset from an external file into a workable Python format can be a massive challenge.
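The sketch below shows the kind of problem a raw file can cause. The file contents, the `;` delimiter, and the file name `data.npy` are hypothetical; an in-memory string stands in for a real file so the example is self-contained.

```python
import io
import os
import tempfile

import numpy as np

# Hypothetical raw file contents; the second data row has a missing amount.
raw_text = "id;amount;rate\n1;1000;5.5\n2;;7.25\n3;3000;6.0\n"

# np.genfromtxt tolerates the gap and stores it as nan.
data = np.genfromtxt(io.StringIO(raw_text), delimiter=";", skip_header=1)

# np.loadtxt is stricter: it cannot convert the empty field to a number.
try:
    np.loadtxt(io.StringIO(raw_text), delimiter=";", skiprows=1)
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# Save the imported array in NumPy's binary .npy format and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "data.npy")
    np.save(npy_path, data)
    reloaded = np.load(npy_path)
```

In this example `np.genfromtxt` is the more forgiving choice for incomplete data, while `np.loadtxt` fails fast, which can be useful when gaps are not expected.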

4. Statistics with NumPy

Once we’ve learned how to import large sets of information from external text files, we’ll finally be ready to explore one of NumPy’s strengths – statistics. Since the package is extremely computationally durable, we often rely on its functions and methods to calculate the statistics of a sample dataset. These include the likes of the mean, the standard deviation, and much more.

Why learn it?

To become a data scientist, you not only need to be able to preprocess a dataset, but also to extract valuable insights. One way to learn more about a dataset is by examining its statistics. So, we’ll use the package to understand more about the data and how to convert this knowledge into crucial information we can use for forecasting.
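For a sense of what these calculations look like, here is a small sketch on a made-up sample. The numbers are purely illustrative.

```python
import numpy as np

# Hypothetical sample: 4 observations (rows) of 2 variables (columns).
sample = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0]])

# axis=0 computes one statistic per column.
col_min = sample.min(axis=0)     # [1.0, 10.0]
col_max = sample.max(axis=0)     # [4.0, 40.0]
col_mean = sample.mean(axis=0)   # [2.5, 25.0]
col_var = sample.var(axis=0)     # population variance: [1.25, 125.0]

# The 50th percentile is the median of the first column.
median_first = np.percentile(sample[:, 0], 50)   # 2.5

# np.corrcoef treats each row as a variable, hence the transpose.
corr = np.corrcoef(sample.T)     # every entry is 1.0 for this sample

# nan-aware variants skip missing values instead of returning nan.
mean_ignoring_nan = np.nanmean(np.array([1.0, np.nan, 3.0]))   # 2.0
```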

5. Data preprocessing

Even when your dataset is in clean and comprehensible shape, it isn’t quite ready to be processed for visualizations and analysis just yet. There is a crucial step in between, and that’s data preprocessing.

Why learn it?

Data preprocessing is where a data analyst can demonstrate how good or great they are at their job. This stage of the work requires the ability to choose the right statistical tool that will improve the quality of your dataset and the knowledge to implement it with advanced pandas and NumPy techniques. Only when you’ve completed this step can you say that your dataset is preprocessed and ready for the next part, which is data visualization.
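One common preprocessing choice is to replace missing entries with a filler value. The sketch below uses the column mean as the filler; that is only one possible choice, and both the data and the strategy are assumptions made for illustration.

```python
import numpy as np

# Hypothetical dataset with two missing entries stored as nan.
data = np.array([[1.0, 200.0],
                 [np.nan, 400.0],
                 [3.0, np.nan]])

# Boolean mask marking the missing positions.
missing = np.isnan(data)

# Column means computed while ignoring the missing entries.
col_means = np.nanmean(data, axis=0)          # [2.0, 300.0]

# Keep the original value where present, use the column mean where missing.
filled = np.where(missing, col_means, data)   # [[1, 200], [2, 400], [3, 300]]
```

Whether the mean, the median, or an out-of-range marker is the right filler depends on the dataset and on what the next analysis step expects.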

6. Practical example

The course contains plenty of exercises and practical cases. What’s more, in the end, we have included a comprehensive practical example that will show you how everything you have learned along the way comes together. This is where you will be able to appreciate how far you have come in your journey toward mastering NumPy in your pursuit of a data career.

What you get

· Active Q&A support

· All the NumPy knowledge to become a data analyst

· A community of aspiring data analysts

· A certificate of completion

· Access to frequent future updates

· Real-world training

Get ready to become a NumPy data analyst from scratch

Why wait? Every day is a missed opportunity.

Click the “Buy Now” button and become a part of our data analyst program today.

What's inside

Learning objectives

Arrays.
The definition of a package/library.
Installing and upgrading a package.
Navigating the documentation.
A history of numpy.
The relationship between arrays and vectors.
Arrays vs lists.
Indexing.
Assigning values to arrays.
Elementwise properties and operations.
Datatypes supported by ndarrays.
Broadcasting and type casting.
Running a function or method over a given axis.
Slicing, stepwise slicing, conditional slicing
Dimensionality reduction in arrays.
Generating arrays full of identical values.
Generating non-random sequences of data.
Generating random data with random generators.
Generating random samples from a random probability distribution.
Importing data into and exporting data from NumPy.
Npy and npz files.
Maximums and minimums.
Percentiles and quantiles.
Mean and variance.
Covariance and correlation.
Calculating histograms.
Higher dimension histograms.
Finding and filling up missing values.
Substituting "filler" values.
Reshaping arrays.
Removing parts of arrays.
Removing parts of individual elements within arrays. (stripping)
Sorting and shuffling.
Argument functions.
Stacking and concatenating.
Finding the unique values within an array.
A comprehensive practical example of data cleaning and preprocessing.

Syllabus

Introduction to NumPy

What Does the Course Cover?

Download All Resources

FAQ

The NumPy Package and Its Applications

Installing and Upgrading NumPy

What is an array?

Using the NumPy Documentation

Introduction to NumPy - Exercise

Why Do We Use NumPy?

A Brief History of NumPy

ndarrays

Arrays vs Lists

Why Do We Use NumPy - Exercise

NumPy Fundamentals

Indexing

Assigning Values

Elementwise Properties

NumPy Datatypes

Characteristics of NumPy Functions - Part 1

Characteristics of NumPy Functions - Part 2

NumPy Fundamentals - Exercise

Working with Arrays

Basic Slicing

Stepwise Slicing

Conditional Slicing

Dimensions and the Squeeze Function

Working with Arrays - Exercise

Generating Data with NumPy

Empty Arrays, Arrays of Identical Values

_like Functions

A Sequence of Numbers - np.arange()

Random Generators and Seeds

Random Integers, Probabilities and Choices

Random Probability Distributions

Applications of Random Generators

Generating Data with NumPy - Exercise

Importing and Saving Data

Importing Data with NumPy - np.loadtxt() vs np.genfromtxt()

Importing Data with NumPy - Simple Cleaning when Importing

Importing Data with NumPy - String vs Object vs Numbers

Importing Data with NumPy - Exercise

Saving Data with NumPy - NPY

Saving Data with NumPy - NPZ

Saving Data with NumPy - CSV

Importing and Saving Data - Exercise

Statistics with NumPy

Using NumPy Statistical Functions

Minimal and Maximal Values

Percentiles and Quantiles

Averages and Variance

Covariance and Correlation

Histogram - Part 1: 1-D Histograms

Histogram - Part 2: Higher Dimension Histograms

NaN-Equivalent Functions

Statistics with NumPy - Exercise

Manipulating Data with NumPy

Checking for Missing Values

Substituting Filler Values

Reshaping Arrays

Removing Values

Sorting Arrays

Argument Functions - Part 1: Argument Sort

Argument Functions - Part 2: Argument Where

Shuffling Data

Casting Arrays

Stripping Symbols from Arrays

Stacking Arrays

Concatenating Arrays

Finding Unique Values in Arrays

A NumPy Practical Example

Setting Up: Introduction to the Practical Example

Setting Up: Importing the Data Set

Setting Up: Checking for Incomplete Data

Setting Up: Splitting the Dataset

Setting Up: Creating Checkpoints

Manipulating Text Data: Issue Date

Manipulating Text Data: Loan Status and Term

Manipulating Text Data: Grade and Sub Grade

Manipulating Text Data: Verification Status & URL

Manipulating Text Data: State Address

Manipulating Text Data: Converting Strings and Creating a Checkpoint

Manipulating Numeric Data: Substitute Filler Values

Manipulating Numeric Data: Currency Change – The Exchange Rate

Manipulating Numeric Data: Currency Change - From USD to EUR

Completing the Dataset

Good to know

Know what's good, what to watch for, and possible dealbreakers

Clarifies the relationship between arrays and vectors

Introduces different ways of generating data, including random generators

Covers a wide range of topics, from importing and saving data to manipulating and visualizing it

Emphasizes the practical application of NumPy, with a focus on data cleaning and preprocessing

Provides a step-by-step guide to working with NumPy, making it accessible to beginners

Requires prior knowledge of Python and basic programming concepts

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Preprocessing Data with NumPy with these activities:

Review the NumPy Documentation

Familiarizing yourself with NumPy documentation will improve your ability to locate specific functions quickly.

Navigate the NumPy website
Search for specific functions
Read the documentation for the array class
Explore the examples provided

Read 'Practical Data Science with Python'

Review the fundamentals of Python and data analysis to strengthen your foundation for this course.

Start the book and read at least the first three chapters.
Complete the exercises in the chapters to practice what you have learned.
Take notes on important concepts and techniques.

Join a Study Group

Working with a study group can provide fresh perspectives and insights.

Find a study partner or group
Set regular meeting times
Discuss course materials and work through exercises together
Share notes and resources

Complete the NumPy tutorial on the official website

Explore the official NumPy documentation to enhance your understanding of NumPy's functionality.

Visit the NumPy official website and navigate to the tutorial section.
Follow the step-by-step instructions and complete the exercises.
Take notes on key concepts and syntax.

Practice Slicing Techniques

Slicing is a common operation for data cleaning and preprocessing. Completing these drills will improve your proficiency.

Slice an array
Slice an array using step size
Slice an array using negative indices
Slice an array using slicing with replacement
Slice a multidimensional array
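If you want a starting point for these drills, the sketch below runs the basic slicing patterns on a small made-up array. The array and the variable names are illustrative assumptions.

```python
import numpy as np

# A 3x4 practice array:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
grid = np.arange(12).reshape(3, 4)

first_two_rows = grid[:2]            # basic slice: rows 0 and 1
every_other_col = grid[:, ::2]       # step size 2: columns 0 and 2
last_row = grid[-1]                  # negative index: [ 8  9 10 11]
last_two_cols = grid[:, -2:]         # negative slice: columns 2 and 3
block = grid[1:, 1:3]                # rows and columns at once: [[5 6] [9 10]]
greater_than_six = grid[grid > 6]    # conditional slicing: [ 7  8  9 10 11]
```

Try predicting each result before printing it; the shape of the output is usually the first thing to check.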

Complete the NumPy exercises on LeetCode

Apply your NumPy skills and reinforce your understanding through practice on LeetCode.

Create a LeetCode account and sign in.
Search for NumPy exercises and select one.
Attempt to solve the exercise using NumPy functions.
Review the solution and identify areas for improvement.

Practice Reshaping and Manipulating Data

Reshaping and manipulating data are essential skills for data cleaning and preprocessing.

Reshape an array
Transpose an array
Remove columns from an array
Remove rows from an array
Sort an array
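The steps above could be practiced along the following lines. The array is made up, and `np.delete` is just one way to remove rows or columns; boolean masks or slicing would work as well.

```python
import numpy as np

table = np.array([[3, 1, 2],
                  [6, 5, 4]])

# Reshape: same six values, laid out as 3 rows of 2.
reshaped = table.reshape(3, 2)               # [[3 1] [2 6] [5 4]]

# Transpose: rows become columns.
transposed = table.T                         # [[3 6] [1 5] [2 4]]

# Remove column 1, then (separately) row 0. np.delete returns a new array.
without_col = np.delete(table, 1, axis=1)    # [[3 2] [6 4]]
without_row = np.delete(table, 0, axis=0)    # [[6 5 4]]

# Sort the values within each row.
sorted_rows = np.sort(table, axis=1)         # [[1 2 3] [4 5 6]]

# argsort returns the positions that would sort the first row.
order = np.argsort(table[0])                 # [1 2 0]
```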

Practice Calculating Statistics with NumPy

Calculating statistics with NumPy will enhance your ability to explore and summarize data.

Calculate the mean of an array
Calculate the median of an array
Calculate the standard deviation of an array
Calculate the variance of an array
Calculate the covariance of two arrays

Project: Create a Data Preprocessing Pipeline

You will get hands-on experience in applying an entire data preprocessing pipeline using NumPy's features.

Define the goal of the preprocessing pipeline
Import the raw data
Clean and preprocess the data using NumPy functions
Save the preprocessed data for further analysis
Write a report summarizing the preprocessing pipeline

Career center

Learners who complete Preprocessing Data with NumPy will develop knowledge and skills that may be useful to these careers:

Data Architect

Data architects design and build the infrastructure that stores and processes data. They work with data engineers and other analysts to ensure that data is accessible, reliable, and secure. By building your understanding of NumPy and data preprocessing, this course can help you prepare datasets for analysis.

Statistician

Statisticians use their mathematical and statistical skills to collect, analyze, and interpret data. They work in a variety of industries, including healthcare, finance, and education. By taking this course, you can learn how to prepare and analyze large datasets as a statistician.

Quantitative Analyst

Quantitative analysts use mathematical and statistical models to analyze financial data. They use their findings to make investment recommendations and develop trading strategies. With your understanding of NumPy and data preprocessing, you'll be able to efficiently analyze large financial datasets as a quantitative analyst.

Business Analyst

Business analysts use their analytical skills to identify and solve business problems. They work with stakeholders to gather requirements, analyze data, and develop solutions. This course may help you gain the skills necessary to prepare and analyze large datasets as a business analyst.

Machine Learning Engineer

Machine learning engineers use their knowledge of programming, data analysis, and machine learning to develop and deploy machine learning models. These models can be used for a variety of tasks, such as fraud detection, image recognition, and natural language processing. By teaching you NumPy and data preprocessing techniques, this course can help you prepare datasets for building machine learning models.

Financial Analyst

Financial analysts use their analytical skills to evaluate the financial performance of companies. They use financial models to make investment recommendations and provide guidance to clients. By building your understanding of NumPy and data preprocessing, this course can help you efficiently analyze large financial datasets as a financial analyst.

Data Scientist

Data scientists use their programming knowledge, analytical skills, and understanding of machine learning to extract insights from data. They build predictive models and provide solutions to complex business problems. By taking this course, you can gain the skills necessary to prepare and analyze large datasets, making you an effective data scientist.

Operations Research Analyst

Operations research analysts use mathematical and analytical techniques to solve complex problems in various industries. They use their skills to improve efficiency, reduce costs, and make better decisions. As an operations research analyst, understanding data preprocessing and NumPy can be useful in preparing data for modeling and analysis.

Actuary

Actuaries use their mathematical and statistical skills to assess risk and uncertainty. They work in a variety of industries, including insurance, finance, and healthcare. As an actuary, understanding NumPy and data preprocessing techniques can be useful in preparing data for modeling and analysis.

Data Analyst

Data analysts use their knowledge of data analysis to examine, clean, model, and interpret data. They are integral in helping companies make better decisions by providing insights into customer behavior, market trends, and operational performance. As a data analyst, you'll use your expertise in NumPy and your learned skills in data preprocessing to prepare large datasets for analysis. This course may help you build a foundation in NumPy and data preprocessing, preparing you with the skills needed to begin a data analysis career.

Market Researcher

Market researchers use their analytical skills to understand consumer behavior and market trends. They conduct surveys, focus groups, and other research methods to gather data. As a market researcher, understanding NumPy and data preprocessing can assist in analyzing large datasets to identify trends and patterns in consumer behavior.

Data Engineer

Data engineers are responsible for designing, building, and maintaining the infrastructure that stores and processes data. They work with data scientists and other analysts to ensure that data is accessible, reliable, and secure. By understanding NumPy and data preprocessing, you'll be better equipped to handle large datasets in a data engineering role.

Data Visualization Analyst

Data visualization analysts use their skills in data analysis and visualization to create visual representations of data. They help stakeholders understand complex data and make informed decisions. By teaching you NumPy and data preprocessing, this course may help you prepare datasets for visualization, enabling you to present data in a clear and concise manner.

Software Engineer

Software engineers design, develop, and maintain software applications. They use their programming skills to create new features, fix bugs, and improve performance. By building your understanding of NumPy and data preprocessing, this course may help you work with large datasets within your software engineering role.

Database Administrator

Database administrators are responsible for managing and maintaining databases. They ensure that data is stored securely and efficiently, and that it is accessible to authorized users. By building your understanding of NumPy and data preprocessing, this course may help you manage large datasets within your database administration role.
