Student Testimonials:
The instructor knows the material, and has detailed explanation on every topic he discusses. Has clarity too, and warns students of potential pitfalls. He has a very logical explanation, and it is easy to follow him. I highly recommend this class, and would look into taking a new class from him. - Diana
This is excellent, and I cannot compliment the instructor enough. Extremely clear, relevant, and high quality - with helpful practical tips and advice. Would recommend this to anyone wanting to learn pandas. Lessons are well constructed. I'm actually surprised at how well done this is. I don't give many 5 stars, but this has earned it so far. - Michael
This course is very thorough, clear, and well thought out. This is the best Udemy course I have taken thus far. (This is my third course.) The instruction is excellent. - James
Welcome to the most comprehensive Pandas course available on Udemy. An excellent choice for both beginners and experts looking to expand their knowledge of one of the most popular Python libraries in the world.
Data Analysis with Pandas and Python offers 19+ hours of in-depth video tutorials on the most powerful data analysis toolkit available today. Lessons include:
installing
sorting
filtering
grouping
aggregating
de-duplicating
pivoting
munging
deleting
merging
visualizing
and more.
Why learn pandas?
If you've spent time in spreadsheet software like Microsoft Excel, Apple Numbers, or Google Sheets and are eager to take your data analysis skills to the next level, this course is for you.
Data Analysis with Pandas and Python introduces you to the popular Pandas library built on top of the Python programming language.
Pandas is a powerhouse tool that allows you to do anything and everything with colossal data sets: analyzing, organizing, sorting, filtering, pivoting, aggregating, munging, cleaning, calculating, and more.
I call it "Excel on steroids".
Over the course of more than 19 hours, I'll take you step-by-step through Pandas, from installation to visualization. We'll cover hundreds of different methods, attributes, features, and functionalities packed away inside this awesome library. We'll dive into tons of different datasets, short and long, broken and pristine, to demonstrate the incredible versatility and efficiency of this package.
Data Analysis with Pandas and Python is bundled with dozens of datasets for you to use. Dive right in and follow along with my lessons to see how easy it is to get started with pandas.
Whether you're a new data analyst or have spent years (*cough* too long *cough*) in Excel, Data Analysis with Pandas and Python offers you an incredible introduction to one of the most powerful data toolkits available today.
Welcome to Data Analysis with Pandas and Python! In this lesson, we'll introduce the pandas library, the Python language, the structure of the course, and the prerequisites.
In this lesson, we download and install the Anaconda distribution for macOS computers. Windows users are welcome to skip this lesson.
In this lesson, we download and install the Anaconda distribution for Windows computers. macOS users are welcome to skip this lesson.
Learn how to uninstall the Anaconda distribution on both Windows and macOS computers.
Anaconda Navigator is a graphical program for creating and managing conda environments. In this lesson, we create a new environment for the course and install pandas within it. We also set up the Jupyter Lab coding environment where we'll be writing our Python/pandas code.
Download the course materials (datasets and Jupyter Notebooks) for the course.
In this lesson, we walk through the process of starting up and shutting down Jupyter Lab, our coding environment. We open some sample Jupyter Notebooks and describe how a Python server runs continuously in the background, waiting to execute the contents of a code cell.
In this lesson, we introduce the Jupyter Lab interface. A Notebook consists of cells, which can have different types. We introduce some common actions like creating cells, cutting and pasting cells, stopping the kernel, and restarting the Jupyter server.
In this lesson, we observe how to execute Python code cells in Jupyter Lab. Use Shift + Enter to run a cell and navigate to the next one. Use Ctrl + Enter to execute a cell and stay within it.
To conserve memory, Jupyter won't load Python packages into your Notebook automatically. In this lesson, we use Python's import keyword to bring the pandas library into a Notebook. We also talk about assigning aliases with the as keyword.
Test your knowledge of the concepts introduced in this section in this multiple-choice quiz.
A comment is a line ignored by the Python interpreter when the program/cell runs. Declare a comment with a hashtag (#) symbol.
In this lesson, we introduce the most common data types in Python including integers, floating-points, strings, Booleans, and None.
In this lesson, we discuss common mathematical and logical operators including addition, subtraction, multiplication, division, concatenation, modulo, equality, inequality, and more.
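A quick sketch of the operators this lesson lists, in plain Python; the values and variable names are arbitrary illustrations, not taken from the course notebooks.

```python
# Arithmetic operators
total = 5 + 3        # addition -> 8
difference = 5 - 3   # subtraction -> 2
product = 5 * 3      # multiplication -> 15
quotient = 5 / 2     # true division always returns a float -> 2.5
remainder = 5 % 2    # modulo: the remainder after division -> 1

# The + operator concatenates strings
greeting = "Hello, " + "pandas"   # "Hello, pandas"

# Equality and inequality produce Booleans
is_equal = 5 == 5.0          # True: int and float compare by value
is_different = "a" != "A"    # True: string comparison is case-sensitive
```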
A variable is a name we assign to a value in our program. In this lesson, we practice declaring variables and discuss Python community conventions for naming them.
A function is a reusable procedure, a sequence of steps to follow in order. In this lesson, we introduce Python's built-in functions and the syntax for invoking them.
Now it's time to build our own custom functions. In this lesson, we walk through defining a temperature conversion function from start to finish.
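The description doesn't say which direction the conversion goes, so this sketch assumes Fahrenheit to Celsius; the function name `fahrenheit_to_celsius` is hypothetical.

```python
def fahrenheit_to_celsius(fahrenheit):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    # Subtract the freezing-point offset, then scale by 5/9
    celsius = (fahrenheit - 32) * 5 / 9
    return celsius


print(fahrenheit_to_celsius(212))  # 100.0
print(fahrenheit_to_celsius(32))   # 0.0
```

The `def` line names the function and its parameter, the indented body does the work, and `return` hands the result back to the caller.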
A method is a function attached to an object. It's a command or action we can ask the object to take. In this lesson, we explore some common string methods and discuss mutable vs. immutable objects.
A list is a mutable collection of ordered values. In this lesson, we learn the syntax for declaring lists as well as some common methods like pop and append.
Python assigns each list element and each string character an index position that reflects its place in line. In this lesson, we learn how to extract elements and characters from their lists/strings using square bracket notation.
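The list and indexing lessons above can be sketched together in plain Python; the list contents are made up for illustration.

```python
# Declare a list with square brackets
toppings = ["cheese", "pepperoni", "mushrooms"]

# append adds an element to the end of the list (mutating it in place)
toppings.append("olives")

# pop removes and returns the last element by default
removed = toppings.pop()   # "olives"

# Index positions start at 0; negative indices count from the end
first = toppings[0]        # "cheese"
last = toppings[-1]        # "mushrooms"

# Strings support the same square bracket notation for characters
word = "pandas"
first_letter = word[0]     # "p"
middle = word[1:4]         # "and" (the end position is excluded)
```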
A dictionary is a mutable collection of key-value pairs. A key serves as a unique identifier for a value. The keys must be unique, while the values can contain duplicates. In this lesson, we practice declaring some dictionary objects.
A class is a blueprint/template for creating an object. In this lesson, we walk through the terminology and provide a real-world analogy.
In this lesson, we import the pandas library and explore some of its available classes, functions, and modules.
Test your knowledge of the concepts introduced in this course section.
A pandas Series is a one-dimensional labelled array that combines the best features of a list and a dictionary. In this lesson, we instantiate our first Series objects and introduce the index, the collection of identifiers for the Series' values.
In this lesson, we practice creating Series objects with dictionaries as the data source. Pandas uses the keys as the Series' index labels and the values as the Series' values.
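A minimal sketch of the Series constructor with a list, a dictionary, and explicit `data`/`index` arguments; the sample values are made up.

```python
import pandas as pd

# From a list: pandas assigns a default integer index (0, 1, 2, ...)
ice_cream = pd.Series(["Chocolate", "Vanilla", "Strawberry"])

# From a dictionary: keys become index labels, values become Series values
sushi = {"Salmon": "Orange", "Tuna": "Red", "Eel": "Brown"}
sushi_series = pd.Series(sushi)

# The data and index parameters can also be passed explicitly by keyword
prices = pd.Series(data=[2.5, 3.0, 4.25], index=["small", "medium", "large"])
```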
In this lesson, we invoke some sample methods like sum, product, and mean on Series objects.
An attribute is a piece of data that lives on an object. It's a fact, a detail, a characteristic of the object. In this lesson, we access various attributes on the Series and introduce the concept of composition. We also explore the underlying numpy.ndarray object that holds the Series' values.
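The method/attribute distinction from the last two lessons shows up in the syntax: methods are called with parentheses, attributes are read without them. A small sketch with arbitrary numbers:

```python
import pandas as pd

numbers = pd.Series([2, 4, 6, 8])

# Methods are invoked with parentheses
total = numbers.sum()        # 20
product = numbers.product()  # 384
average = numbers.mean()     # 5.0

# Attributes are accessed without parentheses
size = numbers.size          # 4
dtype = numbers.dtype        # typically int64
values = numbers.values      # the underlying numpy.ndarray holding the values
```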
A parameter is the name for an expected input to a function/method/class instantiation. An argument is the concrete value we provide for a parameter during invocation. In this lesson, we discuss the data and index parameters of the Series constructor.
A CSV is a plain text file that uses line breaks to separate rows and commas to separate row values. In this lesson, we use the pd.read_csv function to import 2 CSV datasets into pandas. We also introduce the 2-dimensional DataFrame object and learn how to convert it to a 1-dimensional Series.
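The course's CSV files aren't reproduced here, so this sketch reads a small inline CSV through `io.StringIO`; the column names and rows are made up. It uses the `DataFrame.squeeze("columns")` method to turn a one-column DataFrame into a Series, since recent pandas versions (2.0 and later) no longer accept a `squeeze` parameter on `read_csv`.

```python
import io

import pandas as pd

# Stand-in for a CSV file: line breaks separate rows, commas separate values
csv_text = """Name,Type
Bulbasaur,Grass
Charmander,Fire
Squirtle,Water
"""

# read_csv returns a 2-dimensional DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# Keep one column, then squeeze the 1-column DataFrame into a 1-dimensional Series
names = pd.read_csv(io.StringIO(csv_text), usecols=["Name"]).squeeze("columns")
```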
The head method returns a number of rows from the beginning of the Series. The complementary tail method returns a number of rows from the end of the Series.
In this lesson, we pass a Series to Python's built-in functions including len, type, list, dict, sorted, max, and min.
In this lesson, we practice using Python's in and not in keywords to check for inclusion among the Series' values and index labels.
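One detail worth seeing in code: on a Series, `in` checks the index labels, not the values. A minimal sketch with made-up data:

```python
import pandas as pd

fruits = pd.Series(["apple", "banana"], index=["a", "b"])

# `in` on a Series checks the index labels
label_check = "a" in fruits          # True
value_check = "apple" in fruits      # False: "apple" is a value, not a label

# To check the values, test against .values instead
value_check_fixed = "apple" in fruits.values    # True
missing = "cherry" not in fruits.values         # True
```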
The sort_values method sorts a Series' values in order. In this lesson, we invoke the method on both our alphabetical and numeric Series and also learn how to customize the sort order.
In this lesson, we set a custom index on our Series and learn how to sort an index using the sort_index method.
In this lesson, we use the iloc accessor to extract a Series value by its index position. iloc is short for "integer location" and requires a special square bracket syntax.
In this lesson, we use the loc accessor to extract a Series value by its index label. loc requires a special square bracket syntax.
In this lesson, we introduce the get method for retrieving a Series value by index label and providing a fallback value in case the label does not exist.
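The three retrieval approaches from these lessons side by side, as a sketch on a made-up Series:

```python
import pandas as pd

colors = pd.Series(["red", "green", "blue"], index=["x", "y", "z"])

by_position = colors.iloc[1]    # "green": integer position 1
by_label = colors.loc["z"]      # "blue": index label "z"

# get returns a fallback (None by default) instead of raising a KeyError
found = colors.get("x")                    # "red"
fallback = colors.get("w", "not found")    # "not found"
```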
In this lesson, we show the syntax to overwrite a Series value. We first target it with the iloc/loc accessor, then provide an equal sign and the value to overwrite the original value with.
In this lesson, we discuss the differences between a copy and a view in pandas. We also learn the benefits of the copy method in creating a clone of a pandas object.
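A sketch combining the overwrite syntax with the copy method; the scores are made up. The clone created by `copy` is unaffected by later assignments to the original.

```python
import pandas as pd

scores = pd.Series([10, 20, 30], index=["a", "b", "c"])

# copy creates an independent clone of the Series
backup = scores.copy()

# Overwrite a value: target it with loc/iloc, then assign with =
scores.loc["b"] = 25
scores.iloc[0] = 15

# backup still holds 10, 20, 30
```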
In this lesson, we run through some common mathematical methods on Series including count, sum, product, mean, max, min, median, mode, and more.
Broadcasting describes the process of applying an arithmetic operation to an array. We can combine mathematical operators with a Series to apply the mathematical operation to every value.
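A minimal broadcasting sketch with arbitrary prices: each operator applies to every value and returns a new Series, and pandas also exposes method equivalents such as `add`.

```python
import pandas as pd

prices = pd.Series([1.0, 2.5, 4.0])

# The operation is applied to every value; a new Series is returned
plus_one = prices + 1        # 2.0, 3.5, 5.0
doubled = prices * 2         # 2.0, 5.0, 8.0

# Equivalent method form of the + operator
also_plus_one = prices.add(1)
```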
In this lesson, we explore the value_counts method, which returns the number of times each unique value occurs in the Series. The normalize parameter returns the relative frequencies / percentages of the values instead of the counts.
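A short sketch of value_counts with and without `normalize`, on a made-up Series of pet types:

```python
import pandas as pd

pets = pd.Series(["dog", "cat", "dog", "bird", "dog", "cat"])

# Counts per unique value, sorted from most to least frequent
counts = pets.value_counts()                  # dog 3, cat 2, bird 1

# normalize=True returns relative frequencies that sum to 1
shares = pets.value_counts(normalize=True)    # dog 0.5, cat ~0.33, bird ~0.17
```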
In this lesson, we use the apply method to invoke a function for every Series value. Pandas collects the results in a new Series. The advantage of apply is that we can utilize basic Python code to achieve whatever manipulation we want. If we don't know a specific Series method but can accomplish the same result with Python constructs, apply can be a useful tool.
The map method connects each Series value to a value from another data structure. It provides a mapping/connection/association/bridge to the other value. In this lesson, we practice using the method with arguments of a dictionary and a Series.
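The apply and map lessons can be sketched together; the words, the `classify` helper, and the lookup dictionary are hypothetical examples. Note that map leaves NaN wherever a value has no match in the mapping.

```python
import pandas as pd

words = pd.Series(["pandas", "python", "data"])

# apply: call a function (here the built-in len) on every value
lengths = words.apply(len)


def classify(word):
    # Any plain Python logic works inside the applied function
    return "long" if len(word) > 4 else "short"


sizes = words.apply(classify)

# map: connect each value to a value in another structure (a dict here)
kinds_lookup = {"pandas": "library", "python": "language"}
kinds = words.map(kinds_lookup)    # "data" has no key, so it becomes NaN
```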
Test your knowledge of the Series concepts introduced in this section in this multiple-choice quiz.
A DataFrame is a 2-dimensional table with an index. In this lesson, we introduce this new data structure and explore some of the methods and attributes it shares with the Series object. We also identify some unique attributes that exist only on one object but not the other.
In this lesson, we do a deeper dive into the sum method and how it operates differently between Series and DataFrame objects.
In this lesson, we introduce two syntax options to extract a column from a DataFrame: attribute access and square brackets. We also discuss the tradeoffs between the two approaches. Pandas returns a view when extracting a single Series from a DataFrame.
In this lesson, we learn how to extract multiple DataFrame columns by passing a list of column names inside the square brackets. Pandas returns a copy/new DataFrame when extracting multiple columns.
In this lesson, we add a new column to a DataFrame using square bracket notation. We show how to populate the new Series with a single value or a dynamic calculation from performing an operation on another Series' values.
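The column lessons above, sketched on a small made-up DataFrame (the names and salaries are illustrative, not from the course datasets):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cal"],
    "Team": ["Red", "Blue", "Red"],
    "Salary": [50000, 62000, 58000],
})

# Two ways to extract a single column as a Series
names_attr = df.Name         # attribute access: needs a valid identifier that
                             # doesn't clash with an existing DataFrame attribute
names_brackets = df["Name"]  # square brackets: works for any column name

# A list inside the brackets returns a DataFrame with just those columns
subset = df[["Name", "Salary"]]

# New columns: a single value is repeated for every row...
df["League"] = "Local"
# ...or derived from an operation on an existing column
df["Salary (thousands)"] = df["Salary"] / 1000
```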
In this lesson, we quickly review the value_counts method on a Series, which counts the number of occurrences of every unique value in a Series.
In this lesson, we practice using the dropna method to remove DataFrame rows containing missing/NaN values. We discuss how to target rows that hold only missing values as well as rows with a missing value in a target column.
In this lesson, we explore an alternative approach for dealing with missing values: using the fillna method to populate missing values with a static value. We practice the method on both a DataFrame and a Series.
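A sketch of both strategies for missing values on a made-up DataFrame. Passing a dictionary to fillna lets each column get its own replacement value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ann", None, "Cal", "Dee"],
    "College": ["State", None, None, "Tech"],
    "Salary": [50000.0, np.nan, 58000.0, np.nan],
})

# Default: drop any row that contains at least one missing value
complete_rows = df.dropna()
# how="all": drop only rows where every value is missing
non_empty_rows = df.dropna(how="all")
# subset: drop rows that are missing a value in specific columns
has_salary = df.dropna(subset=["Salary"])

# fillna replaces missing values with static values (per column via a dict)
filled = df.fillna({"College": "Unknown", "Salary": 0})
# It works the same way on a single Series
filled_salary = df["Salary"].fillna(0)
```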
In this lesson, we introduce the astype method for converting the data types in a Series. We practice converting our floating-point columns to store integers.
The category type is helpful when a Series has a small number of unique values. In this lesson, we convert two columns in our nba DataFrame to store category values.
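A sketch of both conversions on made-up columns loosely modeled on the nba example. Converting floats with astype to an integer type raises an error if the column still contains NaN, so missing values are usually dropped or filled first.

```python
import pandas as pd

df = pd.DataFrame({
    "Number": [0.0, 23.0, 7.0],
    "Position": ["PG", "SF", "PG"],
})

# astype converts a column's data type (here float -> integer)
df["Number"] = df["Number"].astype("int")

# category suits columns with a few unique values repeated many times
df["Position"] = df["Position"].astype("category")
```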
In this lesson, we explore the sort_values method on a DataFrame. The default sort order is ascending (smallest to greatest, alphabetical), but we can customize the order with the ascending parameter. We also discuss the na_position parameter for placing the NaN values at the beginning or end of the sorted values.
In this lesson, we sort a DataFrame by multiple columns by passing a list of column names to the by parameter. We also customize the sort order for each column by passing a list to the ascending parameter.
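A sketch of a multi-column sort on made-up data: the `ascending` list pairs up position by position with the `by` list.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Blue", "Red", "Blue", "Red"],
    "Salary": [40000, 55000, 70000, 30000],
})

# Sort by Team A->Z, then by Salary from highest to lowest within each team
ordered = df.sort_values(by=["Team", "Salary"], ascending=[True, False])
```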
The sort_index method sorts a DataFrame by the index labels. In this lesson, we explore the method and a few of its parameters.
In this lesson, we learn the rank method for ordering and ranking the values in a Series. We use it to rank our NBA players by their salaries.
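A minimal rank sketch with made-up salaries. With `ascending=False`, the largest value gets rank 1; rank returns floats by default, so `astype(int)` is used here to get whole-number ranks.

```python
import pandas as pd

salaries = pd.Series(
    [5_000_000, 12_000_000, 8_000_000],
    index=["Ann", "Ben", "Cal"],
)

# Rank 1 goes to the highest salary
ranks = salaries.rank(ascending=False).astype(int)
```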
Test your knowledge of this section's DataFrames concepts in this multiple choice quiz.
Welcome to the next section of the course! In this lesson, we import and introduce the new employees DataFrame. We also convert some columns to their optimal formats and introduce the to_datetime function at the top level of pandas.
To filter a DataFrame, we must first generate a Boolean Series, then pass it in square brackets after the DataFrame. In this lesson, we practice extraction using a variety of data types and operations (equality, less than, greater than, and more).
In this lesson, we introduce the & operator for combining two Boolean Series with AND logic. We use this technique to filter a subset of DataFrame rows that fit multiple conditions.
In this lesson, we introduce the | operator for combining two Boolean Series with OR logic. We use this technique to filter a subset of DataFrame rows that fit at least one of several conditions. We also discuss caveats when combining & and | in the extraction syntax.
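The filtering lessons above, sketched on a made-up employees-style DataFrame. One caveat is visible in the last line: in Python, & and | bind more tightly than comparison operators like == and <, so each inline condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Finance", "Legal", "Finance", "Sales"],
    "Salary": [90000, 120000, 60000, 75000],
})

# Each comparison produces a Boolean Series
is_finance = df["Team"] == "Finance"
high_salary = df["Salary"] > 80000

# & keeps rows meeting both conditions; | keeps rows meeting at least one
both = df[is_finance & high_salary]
either = df[is_finance | high_salary]

# Inline conditions must each be wrapped in parentheses
inline = df[(df["Team"] == "Sales") | (df["Salary"] < 70000)]
```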
The isin method checks whether each Series value is present in a predefined list. It returns a Boolean Series that is True wherever the row's value is found within the collection.
In this lesson, we discuss the isnull and notnull methods. They generate Boolean Series that validate whether a row's value is NaN or non-NaN. We apply these methods to a few examples.
In this lesson, we utilize the between method to check if each Series value exists within a range/boundary of values. We utilize the resulting Boolean Series to filter our DataFrame.
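The isin, isnull/notnull, and between methods all return Boolean Series, so each plugs straight into the square-bracket filter. A sketch on made-up data; between includes both endpoints by default.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Finance", "Legal", None, "Sales"],
    "Salary": [90000, 120000, 60000, 75000],
})

# isin: is each value in the given list?
in_teams = df[df["Team"].isin(["Legal", "Sales"])]

# isnull / notnull: is each value missing / present?
no_team = df[df["Team"].isnull()]
has_team = df[df["Team"].notnull()]

# between: is each value inside the range (endpoints included)?
mid_range = df[df["Salary"].between(60000, 90000)]
```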