We may earn an affiliate commission when you visit our partners.
Offered on edX

Python and Pandas for Data Engineering

Kennedy Behrman and Noah Gift

In this course, you'll gain the Python and Pandas skills essential for data engineering:

  • Set up version-controlled Python environments with necessary libraries
  • Write Python programs using key language features and data structures
  • Manipulate and analyze data using the powerful Pandas library
  • Explore alternative data structures like NumPy arrays and PySpark DataFrames
  • Use Vim, Visual Studio Code, and Git for productive development

Whether you're a beginner or have some programming experience, you'll learn to harness Python and Pandas to tackle data engineering challenges. Hands-on exercises reinforce your learning each step of the way.

What's inside

Learning objectives

  • Python environment setup and package management
  • Core Python syntax and data structures
  • Pandas DataFrames for data manipulation
  • Alternatives to Pandas for big data
  • Development with Vim, VS Code, and Git

Syllabus

Module 1: Getting Started with Python (14 hours)
- Overview of Python, Bash and SQL Essentials for Data Engineering (video, 7 minutes)
- Meet your Course Instructor: Kennedy Behrman (video, 0 minutes)
- Overview of Key Concepts (video, 5 minutes)
- Introduction to Setting Up Your Python Environment (video, 0 minutes)
- Installing Packages with pip in Python (video, 6 minutes)
- Saving Requirements File in Python (video, 3 minutes)
- Creating and Using a Python Virtual Environment (video, 5 minutes)
- Expression Statements in Python (video, 3 minutes)
- Assignment Statements in Python (video, 5 minutes)
- Import Statements in Python (video, 4 minutes)
- Other Simple Statements in Python (video, 5 minutes)
- Compound Statements in Python (video, 5 minutes)
- If Statements in Python (video, 6 minutes)
- While Loops in Python (video, 4 minutes)
- Functions in Python (video, 7 minutes)
- Key Terms (reading, 10 minutes)
- Meet your Supporting Instructors: Alfredo Deza and Noah Gift (reading, 10 minutes)
- Course Structure and Discussion Etiquette (reading, 10 minutes)
- Getting Started and Best Practices (reading, 10 minutes)
- Lesson Reflection (reading, 10 minutes)
- Evaluating to True or False (reading, 10 minutes)
- Python Statements (quiz, 30 minutes)
- Assignment Statements (quiz, 30 minutes)
- Import Statements (quiz, 30 minutes)
- If Statements (quiz, 30 minutes)
- While Loops (quiz, 30 minutes)
- Quiz-Setting Up Your Python Environment (assignment, 180 minutes)
- Meet and Greet (optional) (discussion prompt, 10 minutes)
- Install a Package with the pip Command (ungraded lab, 60 minutes)
- Export a Requirements File (ungraded lab, 60 minutes)
- Create a Virtual Environment (ungraded lab, 60 minutes)
- Practicing with Expression Statements (ungraded lab, 60 minutes)
- Decorator Functions (ungraded lab, 60 minutes)
- Setting up a Python Environment (ungraded lab, 60 minutes)

Module 2: Essential Python (11 hours)
- Introduction to Python Essentials (video, 0 minutes)
- Sequences in Python (video, 8 minutes)
- Lists and Tuples in Python (video, 5 minutes)
- Strings in Python (video, 10 minutes)
- Creating Range Objects in Python (video, 2 minutes)
- Creating Dictionaries in Python (video, 4 minutes)
- Accessing Dictionary Data in Python (video, 3 minutes)
- Dictionary Views in Python (video, 2 minutes)
- Sets and Set Operations in Python (video, 6 minutes)
- List Comprehensions in Python (video, 6 minutes)
- Generator Expressions in Python (video, 4 minutes)
- Generator Functions in Python (video, 7 minutes)
- Key Terms (reading, 10 minutes)
- Lesson Reflection (reading, 10 minutes)
- Lesson Reflection (reading, 10 minutes)
- Setting Up Visual Studio Code (video, 2 minutes)
- Debugging Visual Studio Code (video, 3 minutes)
- Essential Python Concepts (quiz, 30 minutes)
- Sequence Operations (quiz, 30 minutes)
- Lists and Tuples (quiz, 30 minutes)
- Range Objects (quiz, 30 minutes)
- Accessing Data in Dictionaries (quiz, 30 minutes)
- Sets and Set Operations (quiz, 30 minutes)
- List Comprehensions (quiz, 30 minutes)
- Generator Expressions (quiz, 30 minutes)
- Practicing with Strings in Python (ungraded lab, 60 minutes)
- Creating Dictionaries in Python (ungraded lab, 60 minutes)
- Dictionary Views in Python (ungraded lab, 60 minutes)
- Comprehensions and Generators in Python (ungraded lab, 60 minutes)
- Practicing Essential Python (ungraded lab, 60 minutes)

Module 3: Data in Python: Pandas and Alternatives (12 hours)
- Introduction to Data in Python: Pandas and Alternatives (video, 0 minutes)
- Creating Pandas DataFrames in Python (video, 4 minutes)
- Investigating Data in a Pandas DataFrame (video, 6 minutes)
- Selecting Data in a Pandas DataFrame (video, 6 minutes)
- Manipulating Pandas DataFrames (video, 4 minutes)
- Updating Pandas DataFrame Data (video, 5 minutes)
- Applying Functions in a Pandas DataFrame (video, 6 minutes)
- Creating NumPy Arrays in Python (video, 15 minutes)
- Spark and PySpark DataFrames in Python (video, 6 minutes)
- Creating Dask DataFrames in Python (video, 6 minutes)
- What is Version Control? (video, 3 minutes)
- Polars (reading, 10 minutes)
- Features of Visual Studio Code (quiz, 30 minutes)
- Pandas and Alternatives (quiz, 30 minutes)
- NumPy (quiz, 30 minutes)
- PySpark (quiz, 30 minutes)
- Dask (quiz, 30 minutes)
- Creating DataFrames (ungraded lab, 60 minutes)
- Looking at Data in DataFrames (ungraded lab, 60 minutes)
- Selecting Data in a Pandas DataFrame (ungraded lab, 60 minutes)
- Manipulating DataFrames (ungraded lab, 60 minutes)
- Updating Data in a DataFrame (ungraded lab, 60 minutes)
- Applying Functions in a Pandas DataFrame (ungraded lab, 60 minutes)
- Manipulate DataFrames with Polars to gain insights (ungraded lab, 60 minutes)
- Pandas and Alternatives (ungraded lab, 60 minutes)

Module 4: Python Development Environments (13 hours)
- Introduction to Python Development Environments (video, 0 minutes)
- Introduction to Vim Normal Mode (video, 6 minutes)
- Switching from Normal to Insert and Visual Modes in Vim (video, 4 minutes)
- Working with the Vim Command Line (video, 6 minutes)
- Vim Configuration (video, 3 minutes)
- Introduction to Visual Studio Code (video, 1 minute)
- Introduction to Git and Git Concepts (video, 7 minutes)
- Version Control with GitHub (video, 6 minutes)
- Summary of Python and Pandas for Data Engineering (video, 0 minutes)
- Version Control (quiz, 30 minutes)
- Key Terms (reading, 10 minutes)
- Next Steps (reading, 10 minutes)
- Cumulative Python and Pandas for Data Engineering Quiz (quiz, 45 minutes)
- Insert and Visual Modes (quiz, 30 minutes)
- Vim Command Line Mode (quiz, 30 minutes)
- Git Commands (quiz, 30 minutes)
- Hosted Git (quiz, 30 minutes)
- Basic Vim Commands (ungraded lab, 60 minutes)
- Explore Visual Studio Code (ungraded lab, 60 minutes)
- Visual Studio Code Debugger (ungraded lab, 60 minutes)
- Setup and Provision a Python Project (ungraded lab, 60 minutes)
- Pandas Final Challenge: Life Expectancy and Happiness (ungraded lab, 60 minutes)
- Final Jupyter Sandbox (ungraded lab, 60 minutes)
- Final VS Code Sandbox (ungraded lab, 60 minutes)
- Final Sandbox Linux Desktop (ungraded lab, 60 minutes)

Good to know

Know what's good, what to watch for, and possible dealbreakers:
  • Covers Python and Pandas, which are standard tools in data engineering
  • Lists few explicit prerequisites, so complete beginners may want to take an introductory programming course first
  • Builds a foundation in Python syntax that carries over to other areas of tech
  • Gives hands-on experience with Pandas, a tool widely used in real-world data engineering
  • Suits learners who want to move into data engineering

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Python and Pandas for Data Engineering with these activities:
Organize and Review Course Materials
Ensure a strong foundation for success in the course by organizing and reviewing all available materials.
  • Gather all course materials, including syllabus, readings, assignments, and quizzes
  • Create a system for organizing these materials, such as folders or digital notebooks
  • Regularly review the materials to identify areas for further study or clarification
Review Core Python Syntax and Data Structures
Strengthen the foundation by reviewing the basics of Python syntax and data structures before starting the course.
  • Go through online resources or textbooks to refresh your understanding of variables, data types, operators, and control flow
  • Practice writing simple Python programs to reinforce your knowledge
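As one example of the kind of small practice program this step has in mind, the sketch below exercises variables, integer and string values, arithmetic and comparison operators, `if`/`elif`/`else`, and a `while` loop. The function names `describe_number` and `sum_until` are hypothetical and chosen only for illustration.

```python
def describe_number(n: int) -> str:
    """Describe an integer's sign and parity using if/elif/else and a conditional expression."""
    if n < 0:
        sign = "negative"
    elif n == 0:
        sign = "zero"
    else:
        sign = "positive"
    parity = "even" if n % 2 == 0 else "odd"  # % is the remainder operator
    return f"{n} is {sign} and {parity}"


def sum_until(limit: int) -> int:
    """Add 1, 2, 3, ... with a while loop, stopping before the total would exceed limit."""
    total = 0
    i = 1
    while total + i <= limit:
        total += i
        i += 1
    return total


if __name__ == "__main__":
    for value in [-3, 0, 4]:
        print(describe_number(value))
    print(sum_until(10))  # 1 + 2 + 3 + 4 = 10
```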
Solve LeetCode Problems on Python Lists and Dictionaries
Enhance problem-solving skills and reinforce concepts of Python lists and dictionaries by practicing on LeetCode, a platform for coding challenges.
  • Choose a set of LeetCode problems that focus on Python lists and dictionaries
  • Solve the problems and review the solutions to understand different approaches and techniques
  • Identify patterns and common operations related to lists and dictionaries
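Two patterns come up constantly in list and dictionary problems: counting occurrences with a dictionary, and replacing a nested list scan with a dictionary lookup. The sketch below shows both. The helper names are hypothetical, and `two_sum` follows the well-known Two Sum problem statement (return the indices of two numbers that add up to a target). It is one common approach, not a reference solution.

```python
from __future__ import annotations

from collections import Counter


def count_items(items: list[str]) -> dict[str, int]:
    """Count how often each item appears, using dict.get with a default of 0."""
    counts: dict[str, int] = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
    """Return indices (i, j) with i < j and nums[i] + nums[j] == target, or None.

    One pass over the list: a dict maps each value seen so far to its index,
    so each complement lookup is O(1) on average instead of a nested scan.
    """
    seen: dict[int, int] = {}
    for j, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return seen[complement], j
        seen.setdefault(value, j)  # keep the earliest index for repeated values
    return None


if __name__ == "__main__":
    words = ["pandas", "numpy", "pandas", "spark"]
    print(count_items(words))                    # {'pandas': 2, 'numpy': 1, 'spark': 1}
    print(count_items(words) == Counter(words))  # True: Counter implements the same idea
    print(two_sum([2, 7, 11, 15], 9))            # (0, 1)
```

Once the plain-dict version is familiar, `collections.Counter` is the idiomatic shortcut for counting.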
Review 'Hands-On Data Analysis with Pandas' by Stefanie Molin
Build a strong understanding of the fundamentals of data analysis with Pandas, the essential library for data manipulation and analysis in Python.
  • Read the first three chapters of the book to gain an overview of Pandas and its capabilities
  • Complete the exercises at the end of each chapter to reinforce your understanding
  • Create a small data analysis project using Pandas to apply your skills
Follow a Tutorial on PySpark DataFrames
Gain familiarity with PySpark DataFrames, a powerful tool for handling large-scale data, by following an online tutorial.
  • Find a comprehensive tutorial on PySpark DataFrames
  • Follow the steps in the tutorial to create, manipulate, and analyze a DataFrame
  • Experiment with the DataFrame API to explore different operations and functions
Participate in a Study Group for Python Development Environments
Engage with peers, exchange knowledge, and solidify understanding of Python development environments by joining a study group.
  • Find or create a study group with other participants taking the course
  • Meet regularly to discuss course topics, share resources, and practice using Python development environments
  • Collaborate on projects or assignments to gain hands-on experience
Build a Data Preprocessing Pipeline
Develop hands-on experience with data preprocessing, a crucial step in data engineering to prepare data for analysis and modeling.
  • Identify a dataset to work with, such as a CSV file or a database table
  • Load the dataset into a Pandas DataFrame
  • Apply data cleaning techniques to handle missing values and outliers
  • Transform the data into the desired format for analysis or modeling
  • Save the preprocessed data to a file or database
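These five steps map onto a short Pandas script. The sketch below makes several assumptions, all hypothetical:

- A small, made-up weather dataset is embedded as CSV text in place of a real file or database table.
- Missing values are filled by median imputation.
- Outliers are handled by clipping to an assumed plausible temperature range.
- The column names, bounds, and output filename `clean_weather.csv` are invented for this example.

A real pipeline would pick its imputation and outlier rules from the data itself.

```python
import io

import pandas as pd

# Step 1: a hypothetical raw dataset, embedded as CSV text so the example is self-contained.
RAW_CSV = """city,temp_c,humidity
Austin,31.5,40
Boston,,65
Austin,29.0,
Denver,120.0,30
Boston,18.2,70
"""


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean and reshape the raw weather data (all rules here are illustrative assumptions)."""
    df = df.copy()  # leave the caller's DataFrame untouched

    # Step 3a: fill missing numeric values with each column's median.
    for col in ["temp_c", "humidity"]:
        df[col] = df[col].fillna(df[col].median())

    # Step 3b: treat temperatures outside an assumed plausible range as outliers and clip them.
    df["temp_c"] = df["temp_c"].clip(lower=-50, upper=60)

    # Step 4: derive a Fahrenheit column and aggregate to one row per city.
    df["temp_f"] = df["temp_c"] * 9 / 5 + 32
    return df.groupby("city", as_index=False)[["temp_c", "temp_f", "humidity"]].mean()


if __name__ == "__main__":
    # Step 2: load the data into a DataFrame (pd.read_csv also accepts a file path).
    raw = pd.read_csv(io.StringIO(RAW_CSV))
    clean = preprocess(raw)
    # Step 5: save the result; the filename is hypothetical.
    clean.to_csv("clean_weather.csv", index=False)
    print(clean)
```

Keeping the cleaning logic in a function separate from loading and saving makes it easy to test on a small in-memory DataFrame before pointing it at real data.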
Create a Tutorial on Using NumPy Arrays
Demonstrate a deep understanding of NumPy arrays, a fundamental data structure for numerical operations in Python, by creating a tutorial for others.
  • Explain the basics of NumPy arrays, including their creation, indexing, and slicing
  • Show how to perform common operations on arrays, such as addition, subtraction, and multiplication
  • Discuss the advantages and disadvantages of using NumPy arrays compared to other data structures
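A tutorial along these lines might start from a snippet like the one below. It covers array creation, indexing and slicing (including boolean masks), element-wise arithmetic, and broadcasting, then contrasts arrays with Python lists. The example values are arbitrary. The calls used (`np.array`, `reshape`, `np.zeros`, `np.arange`, and the `@` operator) are standard NumPy.

```python
import numpy as np

# Creating arrays
a = np.array([1, 2, 3, 4, 5, 6])
grid = a.reshape(2, 3)             # 2 rows, 3 columns: [[1, 2, 3], [4, 5, 6]]
zeros = np.zeros((2, 3))           # float array of zeros with shape (2, 3)
steps = np.arange(0, 10, 2)        # [0, 2, 4, 6, 8]

# Indexing and slicing
first = a[0]                       # 1
last_two = a[-2:]                  # [5, 6]
second_row = grid[1]               # [4, 5, 6]
middle_col = grid[:, 1]            # [2, 5]
evens = a[a % 2 == 0]              # boolean mask selects [2, 4, 6]

# Element-wise arithmetic and broadcasting
shifted = grid + zeros             # same values, upcast to float
scaled = grid * 10                 # [[10, 20, 30], [40, 50, 60]]
centered = grid - np.array([1, 1, 1])  # the 1-D array is broadcast across both rows
squared = grid * grid              # element-wise product, not matrix multiplication
gram = grid @ grid.T               # matrix product: [[14, 32], [32, 77]]

# Contrasts with Python lists
list_doubled = [1, 2] * 2          # list repetition: [1, 2, 1, 2]
array_doubled = np.array([1, 2]) * 2  # element-wise: [2, 4]

b = np.arange(5)
view = b[1:3]                      # slicing returns a view, not a copy
view[0] = 99                       # so b becomes [0, 99, 2, 3, 4]
mixed = np.array([1, 2.5])         # one dtype per array: the int is upcast to float64
```

The last few lines show two trade-offs worth discussing in such a tutorial. Slices are views that share memory with the original array, which is fast but can surprise readers used to list copies. An array also holds a single dtype, so mixed values are upcast. In exchange, arithmetic runs element-wise over the whole array without a Python loop.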

Similar courses

Here are courses similar to Python and Pandas for Data Engineering:
Cleaning and Working with Dataframes in Python
Index Objects with Pandas
Python Basics for Data Science
Web Applications and Command-Line Tools for Data...
Data Analysis in Python: Using Pandas DataFrames
Pandas for Data Science
Guided Project: Secure Analysis of a Credit Card Dataset

Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser