We may earn an affiliate commission when you visit our partners.
Chaz Henry

In the 2006 playoffs, Major League Baseball debuted a pitch-tracking camera system called PitchF/x. Now installed in every MLB stadium, the system has been continually extended and rebranded. From cameras to TrackMan radar, from StatCast to GameDay, MLB now tracks every pitch and every player's movement on each pitch. The data are made public on the MLB website, and SaberMetricians worldwide pore over every detail. The teams themselves average five or more statisticians dedicated to analyzing the data to aid in selecting and improving players.

I'm Chaz Henry, a software engineer, 12-year Little League coach, and founder of the PowerChalk dot com website. In this class, we're going to open a fresh Jupyter Notebook, grab the MLB game data from Clayton Kershaw's 2014 no-hitter, and wrangle that data in Python. It's an introduction to SaberMetrics, the empirical study of baseball statistics.

We'll use built-in Python libraries and graph the pitches with Matplotlib and pyplot. Along the way we'll talk about best practices for Jupyter Notebook, Python coding, XML parsing, and maybe a little baseball.

So, if you're a coder, a SaberMetrician, or just a baseball fan who wants to peek behind the curtain at what's driving MoneyBall and the next wave of player development, sign up for the course and let's start scrubbing the pitch data from one of the greatest pitching performances in MLB history.

What's inside

Learning objectives

  • How to find MLB game and pitch data in Gameday.
  • How to create and program a Jupyter Notebook in Python.
  • How to extract XML pitch data from the MLB website.
  • How to coerce XML tree data into a pandas DataFrame.
  • How to extract DataFrame slices into multiple views.
  • How to plot pitch data with Matplotlib and pyplot graphs.
  • How to add data columns to a pandas DataFrame.
  • How to plot pitch tendency as pie charts (by ball-strike count).
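
To give a feel for the XML-to-DataFrame objectives above, here is a minimal sketch. It is not the course's code: the sample XML, its tag and attribute names (`atbat`, `pitch`, `pitch_type`, `start_speed`, `px`, `pz`), the player ids, and the helper `pitches_to_dataframe` are all assumptions invented for illustration, and real Gameday files may be laid out differently.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample, loosely modeled on a per-inning pitch file.
# Tag names, attribute names, and ids are assumptions for illustration only.
SAMPLE_XML = """
<inning num="1">
  <top>
    <atbat num="1" pitcher="1001" batter="2001">
      <pitch id="3" type="S" pitch_type="FF" start_speed="93.1" px="-0.21" pz="2.64"/>
      <pitch id="4" type="B" pitch_type="CU" start_speed="74.0" px="0.95" pz="1.10"/>
      <pitch id="5" type="X" pitch_type="SL" start_speed="87.5" px="0.30" pz="2.05"/>
    </atbat>
    <atbat num="2" pitcher="1001" batter="2002">
      <pitch id="9" type="S" pitch_type="FF" start_speed="92.8" px="0.10" pz="3.01"/>
    </atbat>
  </top>
</inning>
"""


def pitches_to_dataframe(xml_text):
    """Flatten every <pitch> element into one DataFrame row."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            row = dict(pitch.attrib)            # XML attributes arrive as strings
            row["atbat_num"] = atbat.get("num")  # carry the parent at-bat number down
            rows.append(row)
    df = pd.DataFrame(rows)
    # Convert the columns we want to do arithmetic or plotting with.
    numeric_cols = ["start_speed", "px", "pz", "atbat_num"]
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
    return df


pitch_df = pitches_to_dataframe(SAMPLE_XML)

# Adding a derived column: flag four-seam fastballs (the "FF" code is assumed).
pitch_df["is_fastball"] = pitch_df["pitch_type"] == "FF"

print(pitch_df[["atbat_num", "pitch_type", "start_speed", "is_fastball"]])
```

Because XML attributes are always strings, the explicit numeric conversion is the step that makes later filtering and plotting behave as expected.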

Syllabus

More Pandas Dataframe
Plotting
Plotting Line/Scatter
Dataframe Slices
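
As a rough idea of what the slicing and scatter-plot topics listed above involve, the sketch below splits a small DataFrame into per-pitch-type views and plots pitch location. The rows are made up, and the column names (`pitch_type`, `px`, `pz`) and pitch codes are assumptions for this example rather than anything taken from the course.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up pitch rows for illustration; column names are assumptions.
pitches = pd.DataFrame({
    "pitch_type": ["FF", "CU", "SL", "FF", "SL", "CU"],
    "px": [-0.2, 0.9, 0.3, 0.1, -0.6, 0.4],   # horizontal location
    "pz": [2.6, 1.1, 2.0, 3.0, 1.8, 1.4],     # vertical location
})

# Boolean-mask slices: each view holds only the rows for one pitch type.
fastballs = pitches[pitches["pitch_type"] == "FF"]
curveballs = pitches[pitches["pitch_type"] == "CU"]
sliders = pitches[pitches["pitch_type"] == "SL"]

fig, ax = plt.subplots(figsize=(4, 5))
for label, view in [("FF", fastballs), ("CU", curveballs), ("SL", sliders)]:
    ax.scatter(view["px"], view["pz"], label=label)
ax.set_xlabel("px (horizontal)")
ax.set_ylabel("pz (vertical)")
ax.set_title("Pitch location by type (sample data)")
ax.legend()

# Render to an in-memory PNG; in a notebook the figure would display inline instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```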

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
Uses Pandas DataFrames, Matplotlib, and Pyplot, which are essential tools for data analysis and visualization in Python, making it highly relevant for aspiring data scientists
Explores SaberMetrics, the empirical study of baseball statistics, offering baseball fans a deeper understanding of player performance and game strategy through data analysis
Provides an introduction to Jupyter Notebooks and Python coding practices, which can help beginners develop a solid foundation in programming and data manipulation
Focuses on XML parsing and data coercion into Pandas DataFrames, which are valuable skills for handling real-world data from various sources and formats
Analyzes data from Clayton Kershaw's 2014 no-hitter, offering a practical and engaging example of how data analysis can be applied to understand specific baseball events
Requires learners to install and run Jupyter Notebook, which may require some familiarity with software installation and environment configuration, potentially posing a hurdle for some

Reviews summary

Analyzing MLB baseball data with Python

According to learners, this course provides a practical and engaging introduction to wrangling Major League Baseball data using Python. Students particularly appreciate the use of real-world PitchF/x data, specifically from Clayton Kershaw's no-hitter, which provides a concrete and interesting case study. The instructor's passion for both baseball and coding is frequently highlighted. However, some reviewers note that the course assumes some prior knowledge of Python and Pandas, and recent reviews mention that the method for accessing the data source (Gameday XML) can be unstable or outdated, requiring extra effort outside the course material. Despite potential challenges with data access, the fundamental data wrangling and plotting techniques taught remain valuable for those interested in baseball analytics.
Uses an iconic MLB game for analysis.
"Analyzing Kershaw's no-hitter was a fantastic and motivating example."
"Using a real, famous game made the abstract concepts tangible."
"The chosen dataset was perfect for illustrating the techniques."
"The focus on a specific, interesting game made it highly relevant."
Instructor's passion for topic is evident.
"The instructor's knowledge and enthusiasm for baseball and coding are infectious."
"Chaz does a great job blending baseball passion with technical instruction."
"His excitement for the data makes the learning process more enjoyable."
"You can tell the instructor loves this subject matter."
Focuses on real-world MLB data analysis.
"This course provided me with a solid foundation for working with real baseball data."
"I really enjoyed the hands-on approach using an actual MLB game's data."
"The practical exercises helped solidify the data wrangling concepts."
"Applying Python to analyze pitch data was exactly what I was looking for."
Assumes basic Python/Pandas familiarity.
"While described as an intro, it definitely helps to have some Python and Pandas experience."
"The pace is quick if you're totally new to Jupyter notebooks or Pandas dataframes."
"I struggled a bit without a stronger background in Python programming."
"Recommended for those who already know Python basics."
Accessing current data can be challenging.
"A major hurdle is the Gameday XML feed changing, which breaks the code taught."
"I spent significant time troubleshooting issues with the data source not being available as shown."
"The methods for data retrieval seem outdated and no longer consistently work."
"Need to find alternative ways to get the data because the course's method is unreliable now."
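
One general way to reduce dependence on an unstable feed, as the reviews above describe, is to save each XML file locally the first time it is retrieved and read from that copy afterwards. The sketch below illustrates the idea only: `load_cached_xml` and `fake_fetch` are hypothetical names, and the fetch step is a stand-in function rather than a real request to any MLB endpoint.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_cached_xml(cache_path, fetch):
    """Parse XML from cache_path, calling fetch() only if the file is missing.

    fetch is any zero-argument callable that returns the XML text; it is a
    placeholder for whatever download method currently works.
    """
    cache_path = Path(cache_path)
    if not cache_path.exists():
        cache_path.write_text(fetch(), encoding="utf-8")
    return ET.fromstring(cache_path.read_text(encoding="utf-8"))


calls = []


def fake_fetch():
    """Stand-in for a download; records each call so caching can be observed."""
    calls.append(1)
    return '<inning num="1"><pitch type="S"/></inning>'


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "inning_1.xml"
    first = load_cached_xml(path, fake_fetch)   # file missing, so fetch runs
    second = load_cached_xml(path, fake_fetch)  # served from the saved file

print(len(calls), first.tag, second.find("pitch").get("type"))
```

With the files saved once, the rest of a notebook can be rerun even if the original source later moves or changes format.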

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Wrangling Major League Baseball Pitchf/x Data with Python with these activities:
Review Baseball Statistics Fundamentals
Reinforce your understanding of fundamental baseball statistics to better interpret the Pitchf/x data.
  • Read articles or watch videos explaining key baseball statistics concepts.
  • Solve practice problems related to calculating and interpreting these statistics.
Brush Up on Python Fundamentals
Review Python basics, especially Pandas, Matplotlib, and XML parsing, to ensure a smooth learning experience.
  • Complete online tutorials or coding exercises on Python fundamentals.
  • Practice working with Pandas DataFrames and Matplotlib plots.
  • Familiarize yourself with XML parsing libraries in Python.
Read 'Moneyball' by Michael Lewis
Understand the context and impact of sabermetrics in baseball, as popularized by 'Moneyball'.
  • Read 'Moneyball' by Michael Lewis.
  • Reflect on how the concepts in the book relate to the course content.
Practice Pandas DataFrame Manipulation
Reinforce your ability to manipulate and analyze data using Pandas DataFrames.
  • Download sample baseball datasets from the internet.
  • Practice loading, cleaning, and transforming the data using Pandas.
  • Perform common data analysis tasks, such as filtering, grouping, and aggregating data.
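
For practice along these lines, filtering, grouping, and aggregating might look like the following sketch. The rows, column names (`pitch_type`, `start_speed`, `result`), and result codes are invented for this example.

```python
import pandas as pd

# Made-up pitch rows; column names and codes are assumptions for illustration.
pitches = pd.DataFrame({
    "pitch_type": ["FF", "FF", "CU", "SL", "SL", "FF"],
    "start_speed": [93.0, 94.0, 74.0, 87.0, 88.0, 92.0],
    "result": ["S", "B", "S", "X", "S", "S"],
})

# Filtering: keep only the rows recorded as strikes.
strikes = pitches[pitches["result"] == "S"]

# Grouping and aggregating: pitch count and average speed per pitch type.
summary = pitches.groupby("pitch_type")["start_speed"].agg(["count", "mean"])

print(strikes)
print(summary)
```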
Write a Blog Post on a Specific Pitch Type
Deepen your understanding of pitch types and their characteristics by researching and writing about one in detail.
  • Choose a specific pitch type (e.g., fastball, curveball, slider).
  • Research the characteristics, mechanics, and usage of the pitch.
  • Write a blog post summarizing your findings, including visualizations of pitch data.
Analyze Pitch Data for a Different MLB Player
Apply the skills learned in the course to analyze pitch data for a different MLB player and compare their pitching tendencies to Clayton Kershaw.
  • Select an MLB player and gather their pitch data from MLB Gameday.
  • Adapt the code from the course to process and analyze the player's data.
  • Compare the player's pitching tendencies to Clayton Kershaw's, highlighting similarities and differences.
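
One simple way to put two pitchers' tendencies side by side is to compare the share of each pitch type they throw. The sketch below uses `pandas.crosstab` on made-up rows; the pitcher labels "A" and "B" and the pitch codes are placeholders, not real data.

```python
import pandas as pd

# Made-up pitch log for two hypothetical pitchers, "A" and "B".
pitches = pd.DataFrame({
    "pitcher": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "pitch_type": ["FF", "FF", "SL", "CU", "FF", "CH", "CH", "SL"],
})

# Share of each pitch type per pitcher; each row sums to 1.
mix = pd.crosstab(pitches["pitcher"], pitches["pitch_type"], normalize="index")

# Positive values: A throws that pitch more often than B; negative: less often.
difference = (mix.loc["A"] - mix.loc["B"]).sort_values()

print(mix)
print(difference)
```

Comparing shares rather than raw counts keeps the comparison fair when the two pitchers threw different numbers of pitches.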
Read 'The Book: Playing the Percentages in Baseball'
Gain a deeper understanding of baseball strategy and decision-making through a sabermetric lens.
  • Read 'The Book: Playing the Percentages in Baseball'.
  • Consider how the concepts in the book can be applied to analyze pitch data and inform pitching strategy.

Career center

Learners who complete Wrangling Major League Baseball Pitchf/x Data with Python will develop knowledge and skills that may be useful to these careers:
Baseball Operations Analyst
A Baseball Operations Analyst supports the baseball operations department of a professional team by providing data-driven insights and analytical support. This role involves data collection, analysis, and reporting to aid in player evaluation, game strategy, and decision-making. This course gives an introduction to the kind of work that a Baseball Operations Analyst would be responsible for. Its focus on extracting, cleaning, and plotting MLB pitch data directly translates to tasks this role would perform. If you want to be a Baseball Operations Analyst, this course will help get you there.
Baseball Data Analyst
A Baseball Data Analyst scrutinizes baseball statistics to glean insights that boost team performance. This involves collecting, processing, and interpreting data related to player performance, game strategy, and opponent analysis. The course directly helps to perform the functions of a Baseball Data Analyst, as it covers extracting baseball game data from MLB, manipulating it with Python, and visualizing it using Matplotlib and Pyplot. If you wish to learn how to wrangle baseball data, you should take this course.
Sports Statistician
A Sports Statistician applies statistical methods to analyze sports data, providing insights to improve team strategies, player performance, and overall decision-making. This role often involves data collection, modeling, and the presentation of findings to coaches, players, and management. This course provides a practical introduction to wrangling and visualizing baseball statistics using Python, aiding in the development of skills essential for a Sports Statistician. The course's focus on extracting, cleaning, and plotting MLB pitch data directly translates to tasks this role would perform. By taking this course, you will learn how to analyze baseball data.
Sabermetrician
A Sabermetrician applies data-driven analysis to baseball, using statistics to evaluate players, predict outcomes, and optimize strategies. This role requires skills in data collection, statistical modeling, and effective communication of findings to baseball operations staff. Given that this course is an introduction to Sabermetrics, it provides a strong foundation for the role. The course includes practical exercises in extracting and manipulating MLB pitch data and graphing pitches. If you want to become a Sabermetrician, this is a great starting point.
Sports Analyst
Sports Analyst is a broad term for professionals who analyze sports data to provide insights for various purposes, such as improving team performance, informing betting strategies, or creating engaging content for fans. This course can help with the statistical analysis of baseball data using Python. The course offers practical experience in extracting, manipulating, and presenting MLB pitch data, which makes it valuable for a Sports Analyst aiming to specialize in baseball. If you want to be a Sports Analyst specializing in baseball, this is a great jumping-off point.
Baseball Scout
A Baseball Scout evaluates baseball players and assesses their potential for professional play. Traditionally, this role involved observation and subjective assessment, but it now also incorporates data-driven analysis. This course can help scouts interpret statistical data and make informed decisions. Learning how to extract, manipulate, and visualize pitch data empowers a Baseball Scout to identify hidden trends and patterns in player performance. Knowledge of Sabermetrics is increasingly valuable in modern scouting. By taking this course, you can add data-driven analysis skills to your baseball scouting abilities.
Data Visualization Specialist
A Data Visualization Specialist focuses on creating compelling and informative visualizations from complex datasets. This involves using tools and techniques to communicate data insights effectively to different audiences. This course includes practical exercises in creating various types of plots and charts using Matplotlib and Pyplot in Python. These skills are essential for a Data Visualization Specialist. By taking this course, you will learn concrete ways to visualize baseball data.
Data Scientist
A Data Scientist gathers and interprets data to solve complex business problems. They use various tools and techniques, including statistical analysis, machine learning, and data visualization. This course may be useful for you, as it provides hands-on experience in data manipulation and visualization using Python and popular libraries. Learning how to extract meaningful insights from MLB game data can be a valuable addition to your toolkit as a Data Scientist. The course's emphasis on wrangling real-world data with Python helps build a strong foundation in the field. For aspiring Data Scientists, this course may be a good first step.
Performance Analyst
A Performance Analyst evaluates the performance of athletes and teams to provide insights for improvement. This role often involves collecting and analyzing data, identifying trends, and communicating findings to coaches and players. The course provides hands-on experience in data manipulation and visualization using Python. The focus on extracting and analyzing MLB pitch data directly translates to tasks this role would perform. If you want to be a Performance Analyst specializing in baseball data, you should take this course.
Data Engineer
A Data Engineer designs, builds, and manages data pipelines and infrastructure that transform and transfer data to data scientists and other users. This role requires skills in data extraction, transformation, and loading. This course may be useful for aspiring Data Engineers, as it covers data extraction from the MLB website, transformation into Pandas DataFrames, and basic data manipulation using Python. This hands-on experience is relevant to the data wrangling aspects of a Data Engineer's responsibilities. This course can help those who want to be Data Engineers.
Machine Learning Engineer
A Machine Learning Engineer develops and deploys machine learning models. This role requires a strong understanding of data preprocessing, feature engineering, and model evaluation. This course may be useful as it provides hands-on experience in data wrangling, visualization, and feature extraction using Python, which are essential skills for a Machine Learning Engineer. The course covers turning raw XML data into usable Pandas DataFrames, a common task in machine learning pipelines. If you want to become a Machine Learning Engineer, this course may be helpful.
Software Developer
A Software Developer designs, develops, and tests software applications. While this course is specific to baseball data, it provides valuable experience in data manipulation, visualization, and working with APIs using Python, which are transferable skills for any Software Developer. The course also covers installing and running Jupyter Notebooks. Its focus on practical coding exercises makes it relevant for those looking to enhance their programming skills. If you are a Software Developer wanting to learn more about baseball data, this course may be useful.
Sports Blogger
A Sports Blogger creates written content about sports for online platforms. While strong writing is essential, incorporating data-driven insights can enhance credibility and engagement. This course can enhance your analytical skills. Learning how to extract, manipulate, and visualize baseball data using Python empowers a Sports Blogger to create more informed and compelling content. The course helps build a foundation for data-driven storytelling in the sports domain. By taking this course, you can elevate your baseball blog.
Quantitative Analyst
A Quantitative Analyst, often working in finance or related fields, uses mathematical and statistical methods to analyze data and develop models for decision-making. This course can help with this career, as it teaches data manipulation using Python. Learning how to extract, clean, and analyze baseball data provides experience in data analysis techniques that are applicable to quantitative analysis roles in various domains. For aspiring Quantitative Analysts, this course may be a good first step.
Baseball General Manager
The General Manager (GM) is the top baseball executive who makes all the player personnel decisions. Although this position may require a Master's Degree in Business Administration with a concentration in sports management, in today's world, data acumen is extremely valuable. This course touches on data literacy and how to find, parse, massage, and analyze baseball data. Learning how to extract, manipulate, and visualize baseball data using Python empowers the GM to make more informed decisions. By taking this course, you can elevate your baseball knowledge.

Reading list

We've selected two books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Wrangling Major League Baseball Pitchf/x Data with Python.
'The Book: Playing the Percentages in Baseball' delves into strategic decision-making in baseball from a sabermetric perspective. It uses data analysis to evaluate conventional wisdom and identify optimal strategies. It provides a deeper understanding of how data can be used to improve team performance. This book is commonly used as a textbook at academic institutions or by industry professionals.
'Moneyball' by Michael Lewis provides context for the use of data in baseball decision-making. It illustrates how statistical analysis can be used to identify undervalued players and build a winning team. While not a technical manual, it offers valuable insights into the philosophy behind sabermetrics and its impact on the sport. This book is more valuable as additional reading than it is as a current reference.

© 2016 - 2025 OpenCourser