We may earn an affiliate commission when you visit our partners.
Jones Granatyr, Gabriel Alves, and AI Expert Academy

Within the area of Computer Vision is the sub-area of Optical Character Recognition (OCR), which aims to transform images into text. OCR can be described as converting images containing typed, handwritten, or printed text into characters that a machine can understand. It is possible to convert scanned or photographed documents into text that can be edited in any tool, such as Microsoft Word. A common application is automatic form reading, in which you send a photo of your credit card or your driver's license and the system reads all your data without the need to type it manually. A self-driving car can use OCR to read traffic signs, and a parking lot can grant access by reading the license plates of cars.


To introduce you to this area, this course teaches you in practice how to use OCR libraries to recognize text in images and videos, with all the code implemented step by step in the Python programming language. We are going to use Google Colab, so you do not have to worry about installing libraries on your machine, as everything will be developed online using Google's GPUs. You will also learn how to build your own OCR from scratch using Deep Learning and Convolutional Neural Networks. Below you can check the main topics of the course:

  • Recognition of texts in images and videos using Tesseract, EasyOCR and EAST

  • Search for specific terms in images using regular expressions

  • Techniques for improving image quality, such as: thresholding, color inversion, grayscale, resizing, noise removal, morphological operations and perspective transformation

  • EAST architecture and EasyOCR library for better performance in natural scenes

  • Training an OCR from scratch using TensorFlow and modern Deep Learning techniques, such as Convolutional Neural Networks

  • Application of natural language processing techniques in the texts extracted by OCR (word cloud and named entity recognition)

  • License plate reading

These are just some of the main topics. By the end of the course, you will know everything you need to create your own text recognition projects using OCR.
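
As an illustration of the regular-expression topic listed above, the sketch below searches text for dates and e-mail addresses using Python's standard `re` module. The `ocr_text` string, the two patterns, and the `find_terms` helper are assumptions made for this example and are not taken from the course; in practice the text would come from an OCR library such as Tesseract or EasyOCR.

```python
import re

# Hypothetical text, standing in for the string an OCR library would return.
ocr_text = """Invoice date: 12/03/2021
Contact: billing@example.com
Due date: 2021-04-15"""

# Dates written with slashes: two digits, two digits, four digits.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

# A deliberately simple e-mail pattern; real addresses can be more varied.
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def find_terms(text, pattern):
    """Return every non-overlapping match of `pattern` in `text`, in order."""
    return pattern.findall(text)


dates = find_terms(ocr_text, DATE_PATTERN)
emails = find_terms(ocr_text, EMAIL_PATTERN)
print(dates)
print(emails)
```

Note that the date pattern only matches the slash-separated form, so the ISO-style date on the last line is skipped. OCR output often contains misread characters, so patterns for real documents may need to be more tolerant than these.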


What's inside

Learning objectives

  • Use the Tesseract, EAST, and EasyOCR tools for text recognition in images and videos
  • Understand the differences between OCR in controlled and natural environments
  • Apply image pre-processing techniques to improve image quality, such as thresholding, inversion, resizing, morphological operations, and noise reduction
  • Use the EAST architecture and the EasyOCR library for better performance in natural scenes
  • Train an OCR from scratch using deep learning and convolutional neural networks
  • Apply natural language processing techniques to the texts extracted by OCR (word cloud and named entity recognition)
  • Read license plates

Syllabus

Introduction
Course content
Introduction to OCR
Course materials

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
Covers Tesseract, EasyOCR, and EAST, which are valuable tools for practitioners looking to implement OCR solutions in various applications
Includes training an OCR from scratch using TensorFlow and modern deep learning techniques, offering hands-on experience in building custom OCR solutions
Explores techniques for improving image quality, such as thresholding, color inversion, and noise removal, which are essential for effective OCR
Features the application of natural language processing techniques in the texts extracted by OCR, such as word cloud and named entity recognition, enhancing the utility of OCR output
Uses Google Colab, which eliminates the need for local library installations and leverages Google's GPUs for efficient development
Includes a project on license plate reading, which is a practical application of OCR with real-world relevance


Reviews summary

Practical OCR with Python and deep learning

According to learners, this course offers a largely positive introduction to Optical Character Recognition in Python, covering both practical library usage like Tesseract and EasyOCR and the theory behind building an OCR system from scratch using Deep Learning. Students particularly appreciated the hands-on approach and the inclusion of image pre-processing techniques. While many found the content clear and the projects useful, some reviewers noted that parts, particularly the Deep Learning sections, may require a foundational understanding of Python and Machine Learning to follow comfortably.
Convenient online coding environment.
"Using Google Colab made environment setup incredibly easy and hassle-free."
"Appreciated the use of Colab, saved a lot of time on installations."
"The environment setup using Colab was smooth and straightforward."
"Being able to run code directly in the browser with Colab was a big plus."
Essential techniques to improve image quality.
"The coverage of image pre-processing techniques like thresholding and noise removal was excellent and necessary."
"Learned how to use crucial image enhancement methods before applying OCR."
"Techniques for improving image quality were well explained and demonstrated effectively."
"Understanding image pre-processing is key, and this course covered it thoroughly."
Understanding the deep learning backend.
"Building an OCR from scratch using CNNs was the most valuable part for me, providing deep insight."
"Loved the section where we trained a custom OCR model using TensorFlow and deep learning techniques."
"Going through the process of building from scratch helped solidify my understanding significantly."
"The deep learning component was challenging but incredibly rewarding to see come together."
Apply learned skills to practical tasks.
"The projects, especially the license plate reading and scanner, were great for applying what I learned."
"Applying the concepts in the practical projects like searching for specific terms was very beneficial."
"Practical assignments helped reinforce the concepts and build confidence."
"Project 3 (License Plate Reading) was a fantastic way to put everything together."
Learn to use key OCR libraries in Python.
"Gave me practical tools using Tesseract and EasyOCR that I can immediately apply to projects."
"The sections covering Tesseract, EasyOCR, and EAST were very clear and immediately useful."
"I really appreciated the step-by-step implementation of OCR using well-known libraries."
"The part about using EasyOCR for natural scenes was particularly helpful for real-world applications."
Certain sections assume ML/Python background.
"As a beginner, I found the deep learning section quite challenging and felt it assumed prior knowledge."
"While the practical parts were accessible, the 'from scratch' module requires some ML background."
"If you are completely new to Python or neural networks, you might need supplementary resources for some parts."
"Pace is good, but if you're not comfortable with Python, prepare for a steeper learning curve in places."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Optical Character Recognition (OCR) in Python with these activities:
Review Image Processing Fundamentals
Reinforce your understanding of image processing techniques. This will help you better grasp the pre-processing steps used in OCR.
Browse courses on Image Processing
  • Review basic image operations like blurring, sharpening, and contrast adjustment.
  • Practice applying these operations using a library like OpenCV.
Read 'Practical Python and OpenCV'
Learn about image processing techniques using OpenCV. This will help you better understand the image pre-processing steps used in OCR.
  • Obtain a copy of 'Practical Python and OpenCV'.
  • Read the chapters related to image pre-processing and feature extraction.
  • Experiment with the code examples provided in the book.
Experiment with Tesseract PSM modes
Master Tesseract's Page Segmentation Modes (PSM). This will allow you to optimize OCR results for different document layouts.
  • Find a variety of images with different text layouts (single column, multiple columns, tables, etc.).
  • Run Tesseract on each image using different PSM modes.
  • Compare the results and note which PSM mode works best for each layout.
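
One way to organize this experiment is sketched below. It assumes the `pytesseract` wrapper and the Tesseract binary are installed; `PSM_MODES`, `psm_config`, and `compare_psm_modes` are hypothetical names chosen for this example, and only a handful of Tesseract's page segmentation modes are listed.

```python
# A few of Tesseract's page segmentation modes, keyed by their --psm number.
PSM_MODES = {
    3: "fully automatic page segmentation (Tesseract's default)",
    4: "single column of text of variable sizes",
    6: "single uniform block of text",
    7: "single text line",
    11: "sparse text, in no particular order",
}


def psm_config(mode):
    """Build the Tesseract config string that selects one of the modes above."""
    if mode not in PSM_MODES:
        raise ValueError(f"PSM mode {mode} is not in this example's table")
    return f"--psm {mode}"


def compare_psm_modes(image, modes=tuple(PSM_MODES)):
    """Run Tesseract once per mode and return {mode: recognized text}.

    `image` can be a file path, a PIL image, or a NumPy array.
    pytesseract is imported lazily so the helpers above work without it.
    """
    import pytesseract

    return {
        mode: pytesseract.image_to_string(image, config=psm_config(mode))
        for mode in modes
    }


for mode, description in PSM_MODES.items():
    print(psm_config(mode), "->", description)
```

Calling `compare_psm_modes` on each test image gives one transcription per mode, which can then be compared side by side for that layout.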
Document Image Pre-processing Pipeline
Solidify your understanding of image pre-processing by creating a well-documented pipeline. This will help you apply these techniques effectively in your own OCR projects.
  • Choose a set of images with varying quality and characteristics.
  • Implement a Python script that applies a series of pre-processing steps (grayscale, thresholding, noise removal, etc.).
  • Document each step in the pipeline, explaining the purpose and parameters used.
  • Evaluate the impact of each step on the final OCR result.
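
The steps above can be sketched without any dependencies, as below. This is a toy version that works on nested Python lists so the arithmetic is visible; a real pipeline would more likely use a library such as OpenCV on arrays and would add steps like noise removal. All function names, the `PIPELINE` structure, and the 2x2 sample image are assumptions made for this example.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to 0-255 intensities (BT.601 luma weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def threshold(gray_image, cutoff=127):
    """Global binary thresholding: pixels above `cutoff` become 255, the rest 0."""
    return [[255 if p > cutoff else 0 for p in row] for row in gray_image]


def invert(gray_image):
    """Swap dark and light pixels."""
    return [[255 - p for p in row] for row in gray_image]


# Each entry documents one step: (name, function, purpose).
PIPELINE = [
    ("grayscale", to_grayscale, "collapse the colour channels into one intensity"),
    ("threshold", threshold, "separate text from background with cutoff 127"),
    ("invert", invert, "turn light-on-dark text into dark-on-light text"),
]


def run_pipeline(image, steps=PIPELINE):
    """Apply the steps in order and return the final image plus a step log."""
    log = []
    for name, func, purpose in steps:
        image = func(image)
        log.append(f"{name}: {purpose}")
    return image, log


# Hypothetical 2x2 image with light "text" pixels on a dark background.
sample = [
    [(10, 10, 10), (250, 250, 250)],
    [(255, 255, 255), (200, 30, 30)],
]
result, log = run_pipeline(sample)
print(result)
print("\n".join(log))
```

Keeping the purpose next to each function means the documentation and the code stay in one place, and running OCR after each prefix of `PIPELINE` is one way to measure the impact of an individual step.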
Build a License Plate Reader
Apply your OCR skills to a real-world problem. This project will challenge you to integrate various techniques learned in the course.
  • Gather a dataset of license plate images.
  • Implement a pipeline that detects license plates in images.
  • Use OCR to extract the text from the detected license plates.
  • Evaluate the accuracy of your license plate reader.
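
For the evaluation step, two simple measures are the share of plates read exactly right and the character error rate based on edit distance. The sketch below is one possible way to compute them; the metric names, the `evaluate` helper, and the sample plates are assumptions made for this example, not results from the course.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def evaluate(predictions, ground_truth):
    """Return exact-match accuracy and character error rate for paired lists."""
    if len(predictions) != len(ground_truth) or not ground_truth:
        raise ValueError("need two non-empty lists of equal length")
    exact = sum(p == t for p, t in zip(predictions, ground_truth))
    edits = sum(edit_distance(p, t) for p, t in zip(predictions, ground_truth))
    chars = sum(len(t) for t in ground_truth)
    return {
        "plate_accuracy": exact / len(ground_truth),
        "char_error_rate": edits / chars,
    }


# Hypothetical labels and OCR outputs, invented for illustration.
truth = ["ABC1234", "XYZ9876", "LMN4567", "QRS2468"]
preds = ["ABC1234", "XYZ9B76", "LMN4567", "QRS246"]

metrics = evaluate(preds, truth)
print(metrics)
```

Plate accuracy is strict, since a single wrong character counts as a miss, while the character error rate shows how close the near misses are. Reporting both gives a fuller picture of the reader's performance.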
Read 'Deep Learning with Python'
Learn the fundamentals of deep learning. This will help you understand the concepts behind training a custom OCR model.
  • Obtain a copy of 'Deep Learning with Python'.
  • Read the chapters related to convolutional neural networks and image classification.
  • Experiment with the code examples provided in the book.
Contribute to Tesseract or EasyOCR
Deepen your understanding of OCR by contributing to open-source projects. This will expose you to real-world challenges and best practices.
  • Explore the Tesseract or EasyOCR GitHub repositories.
  • Identify a bug or feature request that you can contribute to.
  • Submit a pull request with your changes.

Career center

Learners who complete Optical Character Recognition (OCR) in Python will develop knowledge and skills that may be useful to these careers:
Computer Vision Engineer
A Computer Vision Engineer develops algorithms that enable computers to "see" and interpret images, and this course on Optical Character Recognition is a great way to start. This role involves designing, developing, and testing computer vision systems for various applications. With the knowledge gained from this course, you will be able to specialize in text recognition within images and videos, a key component in many computer vision projects. The course's focus on using OCR libraries, building OCR from scratch using Deep Learning and Convolutional Neural Networks, and applying techniques for improving image quality helps build a foundation in this area. The study of EasyOCR and EAST, for example, is very relevant.
Machine Learning Engineer
A Machine Learning Engineer creates and implements machine learning algorithms and systems. This course provides a foundation for this role, especially in areas involving image and text analysis. As a machine learning professional, you might work on projects such as automated document processing or image-based search. The course's coverage of training an OCR from scratch using Deep Learning and Convolutional Neural Networks is particularly valuable, along with the techniques for improving image quality. The material on natural language processing also may be useful in integrating text recognition with higher-level analysis.
Document Management Specialist
A Document Management Specialist is responsible for organizing, storing, and retrieving documents. This course on Optical Character Recognition provides the skills needed to enhance document management systems with OCR capabilities. In this role, you can implement systems that automatically convert scanned documents into editable text, making it easier to search, index, and manage documents electronically. The course's focus on using OCR libraries like Tesseract and EasyOCR, along with techniques for improving image quality, is relevant to document management.
Content Analyst
A Content Analyst analyzes and categorizes digital content. This course provides helpful skills in extracting and processing text from images, which can significantly enhance your ability to analyze image-based content. In this role, you might extract text from images and use it to identify trends, patterns, and insights. The training in image pre-processing, OCR libraries, and natural language processing will enable you to work with a wider range of content types. The sections on word clouds and named entity recognition are particularly useful for content analysis.
Software Developer
This course helps you become a Software Developer with specialized skills in OCR technology. Software developers design, code, and test software applications. With the skills learned in this course, you can develop applications that involve converting images into editable text, a capability valuable in fields like document management, data entry automation, and accessibility. The experience in using OCR libraries like Tesseract and EasyOCR, along with the ability to build your own OCR from scratch, makes you ready to develop high-quality, effective software solutions. Learning how to improve image quality for OCR is also beneficial.
Artificial Intelligence Specialist
An Artificial Intelligence Specialist is involved in creating and implementing AI solutions. This course may be useful as it provides specialized knowledge in the area of Optical Character Recognition, a key component in many AI applications. In this role, you might work on developing systems that can automatically extract information from images and videos, such as self-driving car technologies or automated form readers. The course's focus on Deep Learning and Convolutional Neural Networks will provide you with a solid foundation in the AI techniques needed for OCR applications.
Archivist
Archivists are responsible for appraising, collecting, organizing, preserving, and providing access to historically significant records. This course may be useful in the context of digitizing and preserving archival materials. Archivists can use Optical Character Recognition to convert scanned documents into searchable and editable text, improving accessibility and preservation. The techniques learned in this course on image pre-processing and using OCR libraries like Tesseract and EasyOCR will assist in digitizing documents. These methods improve their long-term accessibility.
Data Scientist
The Data Scientist is responsible for analyzing data to extract meaningful insights and solve complex problems. This course may be useful in the field of data science, as it provides solid skills in extracting textual data from images, which can then be used for various analytical purposes. The training in Optical Character Recognition, image pre-processing, and natural language processing will enable you to incorporate image-based data into your analyses. For example, you can extract data from scanned documents or images and use it for sentiment analysis or topic modeling. The course modules on word clouds and named entity recognition are of particular value.
Automation Engineer
An Automation Engineer designs and implements automated systems to improve efficiency and productivity. This course may be useful as it provides the skills necessary to automate tasks that involve processing images and text. You can apply this knowledge to develop systems that automatically extract data from scanned documents, automate data entry processes, or create self-service kiosks that can read and process customer information. The course's coverage of OCR libraries, image pre-processing techniques, and natural language processing provides valuable tools for automation projects.
Data Engineer
A Data Engineer is responsible for building and maintaining the infrastructure for data storage and processing. This course on Optical Character Recognition helps you develop the skills needed to handle image-based data. You can apply this knowledge to build pipelines that automatically extract text from images and videos, store the extracted data, and make it available for analysis. The course content on improving image quality and using OCR libraries like Tesseract and EasyOCR helps you create efficient and reliable data processing systems. The license plate reading section is also of interest.
Data Analyst
Data Analysts examine data to identify trends, develop charts, and create reports. The skills taught in this course may be useful in broadening the scope of data analysis to include image-based data. You can use the skills learned in this course to extract data from images, preprocess it, and then analyze it using traditional data analysis tools. The course's coverage of OCR libraries, image pre-processing techniques, and natural language processing helps you derive insights from image-based sources. The sections on searching for specific terms and named entity recognition are of particular interest.
Research Scientist
A Research Scientist investigates and develops new technologies and solutions, and this course strengthens research capabilities in image and text processing. As a researcher, you might explore innovative ways to improve OCR accuracy, develop new algorithms for text recognition, or integrate OCR with other AI technologies. The course's training on building OCR from scratch using Deep Learning helps build a strong theoretical and practical foundation. The study of techniques for image pre-processing and the use of neural networks can be valuable in conducting cutting-edge computer vision research. An advanced degree such as a PhD is often required for this role.
Quality Assurance Engineer
A Quality Assurance Engineer is responsible for ensuring the quality and reliability of software products. This course on Optical Character Recognition may be useful in testing and validating OCR-based applications. You can use the skills learned in this course to design and execute test cases, identify bugs, and verify that the software meets the required standards. The course's coverage of OCR libraries, image pre-processing techniques, and training a custom OCR helps you evaluate the accuracy and performance of OCR systems. The sections on testing and evaluating neural networks are also of benefit.
Business Intelligence Analyst
A Business Intelligence Analyst analyzes data to provide insights, trends, and predictions to help organizations make better decisions. This course may provide useful skills in handling image-based data, which can then be integrated into business intelligence reports and dashboards. You can apply this knowledge to extract data from scanned documents, images, and videos and use it to generate valuable business insights. The course's coverage of OCR libraries, image pre-processing techniques, and natural language processing will aid in data extraction and analysis.
Robotics Engineer
Robotics Engineers design, develop, and test robots for various applications. This course may be useful in roles that require integrating computer vision, such as OCR, into robotic systems. You will understand how to enable robots to “read” and interpret text in their environment, such as reading labels, signs, or instructions. The course's focus on using OCR libraries like Tesseract and EasyOCR, along with training a custom OCR using Deep Learning, demonstrates the knowledge needed to implement OCR in robotic applications. The license plate example is relevant.

Reading list

We've selected two books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Optical Character Recognition (OCR) in Python.
'Practical Python and OpenCV' provides a comprehensive introduction to image processing and computer vision using Python and OpenCV. It covers many of the image pre-processing techniques used in OCR, such as thresholding, noise removal, and morphological operations. It is a valuable resource for understanding the practical aspects of image manipulation and will greatly enhance your ability to improve OCR accuracy. This book is commonly used as a textbook at academic institutions.
'Deep Learning with Python' provides a comprehensive introduction to deep learning using Python and Keras. While not specifically focused on OCR, it provides the necessary background to understand and implement custom OCR models using convolutional neural networks. It is more valuable as additional reading to expand your knowledge of deep learning. This book is commonly used as a textbook at academic institutions.


© 2016 - 2025 OpenCourser