Machine Learning

Getting Started with Optical Character Recognition

Optical character recognition is a technology where each character on a page is scanned individually so that your text is uploaded as text documents and not images.  EasyOCR is a python package that makes it easy to perform optical character recognition. We'll use it to extract text from images. We will be able to visualize these results using OpenCV.

Prerequisites

To follow along with this tutorial, you need to:

Be familiar with Machine Learning modeling.
Use either Jupyter Notebook or Google Colab.

We will use Google Colab for this tutorial.

Installing and importing dependencies
Reading our image
Using EasyOCR to extract text from our image
Visualizing results using the OpenCV library
Handling multiple lines
Wrapping up
Further reading

Installing and importing dependencies

The first dependency that we will need to install is PyTorch as EasyOCR runs on top of the PyTorch library. To install PyTorch, we need to head on to their main website. Select your preference and an installation code will be generated. For our case, we've selected the Stable (1.10.1) PyTorch build, Linux OS, Pip package, Python language, and CUDA 10.2 compute platform. The following code is generated after selecting those preferences.

!pip3 install torch torchvision torchaudio

EasyOCR is the second dependency that we will need to install.

pip install easyocr

We now have PyTorch and EasyOCR installed. The next thing that we need to do is to import our dependencies into our notebook.

import cv2
import easyocr
import numpy as np
from matplotlib import pyplot as plt

We've imported four things:

EasyOCR is the main package that we will use to perform optical character recognition.
OpenCV as cv2. It will help us import our image and visualize it.
Matplotlib also helps in visualization.
Numpy to help perform mathematical calculations.

We now need to read in our images. We've downloaded two images from Unsplash. These images are:

Feel free to use any image you wish.

Reading our image

from google.colab.patches import cv2_imshow

image = cv2.imread("image-one.jpg")
cv2_imshow(image)

Now that our image is loaded onto our notebook, let's use EasyOCR to go ahead and perform optical character recognition.

Using EasyOCR to extract text from our image

First, we need to pass in the easyocr reader and pass in the language that we want to use. In our case, that'll be English.

Secondly, using the ocr_reader, we pass in the readtext command and pass in our image. We save these results in a variable called results.

ocr_reader = easyocr.Reader(['en'])
results = ocr_reader.readtext(image)
results

Results:

([[121.756503204768, 455.2312153828863],
   [389.1020796120134, 432.6470713866934],
   [389.24349679523203, 524.7687846171136],
   [122.8979203879866, 547.3529286133066]],
  'GOOD',
  0.5394189953804016),
 ([[126.45506700720357, 542.3292546273735],
   [389.0906482289428, 511.83345289506786],
   [393.5449329927964, 599.6707453726265],
   [130.90935177105715, 630.1665471049322]],
  'NEWS',
  0.993106484413147),
 ([[190.6717988226486, 618.007698233973],
   [392.5179510077811, 588.6436857858664],
   [398.3282011773514, 642.992301766027],
   [196.48204899221892, 672.3563142141336]],
  'COMING',
  0.9999751310992928)]

After applying EasyOCR on the image, we can see that it has been able to extract the text from the image with a good confidence value. The different values indicate the coordinates where our text is in the image.

Visualizing results using the OpenCV library

Let's begin by defining a couple of key variables to determine where our different coordinates are. We'll use the OpenCV library for this task.

Let's set our coordinate variable.

top_left = tuple(results[0][0][0])
bottom_right = tuple(results[0][0][2])
text = results[0][1]
font = cv2.FONT_HERSHEY_PLAIN

We began by defining a variable for our top_left coordinate. We've converted it into a tuple because when we pass it to OpenCV, it's expecting a tuple. We've done a similar thing with the bottom_right variable. We grab that text and put it into a variable known as text. We've also gone ahead and defined the OpenCV font that we're going to use. To learn about more OpenCV text fonts, please refer to this article.

Let's now go ahead and visualize it.

img = cv2.imread("image-one.jpg")
img = cv2.rectangle(img,top_left,bottom_right,(0,255,0),3)
img = cv2.putText(img,text,top_left, font, 0.5,(255,255,255),2,cv2.LINE_AA)
plt.imshow(img)
plt.show()

Box

Though small, we can see that a green bounding box has been drawn on the top right text. That's optical character recognition in a nutshell. Currently, it's only able to handle a single line of text. What happens if we have an image with multiple lines of text?

Let's try and make EasyOCR handle multiple lines of the extracted text.

Handling multiple lines

Handling it is all the same as in the previous reader code, what changes is how we go ahead and visualize it. We need to loop through to visualize the other texts.

As a side note, it will be taking a little longer to process as we are now working with multiple texts.

image = cv2.imread("image-one.jpg")
spacer = 100
for detection in results: 
    top_left_detection = tuple([int(val) for val in detection[0][0]])
    bottom_right_detection = tuple([int(val) for val in detection[0][2]])
    text = detection[1]
    image = cv2.rectangle(img,top_left_detection,bottom_right_detection,(0,255,0),3)
    image = cv2.putText(image,text,(20,spacer), font, 0.5,(0,255,0),2,cv2.LINE_AA)
    spacer+=15
    
plt.imshow(img)
plt.show()

Result:

Multiple lines

You can find the complete code for this tutorial here.

Wrapping up

We've done quite a fair bit in this tutorial. We started by installing and importing our dependencies, we read our image using the EasyOCR reader, we drew our results using OpenCV, and finally took a look at how we can handle different detections on multiple lines. Hopefully, you found this tutorial useful.

Happy coding!