Drug Label Extraction using Deep Learning
Optical Character Recognition (OCR) uses optics to extract readable text into machine-encoded text. A large number of companies that process paper-based forms use OCR to extract texts from documents. <!--more--> Applying cutting-edge technologies to modern problems has enabled various problems to be solved in healthcare, thereby improving people's lives.
Machine learning has been used in hospitals to identify patients faster in disease identification and diagnosis, robotic surgery, and medical imaging diagnosis.
One notable area of concern in healthcare that hasn't been explored a lot is drug label extraction. Optical Character Recognition (OCR) uses optics to extract readable text into machine-encoded text.
OCR has been used on Google's visual translation service. Recently, most companies offering PDF services, such as Adobe, also provide this feature on their app.
We will be using OCR to enable us to automatically extract drug labels/text from prescription medicine.
We'll specifically use the PaddleOCR, previously referenced in a paper known as PP-OCR: A Practical Ultra Lightweight OCR System. The PaddleOCR is a model that the engineers at Baidu originally built.
Prerequisites
To understand this tutorial, you will need the following:
- Have Google Colab or Jupyter Notebook installed.
- You will need to be familiar with machine learning.
- You will need to have Python and PaddleOCR installed.
Table of contents
- Installing PaddleOCR for Python
- Applying PaddleOCR model to images
- Extracting prescription medication labels using PaddleOCR
- Wrapping up
To go through with our drug label extraction, we need to perform three key things:
- Install and import our dependencies.
- Instantiate the model and detect.
- Visualize the results.
Installing PaddleOCR for Python
We begin with opening a Jupyter notebook either on Colab or locally and install the required libraries. The dependency we want to install is the PaddleOCR.
Head on to Paddle OCR's Github page.
Under the Tutorial
section, navigate to the installation
sub-section. A thorough guide to install PaddleOCR is available here.
There are two parts to the installation process. First, you have to install paddlepaddle
, which is the underlying framework behind the PaddleOCR model. Second, we'll go ahead and install the paddleOCR
.
The PaddleOCR model can run on both a GPU and CPU. GPUs are often preferred because of their greater memory bandwidth.
If you have a GPU-enabled machine, use the following command to install the PaddleOCR:
!python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple
If you have a CPU-enabled machine, use the following command to install the PaddleOCR:
!python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple
If you happen to get an error when installing, it's because you haven't installed the
pip
command. To install it, use the command,pip3 install --upgrade pip
. Also, since we're installing it on our notebook, we have to include the exclamation mark!
before the installation command.
Let's install PaddleOCR through a quick pip install, using the following command:
pip install paddleocr
It would be best if you cloned the PaddleOCR repo on Github.
!git clone https://github.com/PaddlePaddle/PaddleOCR
This clone is crucial as it'll enable the look-up of specific fonts when we visualize our results in the last step.
Now that we've installed our dependencies, let's import them into our notebook/colab.
from paddleocr import PaddleOCR, draw_ocr
from matplotlib import pyplot as plt
import cv2
import os
paddleocr
is our main OCR dependencymatplotlib
allows us to plot imagescv2
allows us to use the OpenCV libraryos
allows for folder directory navigation
Applying PaddleOCR model to images
Now, let us apply the PaddleOCR to our images.
These are the two images that we'll be applying PaddleOCR on:
Let's set it up using the following command:
ocr_model = PaddleOCR(lang='en')
This command specifies the language of choice. For our case, we set it to English. Since the PaddleOCR is made by engineers at Baidu, it can also recognize Chinese characters. You may select your language as Chinese or any other supported language.
The following command sets up the path to our images on the machine.
img_path = os.path.join('.', 'drug1.jpg')
Let's run the OCR method on the ocr_model
to make some detection. We can store the results in a variable known as result.
result = ocr_model.ocr(img_path)
Running the command above gives the output:
[[[[114.0, 39.0], [265.0, 5.0], [273.0, 39.0], [121.0, 72.0]],
('Generio', 0.9429093)],
[[[422.0, 99.0], [464.0, 89.0], [469.0, 113.0], [428.0, 122.0]],
('1', 0.5031448)],
[[[135.0, 118.0], [457.0, 40.0], [467.0, 80.0], [144.0, 158.0]],
('TO APPOINTME', 0.9501941)],
[[[302.0, 124.0], [433.0, 94.0], [442.0, 132.0], [311.0, 163.0]],
('THEN', 0.9947262)],
[[[146.0, 166.0], [323.0, 122.0], [332.0, 160.0], [155.0, 203.0]],
('ONLY.', 0.9750903)],
[[[154.0, 204.0], [455.0, 127.0], [468.0, 175.0], [167.0, 253.0]],
('TWICEDAILY', 0.99549305)]]
Using the type(result)
command, we find these results stored in a list. We can use standard list indexing to get the results out.
To grab all the texts, let's loop through the result variable, picking the first index that is the text. The following command is going to print out the text only.
for res in result:
print(res[1][0])
Output:
Generio
1
TO APPOINTME
THEN
ONLY.
TWICEDAILY
The few lines of code above allow us to extract the drug labels in drug bottles/capsules quickly.
Please note that we are using
drug1.jpg
in this example. You can play around with the code and usedrug2.jpg
and see the outcome.
Extracting prescription medication labels using PaddleOCR
This step involves visualizing the results. The draw_ocr
method, opencv
, and matplotlib
library help us achieve this.
Let's extract the boxes, scores, and text coordinates into separate variables. We then pass these coordinates into the draw_ocr
method.
This process feels similar to object detection if you've ever performed such a task. The boxes
variable loops through and grabs the first index inside our result
variable, the texts
variable grabs the second, and the scores
grab the third.
boxes = [res[0] for res in result]
texts = [res[1][0] for res in result]
scores = [res[1][1] for res in result]
The next step involves specifying the font path for our draw_ocr
method. This is why I mentioned earlier in the tutorial that you need to clone the PaddleOCR Github repository.
To use the draw_ocr
method, we need a path defined to the specific fonts. Inside the downloaded repo, follow that path shown below, and you'll find different fonts inside a folder. For our case, latin.ttf
. ttf
is the font format we use for True Type Font
.
We do this by issuing the following command:
font_path = os.path.join('PaddleOCR', 'doc', 'fonts', 'latin.ttf')
Let's now load our image using OpenCV. We use OpenCV's imread()
method to read in our image.
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
The command cv2.COLOR_BGR2RGB
reorders the color channels. By default, OpenCV imports images as BGR.
The final step involves visualizing our images and detections.
Using matplotlib, let's first resize our display area.
plt.figure(figsize=(15,15))
We then use the draw_ocr
method to draw annotations on the image.
annotated = draw_ocr(img, boxes, texts, scores, font_path=font_path)
Finally, we use matplotlib's imshow()
method to visualize the image.
plt.imshow(annotated)
Let's take a look at the result. This's the same image we had earlier. We can see that it has extracted the text and drawn bounding boxes around it.
Please find the complete code to my code implementation here.
Wrapping up
One key feature of the PaddleOCR model is that it's swift. We also noted that with a few lines of code, we can extract labels from prescribed medicine. Amazing, right?
In a nutshell, this tutorial showed you how you could use OCR to extract text from drug labels. The sizes and quality of your image will significantly impact your results.
Remotely running this colab project may give you an error since you can't use my image path remotely. You can import an image of your choice into your colab, or you can carry out the task locally to circumvent this error.
Try playing around with the code and see what you find.
Happy coding!
References
Peer Review Contributions by: Collins Ayuya