Spotting Of Passing Objects Using OpenCV In Python
Tracking and spotting moving objects has become crucial in our lives today. When it comes to security, cameras help us a great deal in detecting and tracking moving objects. In this tutorial, we will learn how to spot a passing object and return the spotted object in real-time using computer vision.
Pre-requisites
- You should have a basic understanding of Python.
- A brief overview and prior knowledge of computer vision would help. You can learn a little about computer vision here.
- You should have a webcam on your laptop.
With those at hand, we will be able to tackle our project. Let's dive in.
Installation
Install OpenCV
First, we have to install OpenCV, which we will use to manipulate and work with the webcam images or videos. It also provides computer-vision functions for processing images and live video.
To learn more about OpenCV, it is recommended to go through this documentation.
In your working environment, open the terminal and paste the command below to download and install OpenCV.
pip install opencv-python
This package provides the `cv2` module. Since it is a cross-platform library, it contains a variety of functions to read and manipulate images.
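To confirm the installation worked, a quick sanity check is to import the module and print its version:

```python
# Quick sanity check that OpenCV installed correctly
import cv2

print(cv2.__version__)
```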
Install NumPy
NumPy is used to express data as multi-dimensional arrays. In our case, we represent the image pixel values as arrays.
You can read more about NumPy in this documentation.
pip install numpy
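As a minimal illustration of that representation, the snippet below builds a blank image as a NumPy array; OpenCV hands frames back in exactly this form (height x width x 3 channels of 8-bit values):

```python
import numpy as np

# A 480x640 black image: 3 color channels of 8-bit pixel values
blank = np.zeros((480, 640, 3), dtype=np.uint8)
print(blank.shape)  # (480, 640, 3)
print(blank.dtype)  # uint8
```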
Install imutils
The `imutils` library helps to rotate, resize, and skeletonize images alongside OpenCV.
You can learn more about `imutils` here.
pip install imutils
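As a small sketch of what `imutils` adds on top of OpenCV, the snippet below resizes a blank stand-in frame to a fixed width while `imutils` scales the height automatically to keep the aspect ratio:

```python
import numpy as np
import imutils

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a webcam frame
resized = imutils.resize(frame, width=700)       # height is scaled automatically
print(resized.shape)                             # (525, 700, 3)
```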
After successful installation, we can now jump into the code.
Implementation
Import external libraries
As discussed earlier, we import the libraries mentioned above, together with the built-in `datetime` module, as shown:
import cv2
import numpy as np
import datetime
import imutils
The `datetime` module returns the current date and time. In our case, it lets us stamp the live footage with the current time.
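For example, the same format string used later on the footage overlay can be tried on its own:

```python
import datetime

# Formats the current time, e.g. "Monday 01 January 2024 10:30:15AM"
stamp = datetime.datetime.now().strftime("%A %d %B %Y %I:%M:%S%p")
print(stamp)
```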
Initialize variables
We will have to initialize the objects that help us capture and read video data, as shown:
rec = cv2.VideoCapture(0)
sto, mapping1 = rec.read()
sto, mapping2 = rec.read()
From the above code:
- The first mapping stores the initial frame.
- The second mapping stores the subsequent frame.
- In the `VideoCapture` function, we pass `0` since it denotes the default webcam. The boolean `sto` returned by `read()` is `True` only when a frame was captured (a defensive check is sketched below).
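Below is a minimal, optional sketch (not part of the main snippet) that checks these return values before continuing:

```python
import cv2

# Hypothetical defensive check: confirm the webcam opened and returned a frame
rec = cv2.VideoCapture(0)
if not rec.isOpened():
    raise RuntimeError("Could not open the webcam at index 0")

sto, mapping1 = rec.read()
if not sto:
    raise RuntimeError("Webcam opened but no frame was returned")
```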
Detect and convert frames to grayscale
A video can be defined as a series of images captured at successive points in time. So, we first have to capture the individual frames and then process them.
Image frames are converted to grayscale to increase the accuracy of detecting the key features and to eliminate possible misses.
To learn more about what grayscale is, it is recommended to read this article.
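As a small illustration of the conversion used in the main loop below, converting a BGR frame to grayscale collapses the three color channels into a single intensity channel:

```python
import numpy as np
import cv2

bgr = np.zeros((4, 4, 3), dtype=np.uint8)     # a tiny 4x4 color image
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # one intensity value per pixel
print(bgr.shape, gray.shape)                  # (4, 4, 3) (4, 4)
```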
while rec.isOpened():
    sub = cv2.absdiff(mapping1, mapping2)
    convclr = cv2.cvtColor(sub, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(convclr, (3, 3), 1)
    _, thresh = cv2.threshold(blur, 20, 255, cv2.THRESH_BINARY)
    enlarged = cv2.dilate(thresh, None, iterations=1)
    contours, _ = cv2.findContours(enlarged, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
In the above code:
- We assign `sub` to the absolute difference between the first and second mapping (a tiny standalone example of `absdiff` follows this list).
- The `cvtColor` method is called to convert `sub` to grayscale by specifying `cv2.COLOR_BGR2GRAY`.
- We then call the `GaussianBlur` function to blur the grayscale result.
- After the above operations, we call the `dilate` function to enlarge the thresholded regions, thereby enhancing accuracy by filling the gaps between them.
- After one dilation pass (`iterations=1`), `findContours` extracts the resulting contours.
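To make the first step concrete, here is a minimal sketch of what `cv2.absdiff` does on two tiny hand-made frames; only the pixel that changed between them ends up non-zero:

```python
import numpy as np
import cv2

# Two tiny "frames": a single pixel changes between them
frame_a = np.array([[10, 10], [10, 10]], dtype=np.uint8)
frame_b = np.array([[10, 10], [10, 200]], dtype=np.uint8)

diff = cv2.absdiff(frame_a, frame_b)
print(diff)
# [[  0   0]
#  [  0 190]]  -> only the pixel that "moved" lights up
```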
Blur the frame
`GaussianBlur` from `cv2` is responsible for blurring the image. It smooths the image by averaging each pixel's intensity with its neighbours using a Gaussian kernel, thus filtering out high-frequency noise.
Let's implement it as shown below:
if mapping1 is None:
    break

mapping1 = imutils.resize(mapping1, width=700)
convclr1 = cv2.cvtColor(mapping1, cv2.COLOR_BGR2GRAY)
convclr2 = cv2.GaussianBlur(convclr1, (21, 21), 0)
In the above code:
- The blur is performed to remove high-frequency components from the image (a standalone sketch of this effect follows the list).
- The first and second frames have already been converted to grayscale.
- Using `imutils`, we resize the frame to the desired width of `700` pixels.
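As a rough, standalone illustration (not part of the detector itself), blurring a frame full of random noise with the same `(21, 21)` kernel shows how much the pixel intensities are smoothed out:

```python
import numpy as np
import cv2

# Random high-frequency noise, blurred with the same (21, 21) Gaussian kernel
noisy = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
smoothed = cv2.GaussianBlur(noisy, (21, 21), 0)

# The spread of pixel intensities drops sharply after blurring
print(noisy.std(), smoothed.std())
```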
Find the difference between delta frames
Since we stored the first frame in `mapping1`, we will be able to compute the difference for the later frames using `mapping2`, as shown:
if mapping1 is None:
    mapping1 = convclr2
    continue

frameDelta = cv2.absdiff(convclr2, convclr1)
thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1]
In the above code:
- `cv2.absdiff` takes the absolute difference between the pixel matrices of the two grayscale frames and stores the resulting pixel intensities in `frameDelta`.
- The `THRESH_BINARY` flag helps turn the color of the passing object to white while setting the color of the background to black (a tiny standalone example follows this list).
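Here is a minimal, self-contained sketch of that thresholding step on a tiny hand-made delta matrix, using the same cut-off of `25`:

```python
import numpy as np
import cv2

delta = np.array([[ 5, 30],
                  [80, 10]], dtype=np.uint8)

# Pixels above 25 become 255 (white); everything else becomes 0 (black)
_, mask = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)
print(mask)
# [[  0 255]
#  [255   0]]
```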
Find the contours
To enlarge the detected regions of the image, we call the `dilate` function from `cv2` as shown:
thresh = cv2.dilate(thresh, None, iterations=2)
count = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
count = imutils.grab_contours(count)
In the above code:
- We use the `findContours` function to get the contours, and `imutils.grab_contours` unpacks its result regardless of the OpenCV version in use.
- Two flags are passed to the function to retrieve only the external contours and to approximate them, thus saving memory by removing repeated points and compressing the contours.
- The `dilate` function accepts two inputs: our input image, and the structuring element or kernel, which determines the nature of the operation (passing `None` uses a small default kernel).
- Dilation of the image helps increase the size of the detected object (a small standalone contour example follows this list).
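To see how a white region turns into a bounding box, here is a minimal sketch (assuming OpenCV 4.x, where `findContours` returns two values, as in the code above) that draws a filled square on a blank image and reads back its rectangle and area:

```python
import numpy as np
import cv2

# A black image with one filled white square drawn on it
canvas = np.zeros((100, 100), dtype=np.uint8)
cv2.rectangle(canvas, (20, 30), (60, 70), 255, -1)

contours, _ = cv2.findContours(canvas, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    print(cv2.boundingRect(c))   # (20, 30, 41, 41): x, y, width, height
    print(cv2.contourArea(c))    # roughly the area of the square
```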
Format the displayable text and time layout
To write text on the footage, you will have to call the `cv2` function `putText`.
Its arguments define the text's position, font face, scale, color, and thickness. To display the current time, or real-time data, we use the `datetime` module as shown:
for contour in count:
    (x, y, w, h) = cv2.boundingRect(contour)
    if cv2.contourArea(contour) < 700:
        continue
    cv2.rectangle(mapping1, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.putText(mapping1, "REPORT: {}".format('PASSING OBJECT DETECTED'), (5, 30), cv2.FONT_HERSHEY_DUPLEX, 1, (255, 0, 0), 4)
    cv2.putText(mapping1, datetime.datetime.now().strftime("%A %d %B %Y %I:%M:%S%p"), (10, mapping1.shape[0] - 10), cv2.FONT_HERSHEY_DUPLEX, 0.9, (0, 255, 0), 3)
Display the image and footage
The `cv2` function `imshow` allows us to display the images or video feeds that were stored in the earlier instantiated variables, as shown:
cv2.imshow("FOOTAGE", mapping1)
cv2.imshow("MARGIN", thresh)
cv2.imshow("DIFFERENCE OF DELTA FRAMES", frameDelta)
mapping1=mapping2
sto, mapping2=rec.read()
Wrapping it up
We will need to listen for a keypress to terminate our program. On terminating the program, we release the webcam and any resources in use.
Lastly, we destroy all the windows constructed by OpenCV.
if cv2.waitKey(50) == 50:  # waitKey pauses ~50 ms; key code 50 is the '2' key
    break

rec.release()              # free the webcam
cv2.destroyAllWindows()    # close every window OpenCV created
Results
Below are the expected output and outcome for our program.
Output for `THRESH`:
Output for `FRAMEDELTA`:
Output for `FOOTAGE`:
Conclusion
In this tutorial, we have learned several concepts that help with spotting passing objects using OpenCV and other important related tools.
We have gone over the installation of the required libraries and the usage of the `cv2` module. We have also learned how to compute the frame deltas and format the text layout to be displayed.
You can find the full source here.
For further reading, we suggest you read this article.
Happy coding!
Peer Review Contributions by: Srishilesh P S