Implementing Holistic Tracking using Python

This tutorial will discuss full-body pose estimation using MediaPipe Holistic. The model detects the facial landmarks on the face, the joints in both hands, and the pose of the body. The MediaPipe library provides a wide range of ML solution models; of these, we will use the MediaPipe Holistic model for this tutorial.

The MediaPipe Holistic model comprises three different models:

  • MediaPipe Face Mesh detects the facial landmarks on the face.
  • MediaPipe Hands detects all the joints within our hands.
  • MediaPipe Pose detects the pose landmarks across the body.

Prerequisites

To follow through with this tutorial, you need to:

  • Be familiar with machine learning modeling.
  • Be familiar with the individual face landmark and hand detection models.
  • Use either Google Colab or Jupyter Notebook.

For this tutorial, we will be using Google Colab.

Goal

  • Setting up the MediaPipe library.
  • Detecting poses, facial landmarks, and landmarks for both hands.
  • Visualizing the detections on the screen.

Installing and importing dependencies

There are two key dependencies that we will need for this tutorial:

MediaPipe is used to access the model, while OpenCV is used to access the webcam or still images for detection.

Let's install them.

!pip install mediapipe opencv-python

Next, we need to import them into our notebook.

import mediapipe as mp
import cv2
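
To confirm that both libraries are available, you can print their versions (a quick, optional sanity check; the exact versions you see will differ):

# Optional sanity check: confirm both libraries imported correctly.
print("MediaPipe:", mp.__version__)
print("OpenCV:", cv2.__version__)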

Setting up MediaPipe

We begin by importing the MediaPipe drawing utility. It will help us draw the detections from the holistic model.

mp_drawing = mp.solutions.drawing_utils

Next, import the holistic model from MediaPipe. Remember, the MediaPipe library consists of many ML solutions. To check these models using code, type in mp.solutions. on a new cell, and you'll be able to see the available models within the library.
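
For example, a one-liner that lists them from code (the output depends on your MediaPipe version):

# List the solution modules bundled with MediaPipe.
print([name for name in dir(mp.solutions) if not name.startswith("_")])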

We need to import the holistic model.

mp_holistic = mp.solutions.holistic

Loading up an image using OpenCV

from google.colab.patches import cv2_imshow
import urllib.request

# cv2.imread() cannot read a URL directly, so download the image to a local file first.
url = "https://sparkling-desk-070a6c243e.media.strapiapp.com/2058_workout_53424afe89.jpg"
urllib.request.urlretrieve(url, "workout.jpg")

image = cv2.imread("workout.jpg")
cv2_imshow(image)

After running the command, you should see the following:

Loaded image

Image source: Unsplash

If you want to use the same image for reproducibility, you can find it here.

The next step involves taking the loaded image and performing detections on it.

Detecting landmarks

We begin by initializing the holistic model using a with statement. Note that all the processing and drawing code in the following steps runs inside this block, so it must be indented one level:

with mp_holistic.Holistic(
    static_image_mode=True, model_complexity=2, enable_segmentation=True, refine_face_landmarks=True) as holistic:

The static_image_mode is set to True to treat the input as a still image. When set to False, the model treats the input as a video stream and tracks landmarks across frames.

The model_complexity can be 0, 1, or 2; higher values are more accurate but slower. The enable_segmentation flag is set to True so that a segmentation mask is generated alongside the landmarks. The refine_face_landmarks flag is set to True to further refine the landmarks around the lips and eyes.

Using the imread() method, we load the image we downloaded earlier.

image = cv2.imread("workout.jpg")

The next step involves recoloring our image. We use the cvtColor function for this task.

When we use OpenCV, we get the image in BGR format, but when we pass the image to the holistic model, we want it in RGB. It is the only image format accepted by MediaPipe. Hence, the conversion. We then run the holistic model on the converted image and store the detections in results:

image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = holistic.process(image)
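
Before drawing anything, it can help to inspect the detections. A minimal sketch, using MediaPipe's PoseLandmark enum to pick out the nose:

# Each result field is None when nothing was detected, so guard before indexing.
if results.pose_landmarks:
    nose = results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE]
    print(f"Nose at x={nose.x:.2f}, y={nose.y:.2f} (coordinates normalized to image size)")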

Let's begin by drawing the facial landmarks. Recent MediaPipe versions removed the old FACE_CONNECTIONS constant, so we use FACEMESH_CONTOURS instead (FACEMESH_TESSELATION also works if you want the full mesh):

mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS)

Let's apply the same thing for the hands and pose landmarks.

For the pose landmarks, write the following code:

mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)

For the left-hand landmark:

mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)

For the right-hand landmark:

mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
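
Because every call above must be indented inside the with block, here is the whole detection step assembled in one place (a minimal sketch of the code above, assuming workout.jpg was downloaded earlier):

with mp_holistic.Holistic(
    static_image_mode=True, model_complexity=2,
    enable_segmentation=True, refine_face_landmarks=True) as holistic:
    # Load the image and convert BGR (OpenCV) to RGB (MediaPipe).
    image = cv2.imread("workout.jpg")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = holistic.process(image)

    # Draw each set of detected landmarks back onto the image.
    mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS)
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)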

Visualizing the detections

Since we are working in Colab, we use the cv2_imshow() helper to visualize our connections. The image is still in RGB order at this point, so we convert it back to BGR before displaying it; otherwise, the colors will appear swapped.

image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
cv2_imshow(image)

Output:

Output image

We have successfully implemented a holistic model using Python. You can take this experiment a bit further and try using the model on real-time video data using your computer webcam.
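
If you try that on a local machine (Colab cannot reach your webcam through cv2.VideoCapture), a minimal sketch of the video loop could look like this:

cap = cv2.VideoCapture(0)  # default webcam
# static_image_mode=False lets the model track landmarks across frames.
with mp_holistic.Holistic(static_image_mode=False) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # The model expects RGB; the frame stays BGR for display.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        mp_drawing.draw_landmarks(frame, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
        cv2.imshow("MediaPipe Holistic", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
cap.release()
cv2.destroyAllWindows()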

Additionally, you can change the color, thickness, and circle radius of the drawings through the landmark_drawing_spec and connection_drawing_spec parameters of draw_landmarks.
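
For example, a small sketch (the BGR color values here are arbitrary choices):

# DrawingSpec controls how landmark points and their connections are rendered.
mp_drawing.draw_landmarks(
    image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS,
    landmark_drawing_spec=mp_drawing.DrawingSpec(color=(0, 0, 255), thickness=2, circle_radius=3),
    connection_drawing_spec=mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2))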

Please find the complete code for this tutorial here.

Wrapping up

This tutorial has demonstrated how to implement full-body pose estimation using MediaPipe Holistic. The model's landmarks can be used to detect different forms of body language, such as whether a person looks happy, sad, or angry.

In addition, you could use it to build touchless gesture controls or a workout counter, e.g., counting how many press-ups or bicep curls you have done. The use cases are endless.


Peer Review Contributions by: Willies Ogola

Published on: Jan 12, 2022
Updated on: Jul 15, 2024