Machine Learning

Multi-Class Image Classifier using Deep AutoViML

Image classification categorizes input images into their respective labels or categories. Most image classifications usually have only two classes for example cat and dog. In a multi-class, we have three or more classes such as lion, cheetah, leopard, and tiger.  Multi-class image classification categorizes an input image into one of the three or more classes. When doing a multi-class classification, the predicted image belongs to only one class. An image cannot belong to more than one class at the same time.

We will use the Deep AutoViML library in building the model. Deep AutoViML is an Automated Machine Learning (AutoML) library that builds neural networks using TensorFlow and Keras.

In this tutorial, we will use Deep AutoViML to build a model that classifies images of hands playing the rock-paper-scissors game. The images have three classes: rock, paper and scissors.

Prerequisites
Getting started with Deep AutoViML
Deep AutoViML features
Using GPU
Images dataset
Unzipping the dataset
Cleaning the directory
Displaying images
Importing the libraries
Dataset directory
Initializing the neural network
Fitting the neural network
Model type and the layers of the neural network
Conclusion
References

Prerequisites

To follow along with this tutorial, the reader should:

Understand image preprocessing in Python.
Understand convolution neural networks.
Know how to build an image classifier using Keras and TensorFlow.
Run the model in Google Colab notebook.

Getting started with Deep AutoViML

Deep AutoViML automates the process of building an image classification model. Deep AutoViml uses Keras and Tensorflow libraries to build a deep learning model. The models are fast and more accurate.

Deep AutoViML can also automate NLP tasks. In this tutorial, we will be focusing on image classification. Deep AutoViML has the following useful features:

Deep AutoViML features

Deep AutoViML uses Keras preprocessing layers to automate image preprocessing - Image preprocessing is an essential stage in image classification. It converts the images into a format the neural network can understand and use.

The most common image preprocessing activities are as follows:

Image normalization - Image normalization converts the range of an image's pixel intensity values to a user-defined range.
Image resizing - Image resizing is the process of changing the image size. This enables the resized image to fit into the neural network you are building.
Image standardization - Changing the image pixels to meet the required standards so that the image can have uniform heights and widths.
Image scaling - This process converts the image to a user set scale.

These processes are very important, to understand them in detail, read this article for more information. These processes provide an image that is ready for use.

Deep AutoViML uses pre-trained models - Pre-trained models are neural networks models that undergo training using large datasets. The models are then imported to solve similar problems.

In this tutorial, we will use the pre-trained MobileNet convolutional neural network which is used for image classification. The MobileNet is trained using a large image dataset. This model is already imported into Deep AutoViML.

Deep AutoViML performs hyper-parameter tuning - Hyper-parameter tuning is the process of fine-tuning the parameters of a neural network. Finding the best parameters produces an optimized model. Deep AutoViML uses the Optuna to automatically search for the best neural network parameters.
Deep AutoViML allows you to add more custom layers - Using Deep AutoViML, we can add more layers to the pre-trained model. This enables us to have more user-centric models that can meet the user's needs.
Deep AutoViML automatically selects the best model.
Deep AutoViML automatically saves the trained model.

To use this library, we install it using the command below:

!pip install deep_autoviml --upgrade

After the installation process, we can now import the deep_autoviml using the following code:

from deep_autoviml import deep_autoviml as deepauto

Using GPU

GPUs are much faster than CPUs. Since we are dealing with an image classification problem, we initialize the GPU. We will also monitor its performance.

To use Google Colab's GPU, follow the steps below:

Click the Runtime option.

alt text

Click Change runtime type.

alt text

Then select the GPU option and save.

alt text

To manage and monitor this GPU, we will use Nvidia-smi:

!nvidia-smi

Let us now use deep_autoviml and Nvidia-smi to build the multi-class image classification model. First, we download the images dataset.

Images dataset

The image dataset has three classes: rock, paper, and scissors. To get the hand gestures dataset, click here.

We now need to add the dataset to our working directory. To download this dataset into our working directory, we use the command below:

!wget --no-check-certificate \
  https://www.kaggle.com/drgfreeman/rockpaperscissors.zip

This command downloads the dataset and adds the dataset into the working directory. The dataset will be in a zip format.

Unzipping the dataset

To unzip the dataset, we use the command below:

!unzip rockpaperscissors.zip

After unzipping the dataset, let's create a train and test folder. The train folder will have the train set and the test folder the test set.

To create these folders, we use the command below:

!mkdir rockpaperscissors/train
!mkdir rockpaperscissors/test

Let us move the train set to this folder using the code below:

!mv rockpaperscissors/paper rockpaperscissors/train/paper
!mv rockpaperscissors/rock rockpaperscissors/train/rock
!mv rockpaperscissors/scissors rockpaperscissors/train/scissors

Adding the testing images

To add the test set images to the test folder, we use the command below:

!mv rockpaperscissors/rps-cv-images rockpaperscissors/test

Now we clean the directory by removing unnecessary files:

Cleaning the directory

To clean the directory, we use the command below:

!rm rockpaperscissors/README_rpc-cv-images.txt
!rm rockpaperscissors/test/README_rpc-cv-images.txt

This command will remove the two files from our directory.

Displaying images

To display some of the images, we use the IPython.display Python module. To import this module, we use the code below:

from IPython.display import Image, display

We will display some of the images, using the code below:

display(Image('rockpaperscissors/train/scissors/KNNsNhduqSfuqEK5.png'))

The output is shown below:

First image

The image above shows a scissors gesture.

display(Image('rockpaperscissors/test/paper/9zuFLFklb0ibEelE.png'))

Second image

The image above shows a paper gesture.

display(Image('rockpaperscissors/test/rock/JOaPrPIINVvoI9l4.png'))

Third image

The image above shows a rock gesture.

Importing the libraries

We import the important libraries as follows:

import matplotlib.pyplot as plt
import numpy as np

import warnings
warnings.filterwarnings('ignore')

The imported libraries have the following functions:

matplotlib - Matplotlib is a visualization library. We use Matplotlib for diagrams and different graphs.
numpy - Numpy converts the input images into arrays. It performs mathematical operations on arrays.
warnings - It shows warnings that arise when running our model. In our case, we have decided to ignore these warnings.

After installing these libraries, let us specify the dataset directory.

Dataset directory

Adding the dataset directory will enable the model to know the dataset location. To add the directory, we use the code below:

image_dir = '/content/rockpaperscissors/'

Next, let us specify the image size and the number of the image classes:

Image sizes

We specify the height and width of the images. To add the image size, use the code below:

img_height = 224
img_width = 224
img_channels = 3

From the code above, we set the image height to 224 and the image width to 224. This is the default image size that Deep AutoViML expects. Deep AutoViML uses the pre-trained MobileNet internally.

The MobileNet model uses images that are 244 by 244 pixels. img_channels = 3 represents the number of image classes (rock, paper, scissors).

Next, let us initialize the neural network.

Initializing the neural network

We initialize the neural networks using the code below:

keras_model_type =  "image"
keras_options = {"early_stopping": True, 'lr_scheduler': 'rlr', "epochs": 3}
model_options = {'tuner':"optuna", "max_trials":2, 'cat_feat_cross_flag':True,
 'image_directory': image_dir, 'image_height': img_height,
 'image_width':img_width, 'image_channels':img_channels }

From the code snippet above, the neural network has the following parameters:

keras_model_type - Deep AutoViML uses the Keras library model to build neural networks. Since we are building an image classification model, we set the type of model as an image classifier. This will automatically use the MobileNet model to train the neural network.
keras_options - It changes the Keras model options. Keras models have options such as early_stopping, lr_scheduler, and epochs.

Let us explain these models:

early_stopping - Early stopping is a parameter used to avoid model overfitting during training. We set it to True.
lr_scheduler- It is used to determine the rate of the neural network during each epoch/iteration.
epochs - Determines the total number of iterations of the model. We have set the number of epochs to 3.

model_options - It sets the key attributes of the input images. It also sets the hyper-parameter tuning technique.

These techniques are:

tuner - It sets the hyper-parameter tuning technique. We will use Optuna to automatically search for the best neural network parameters.
max_trials - It is the maximum number of combinations tested by Optuna during hyperparameter tuning.
image_directory - It specifies the image directory.
image_height - It specifies the image height.
image_channels - It specifies the number of image classes.

After initializing the neural network, let's fit the neural network onto our dataset.

Fitting the neural network

Fitting the neural network allows the model to learn from the training images. This will enable the model to gain useful insight which it will use to make predictions.

To fit the neural network, we use the code below:

model = deepauto.fit("", "", keras_model_type=keras_model_type,
 project_name='rock_paper_scissors_classifier',
 save_model_flag=False, model_options=model_options,
 keras_options=keras_options, use_my_model='', verbose=0)

We fit the neural network using the deepauto.fit function. In the function we add the project_name as rock_paper_scissors_classifier. The other function values are the parameters set during the initialization process. We also use the save_model_flag to specify if we want to save our model or not.

When the code is executed, it automates the model-building process. It then produces different outputs to show the progress.The first output shows the model training using the rock-paper-scissors dataset. The output also shows the number of GPUs used to speed up the process.

This output is shown in the image below:

Training process

From the image above, we are using 1 physical GPU and 1 logical GPU. The models learn from the images under the given image directory.

The neural network iterates through the training dataset 3 times. It prints the accuracy score after each iteration. The final accuracy score is 0.96875.

The next output displays some of the images:

Displayed images

Deep AutoViML also automatically plots the Model Training vs Validation loss and Model Training vs Validation Accuracy diagrams. These diagrams are shown below:

Diagrams

The diagram on the left shows the Model Training vs Validation loss. From the image, we can see the train_loss and the val_loss reduces over time. This shows that our model improved as the training process continued.

The diagram on the right shows the Model Training vs Validation Accuracy. We can see the train_accuracy and val_accuracy increase over time. This shows that our model improved as the training process continued.

Model type and the layers of the neural network

This output shows the model type and the layers of the neural network. It also shows the model's summary and parameters. The parameters are categorized into trainable and non-trainable parameters.

This output represents the structure of the neural network:

Neural network layers

From the image above, the model type is sequential. Sequential models allow layers to be built on top of each other. It also shows the total model parameters (2,261,827).

The trainable parameters (3,843) are the ones the neural network will train. The non-trainable parameters (2,257,984) are already trained.

We import the non-trainable parameters from the pre-trained MobileNet. The number of non-trainable parameters is more than the trainable parameters. This saves training time.

Finally, Deep AutoViML automatically makes a few predictions. It classifies some of the images in the test set. It prints the actual label and the predicted label of the images.

The actual label shows the actual image class. The predicted label shows the class the model predicts. If the actual label and the predicted label match, the model would have made the right prediction.

This output is shown in the image below:

Model predictions

From the image above, the model has classified 6 images. For all the images, the actual label and the predicted label match. This shows that our model was able to make the right predictions.

We have also achieved multi-class image classification. The model was able to classify an image into one of the three classes.

Conclusion

In this tutorial, we have gone over a multi-class image classifier using Deep AutoViML. We discussed the Deep AutoViML features used to build powerful deep learning models.

Using these features we were able to build a multi-class image classification model. The final model was able to classify images of hands playing the rock-paper-scissors game.

I hope you find this tutorial insightful.

To get the complete Python code in Google Colab, click here.