
Image Classifier using Transfer Learning with TensorFlow

Transfer learning is a technique in which a neural network trained on one problem is reused on a different but related problem. The knowledge gained while solving the first problem is stored and applied to the second, which decreases training time and produces a model that performs well. <!--more-->

For example, knowledge gained while learning to recognize lemons could apply when trying to recognize oranges. Lemons and oranges are different but related problems. The neural network is fine-tuned to meet the user's needs rather than being trained from scratch.

In this tutorial, we will build a model that classifies images of hands playing the rock, paper, scissors game. We will download a pre-trained MobileNet-v2 convolutional neural network from TensorFlow Hub. We will then fine-tune it to classify images of hands playing the rock, paper, scissors game.


Prerequisites

To follow along with this tutorial, a reader should:

Importing important libraries

For this tutorial, import the following libraries.

import matplotlib.pylab as plt
import tensorflow as tf
import tensorflow_hub as hub
import os
import numpy as np
import tensorflow_datasets as tfds

These libraries are important for building our transfer learning model. The function of each is as follows:

matplotlib.pylab - It is a visualization library. We use Matplotlib to plot line graphs, figures, and diagrams.

tensorflow - It is an open-source library for machine learning and artificial intelligence. We use it to create the input, dropout, and dense layers for our image classification model.

tensorflow_hub - It is a TensorFlow repository that contains a collection of pre-trained models.

os - It enables us to interact with the operating system. The OS module in Python provides functions for creating and removing a directory, fetching its contents, changing and identifying the current directory.

numpy - It will convert the image dataset into arrays. It also enables us to perform mathematical operations on arrays.

tensorflow_datasets - It is a TensorFlow repository that is made up of a collection of ready-to-use datasets.

Downloading the images dataset

We will download the rock, paper, scissors image dataset from tensorflow_datasets using the following code:

datasets, info = tfds.load(name='rock_paper_scissors', with_info=True, as_supervised=True, split=['train','test'])

We have downloaded the dataset and saved it into train and test sets.

To check the information available in our dataset, run this command:

info

The output is shown below:

Dataset information

From the image above, we have a total of 2892 images. The image size is 300 by 300 pixels and we have 3 classes. Let's display some of the images.

Displaying images

To show the images, we will specify the image set to be displayed. We will display the train set using the following code:

train, info_train = tfds.load(name='rock_paper_scissors', with_info=True, split='train')
tfds.show_examples(info_train, train)

The images are shown below:

Displaying images

Image shuffling

We shuffle the dataset to reduce model bias. Shuffling forces the model to learn general features rather than memorize the order of the images.

dataset=datasets[0].concatenate(datasets[1])
dataset=dataset.shuffle(3000)

In the code above, we first concatenate the two image sets (train and test). We then shuffle them with a buffer size of 3000, which is larger than the 2,892 images in the dataset, so the entire dataset is shuffled randomly.

Splitting the dataset into three sets

After shuffling the dataset, split it into three sets: a train set, a validation set, and a test set.

  • Train set: it is used to train the model. The model learns from this set.

  • Validation set: it is used to fine-tune the model hyper-parameters so that we can have an optimized model.

  • Test set: it is used to assess the final model after training. It checks if the model can make accurate predictions.

We split the dataset using the following code:

rsp_val=dataset.take(600)
rsp_test_temp=dataset.skip(600)
rsp_test=rsp_test_temp.take(400)
rsp_train=rsp_test_temp.skip(400)

From the code above, we have used 600 images as the validation set, 400 images as the test set, and the remaining 1,892 images as the train set.
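The take/skip arithmetic can be sketched with plain Python list slicing (the counts here assume the 2,892-image dataset from above):

```python
# Mimic tf.data's take()/skip() with list slicing to check the split sizes.
dataset = list(range(2892))      # stand-in for the shuffled image dataset

rsp_val = dataset[:600]          # take(600)
rsp_test_temp = dataset[600:]    # skip(600)
rsp_test = rsp_test_temp[:400]   # take(400)
rsp_train = rsp_test_temp[400:]  # skip(400)

print(len(rsp_val), len(rsp_test), len(rsp_train))  # 600 400 1892
```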

Image normalization and resizing

Image normalization is the process of rescaling an image's pixel intensity values to a predefined range, usually [0, 1] or [-1, 1]. In this tutorial, we want our pixel range to be [0, 1]. For a detailed understanding of image normalization, read this article.

Image resizing is the process of changing the image size. This enables the resized image to fit into the neural network you are building. To perform this process, use the following function.

def scale(image, label):
  image = tf.cast(image, tf.float32)
  image /= 255.0
  return tf.image.resize(image,[224,224]), tf.one_hot(label, 3)

From the code above, we performed image normalization by casting the image to float32 and dividing it by 255.0, which changes the pixel range to [0, 1]. The code also resized the image to 224 by 224 using the tf.image.resize method. This is the input size that the pre-trained MobileNet-v2 convolutional neural network expects.

Finally, the code performs one-hot encoding using the tf.one_hot method. One-hot encoding converts the integer labels (0 for rock, 1 for paper, 2 for scissors) into binary vectors such as [1, 0, 0], which is the format the categorical cross-entropy loss expects. After this process, we need to add a batch size for each set.
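The effect of tf.one_hot can be sketched with NumPy (an illustration of the encoding itself, not the TensorFlow implementation):

```python
import numpy as np

labels = np.array([0, 1, 2])  # rock, paper, scissors
one_hot = np.eye(3)[labels]   # each label becomes a 3-element binary vector

print(one_hot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```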

Adding batch size

Batch size is the number of samples the model processes in a single training step; the model's weights are updated once per batch, and one epoch is a full pass over the whole set. We will set the batch size to 64. This is done using the following function.

def get_dataset(batch_size=64):
  train_dataset_scaled = rsp_train.map(scale).shuffle(1900).batch(batch_size)
  test_dataset_scaled = rsp_test.map(scale).batch(batch_size)
  val_dataset_scaled = rsp_val.map(scale).batch(batch_size)
  return train_dataset_scaled, test_dataset_scaled, val_dataset_scaled

From the code above, each set (train, validation, and test) is served to the model in batches of 64 images per training step.

We call the get_dataset function to create the three batched sets:

train_dataset, test_dataset, val_dataset = get_dataset()

Finally, we cache the train and validation sets so that the model can reuse them across epochs without reloading the data.

Caching dataset

To cache the dataset, use this code:

train_dataset = train_dataset.cache()
val_dataset = val_dataset.cache()

Note that cache() returns a new dataset, so we assign the result back; calling it without the assignment would have no effect.

This dataset is now ready for use. The next step is to download the MobileNet-v2 convolutional neural network.

Downloading the MobileNet-v2 convolutional neural network

To use this neural network, we first specify its TensorFlow Hub URL. The model itself is downloaded when we create the KerasLayer in a later step:

feature_extractor = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"

This model is already pre-trained using different images. MobileNet-v2 follows the convolutional neural network architecture. It is made up of a feature extractor layer (collection of convolutional and pooling layers) and fully connected layers. For further understanding of the convolutional neural network architecture, read this article.

We will apply this model to classify images of hands playing the rock, paper, scissors game. To use this model, we extract the feature extractor layer from the MobileNet-v2 model. We then use the feature extractor layer as the input layer when building the model.

Extract the feature extractor layer from the MobileNet-v2 model

The feature extractor layer of the MobileNet-v2 model is made up of a collection of stacked convolutional and pooling layers. This layer is very important and is used to extract the important features from the input image. For further understanding of how the convolutional and pooling layers work, read this article.

We extract the layer using the following code:

feature_extractor_layer = hub.KerasLayer(feature_extractor, input_shape=(224,224,3))

This layer is already trained. To ensure that it will not be trained when we build our neural network, run the following code:

feature_extractor_layer.trainable = False
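Freezing works the same way for any Keras layer: once trainable is set to False, the layer's weights move out of trainable_weights. A minimal sketch, using a plain Dense layer as a stand-in for the hub layer:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(4)
layer.build((None, 8))   # create the layer's weights

layer.trainable = False  # freeze the layer, as we did for the hub layer

print(len(layer.trainable_weights))      # 0: nothing left to train
print(len(layer.non_trainable_weights))  # 2: the kernel and bias still exist
```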

Initializing the neural network

We initialize our neural network as follows:

model = tf.keras.Sequential([
  feature_extractor_layer,
  tf.keras.layers.Dropout(0.5),
  tf.keras.layers.Dense(3,activation='softmax')
])

From the code above, we are building a sequential model that allows layers to be built on top of each other. We have used the feature_extractor_layer as the input for the neural network. We then add a Dropout layer to prevent model overfitting.

Finally, we add the Dense layer, which is the output layer of the neural network. It has 3 neurons because our dataset has three classes. We use the softmax activation because we have more than two classes. For further understanding of how the softmax activation function works, read this article.

To check the summary of this model, use this code:

model.summary()

Model summary

The image shows the model type (Sequential) and the initialized layers. It also shows the total number of model parameters (2,261,827). Some parameters are trainable while others are non-trainable. The trainable parameters (3,843) are the ones the neural network will learn. The non-trainable parameters (2,257,984) come from the feature_extractor_layer and are already trained. Because the vast majority of parameters are non-trainable, training time is greatly reduced.
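The 3,843 trainable parameters can be checked by hand. MobileNet-v2's feature vector has 1,280 dimensions (the documented output size of this feature-vector module), so the Dense layer learns a 1280-by-3 weight matrix plus 3 biases:

```python
feature_dim = 1280  # output size of the MobileNet-v2 feature vector
num_classes = 3     # rock, paper, scissors

weights = feature_dim * num_classes  # kernel of the Dense layer
biases = num_classes                 # one bias per class
trainable_params = weights + biases

print(trainable_params)  # 3843, matching model.summary()
```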

Model compiling

In model compiling, we determine the metrics, the optimizer, and the loss function to be used by the neural network.

Metrics

Metrics are used to measure the performance of the neural network. Here we use accuracy, the fraction of predictions the model gets right.

Optimizer

The optimizer adjusts the model's weights during training to minimize the loss function, improving performance as the model learns from the train set. The most common optimizer is the Adam optimizer, which we will use for this neural network.

Loss function

It is used to determine the total model error. We will use the CategoricalCrossentropy because our dataset is made up of three categories (rock, paper, scissors). To compile this model, use this code:

model.compile(
  optimizer=tf.keras.optimizers.Adam(),
  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
  metrics=['acc'])

We set from_logits=False because the output layer already applies a softmax, so the model outputs probabilities rather than raw logits.

The next step is to fit our compiled model into the train_dataset and the val_dataset.

Model fitting

During model fitting, the model will learn from the train_dataset. val_dataset will be used to fine-tune the model parameters so that we have an optimized model.

history = model.fit(train_dataset, epochs=2, validation_data=val_dataset)

We also set epochs=2. This is the number of times the model will iterate through the train_dataset and val_dataset during training. When we run this code, the training process starts and produces the following output.

Model training

From the image above, the model's accuracy score after the first epoch is 0.8333, which represents 83.33%. After the second epoch, the accuracy score increased to 0.9722, or 97.22%. The model improves with each epoch, increasing its chances of making the right classifications.

Accuracy score using the test set

The test accuracy score is used to assess the final model after training. It checks the model performance using the test dataset.

result=model.evaluate(test_dataset)

The accuracy score is as shown below:

7/7 [==============================] - 1s 88ms/step - loss: 0.6086 - acc: 0.9850

The accuracy score is 98.50%. This shows our model performs well on both the train and test datasets. The next step is to use the model to make predictions.
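The 7/7 in the evaluation output is the number of batches: with 400 test images and a batch size of 64, evaluation takes ceil(400 / 64) = 7 steps:

```python
import math

test_images = 400
batch_size = 64

# The last, partially filled batch still counts as a step, hence the ceiling.
steps = math.ceil(test_images / batch_size)
print(steps)  # 7, matching the 7/7 shown in the evaluation output
```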

Making predictions

We use 10 images from the test dataset to make predictions. The for loop will be used to select the 10 images from the test dataset.

for test_sample in rsp_test.take(10):  
  image, label = test_sample[0], test_sample[1]
  image_scaled, label_arr= scale(test_sample[0], test_sample[1])
  image_scaled = np.expand_dims(image_scaled, axis=0)   

After selecting the images, let's print the prediction results.

Printing prediction results

We will print the actual label and the predicted label. The actual label represents the actual image category/class in the test dataset. The predicted label is the category/class the model predicts.

  # The following lines continue the body of the for loop above.
  img = tf.keras.preprocessing.image.img_to_array(image)
  pred = model.predict(image_scaled)
  print(pred)
  plt.figure()
  plt.imshow(image)
  plt.show()
  print("Actual Label: %s" % info.features["label"].names[label.numpy()])
  print("Predicted Label: %s" % info.features["label"].names[np.argmax(pred)])

We use the tf.keras.preprocessing.image.img_to_array method to convert the images into arrays and the predict method to make the predictions. The prediction results are shown below:

Prediction results

From the image above, the model was able to make the right predictions. The Actual Label is the same as the Predicted Label.

Let's look at another prediction result.

Another prediction result

For this result, the model was able to make the right predictions. The Actual Label is the same as the Predicted Label. This shows our image classifier model was well trained. Let's save this trained model.

Saving the model

To save the model, use this code:

model.save('./models/', save_format='tf')

This code will save the model and produce the following output.

Saving model

The output above shows the directory that our model is saved. We can load this model and use it in the future to make predictions.
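Loading the saved model back is done with tf.keras.models.load_model. The sketch below demonstrates the save/load round trip with a tiny stand-in model and an HDF5 file path (an assumption for portability; the SavedModel directory produced above loads with the same call, though a hub-based model may additionally need custom_objects={'KerasLayer': hub.KerasLayer}):

```python
import os
import tempfile
import numpy as np
import tensorflow as tf

# A tiny stand-in model; the real model would be the trained classifier above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'model.h5')
    model.save(path)                              # save the model to disk
    reloaded = tf.keras.models.load_model(path)   # load it back

    sample = np.ones((1, 4), dtype=np.float32)
    same = np.allclose(model.predict(sample), reloaded.predict(sample))

print(same)  # True: the reloaded model makes identical predictions
```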

Conclusion

In this tutorial, we have learned how to build an image classifier using transfer learning. We downloaded the MobileNet-v2 convolutional neural network from TensorFlow Hub and fine-tuned it to classify images of hands playing the rock, paper, scissors game.

Finally, we tested the model and it can make accurate predictions. Using this tutorial, a reader should be able to come up with this model. The model we have built in this tutorial is found here.

Peer Review Contributions by: Collins Ayuya

Published on: Feb 11, 2022
Updated on: Jul 15, 2024