Building an Artificial Neural Network with Keras
In this article, you will learn how to build and train an artificial neural network with Keras. We will build a model that predicts whether a customer will churn, which is valuable information for any business.
If you know which customers are likely to churn, you can offer them better deals and retain them. We will use machine learning to identify these customers. Using a sample dataset from a bank, we will predict which customers are likely to stop banking with it. Here is the GitHub repo for this project.
Prerequisites
To follow along with this tutorial, you need to have:
- A basic understanding of artificial neural networks.
- Access to Google Colab.
- The Churn Modelling dataset, downloaded from Kaggle.
Table of contents
- Import libraries
- Data preprocessing
- Build and visualize the Artificial Neural Network
- Training the ANN
- Evaluating the model
Import libraries
Most of the libraries we will be using come pre-installed on Google Colab, so we can simply import them:
import numpy as np
import pandas as pd
import tensorflow as tf
Let us confirm the version of TensorFlow we are using. There is no need to import Keras separately, as it ships with TensorFlow 2.
print(tf.__version__)
Output
'2.5.0'
If you are running the code on Google Colab, upload the dataset first: click the folder icon on the left panel, then click the upload icon.
Navigate to the directory on your local computer where the dataset is stored, select it, and click Open to upload it to Colab.
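Alternatively, you can upload the file from a notebook cell with Colab's built-in files helper. This is a small optional sketch that assumes you are running on Colab:
# optional: upload the dataset programmatically instead of using the file pane
from google.colab import files
files.upload()  # choose Churn_Modelling.csv when the file picker appears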
Let's load our dataset and display the first five records:
dataset = pd.read_csv('/content/Churn_Modelling.csv')
dataset.head()
Data preprocessing
Not all the features in our dataset are helpful. The row number, customer ID, and surname carry no information about whether a customer will churn, so we can drop them. We use the code below to separate the features from the label.
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values
print(X)
print(y)
Here are the features and labels obtained after separation:
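To see exactly which columns we kept, we can print their names. The names below assume the standard Kaggle file; note that Geography and Gender sit at indices 1 and 2 of X, which is why those indices appear in the encoding code later:
# columns 3 to second-last are the features; the last column (Exited) is the label
print(list(dataset.columns[3:-1]))
If you are using the standard Kaggle file, this prints ['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary'].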
You will notice that there are some categorical variables in our dataset: the Geography and Gender columns. We have to encode these variables. Since the Gender column has only two unique values, we label-encode it. Then, we one-hot encode the Geography column.
One-hot encoding creates new columns in the dataset, one for each unique value in the column being encoded. These new columns replace the Geography column. For instance, 1.0, 0.0, 0.0 represents a customer from France.
Label encoding the Gender column replaces the text with numbers: 0 represents Female, while 1 represents Male.
# label encode the gender column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])
print(X)
This is the result obtained after label-encoding:
# one-hot encode the geography column
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)
This is the result obtained after one-hot encoding:
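As a quick sanity check, assuming the standard 10,000-row Kaggle dataset, the matrix of features should now have 12 columns: three from one-hot encoding Geography, plus the nine remaining features.
# Geography was replaced by three one-hot columns: 10 - 1 + 3 = 12
print(X.shape)  # expected: (10000, 12)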
Next, using the code below, we split our dataset into training and testing sets:
# split the dataset into train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
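We can verify the split. With test_size = 0.2 and the 10,000-row dataset assumed above, we should get 8,000 training samples and 2,000 test samples:
# an 80/20 split of 10,000 rows
print(X_train.shape, X_test.shape)  # expected: (8000, 12) (2000, 12)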
Finally, we perform feature scaling. It is vital in deep learning because it puts all features on a similar scale, which helps gradient descent converge faster and reduces training time.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
These are the results obtained after feature-scaling:
With preprocessing done, let's build and visualize our ANN.
Build and visualize the Artificial Neural Network
We build our neural network with the Sequential() class. The first layer takes an input shape of 12, which is the number of feature columns in our training set after one-hot encoding. We then add the hidden layers.
To keep things simple, we use two hidden layers. The first hidden layer has 12 nodes, while the second has 8 nodes. In the hidden layers, we use the ReLU activation function.
Finally, we add the output layer. We use a single node at the output layer since we have only two categories. We use the sigmoid activation function at the output layer, which gives us the probability of a customer churning.
# Initializing the ANN
ann = tf.keras.models.Sequential()
# Add the input layer and first hidden layer
ann.add(tf.keras.layers.Dense(units=12, activation='relu', input_shape=X_train[0].shape))
# Add the second hidden layer
ann.add(tf.keras.layers.Dense(units=8, activation='relu'))
# Add the output layer
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
Now that we have created our model, let's use the code below to visualize it:
from tensorflow.keras.utils import plot_model
plot_model(ann,
           to_file="model.png",
           show_shapes=True,
           show_layer_names=True)
Output
We can also use the NN-SVG tool to visualize our model:
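For a quick text-based overview of the model, Keras also provides the summary() method, which prints each layer's output shape and parameter count:
# print layer names, output shapes, and parameter counts
ann.summary()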
Training the ANN
To train the ANN, we perform the following tasks:
- We compile the model with the Adam optimizer.
- We use the binary cross-entropy loss.
- We train the model for 100 epochs.
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)
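As an optional refinement that is not part of the original walkthrough, a Keras EarlyStopping callback can halt training once the loss stops improving, which can save time over a fixed 100 epochs. The patience value here is an arbitrary choice:
# optional: stop training when the loss plateaus for 5 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5, restore_best_weights=True)
ann.fit(X_train, y_train, batch_size=32, epochs=100, callbacks=[early_stop])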
Evaluating the model
Now that training is complete, we can make a prediction for a single customer. Let us find out if a customer with the details below will churn:
Record | Details |
---|---|
Country | Spain |
Credit Score | 600 |
Gender | Male |
Age | 40 years |
Tenure | 3 years |
Balance remaining | $60000 |
Number of Products owned | 2 |
Own a Credit Card? | Yes |
Is an Active Member? | Yes |
Estimated Salary | $50000 |
print(ann.predict(sc.transform([[0, 0, 1, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])))
Output
[[0.04270527]]
Remember that after one-hot encoding, 0, 0, 1 represents the geographical location, Spain. It occupies the first three columns of our matrix of features.
We apply a threshold of 0.5: the customer is predicted to leave the bank if the predicted probability is above 0.5. If we want the model to predict True only when it is very confident, we can raise the threshold.
print(ann.predict(sc.transform([[0, 0, 1, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)
Output
[[False]]
This is great news for the bank! This customer will not churn. Let us assess our model using the test set:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
pd.DataFrame(list(zip(y_test, y_pred)), columns=['Actual', 'Predicted'])
It looks like our model got most of the predictions right, though it made a mistake on the second customer in our test set. We can check the accuracy score and build a confusion matrix.
from sklearn.metrics import confusion_matrix, accuracy_score
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
Output
[[1506 89]
[193 212]]
0.859
The accuracy score is 85.9%. Out of 2000 test cases, our model predicted 1718 correctly. The confusion matrix shows the number of true negatives, false positives, false negatives, and true positives.
Our model wrongly predicted that 89 customers would churn when they actually stayed (false positives), and that 193 customers would stay when they actually churned (false negatives). But it correctly predicted that 1506 customers would stay (true negatives) and that 212 customers would churn (true positives).
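Because accuracy alone can hide how the model handles the rarer churn class, scikit-learn's classification_report shows precision and recall for each class. The class names passed below are our own labels:
from sklearn.metrics import classification_report
# per-class precision, recall, and F1-score; class names are illustrative
print(classification_report(y_test, y_pred, target_names=['Stayed', 'Churned']))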
Conclusion
In this guide, we learned how to build, visualize, and train an ANN using Keras. We made a model that predicts which customers will leave a bank.
We achieved an accuracy of 85.9%. You can now build an artificial neural network and train it on any dataset. There is no single definitive architecture: you can experiment with different ones and see which gives you the best result. The architectures used in deep learning research papers are a good starting point.
Peer Review Contributions by: Willies Ogola