Building an Artificial Neural Network with Keras
In this article, you will learn how to build and train an artificial neural network with Keras. We will build a model that predicts whether a customer will churn, which is valuable information for any business.
If you know which customers are likely to churn, you can offer them better deals and retain them. We will use machine learning to identify these customers. Using a sample dataset from a bank, we will predict which customers are likely to stop banking with it. Here is the GitHub repo for this project.
Prerequisites
To follow along with this tutorial, you need to have:
- A basic understanding of artificial neural networks.
- Access to Google Colab.
- The Churn Modelling dataset, downloaded from Kaggle.
Table of contents
- Import libraries
- Data preprocessing
- Build and visualize the Artificial Neural Network
- Training the ANN
- Evaluating the model
Import libraries
Most of the libraries we will be using come pre-installed on Google Colab, so we can simply import them:
import numpy as np
import pandas as pd
import tensorflow as tf
Let us confirm the version of TensorFlow we are using. There is no need to import Keras separately, as it ships with TensorFlow 2.
print(tf.__version__)
Output
'2.5.0'
If you are running the code on Google Colab, upload the dataset first: click the folder icon on the left panel, then click the upload icon.
Navigate to the directory on your local computer where the dataset is stored, select it, and click Open to upload it to Colab.
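Alternatively, you can upload the file from a notebook cell with Colab's built-in files helper. This is a small optional sketch that assumes you are running on Colab:
# optional: upload the dataset programmatically instead of using the file pane
from google.colab import files
files.upload()  # choose Churn_Modelling.csv when the file picker appears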
Let's load our dataset and display the first five records:
dataset = pd.read_csv('/content/Churn_Modelling.csv')
dataset.head()
Data preprocessing
Not all the features in our dataset are helpful. The row number, customer ID, and surname carry no information about whether a customer will churn, so we can drop them. We use the code below to separate the features from the label.
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values
print(X)
print(y)
Here are the features and labels obtained after separation:
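To see exactly which columns we kept, we can print their names. The names below assume the standard Kaggle file; note that Geography and Gender sit at indices 1 and 2 of X, which is why those indices appear in the encoding code later:
# columns 3 to second-last are the features; the last column (Exited) is the label
print(list(dataset.columns[3:-1]))
If you are using the standard Kaggle file, this prints ['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary'].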
You will notice that there are some categorical variables in our dataset: the Geography and Gender columns. We have to encode these variables. Since the Gender column has only two unique values, we label-encode it. Then, we one-hot encode the Geography column.
One-hot encoding creates new columns in the dataset, one for each unique value in the column being encoded. These new columns replace the Geography column. For instance, 1.0, 0.0, 0.0 represents a customer from France.
Label encoding the Gender column replaces the text with numbers: 0 represents Female, while 1 represents Male.
# label encode the gender column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])
print(X)
This is the result obtained after label-encoding:
# one-hot encode the geography column
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)
This is the result obtained after one-hot encoding:
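As a quick sanity check, assuming the standard 10,000-row Kaggle dataset, the matrix of features should now have 12 columns: three from one-hot encoding Geography, plus the nine remaining features.
# Geography was replaced by three one-hot columns: 10 - 1 + 3 = 12
print(X.shape)  # expected: (10000, 12)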
Next, using the code below, we split our dataset into training and testing sets:
# split the dataset into train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
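We can verify the split. With test_size = 0.2 and the 10,000-row dataset assumed above, we should get 8,000 training samples and 2,000 test samples:
# an 80/20 split of 10,000 rows
print(X_train.shape, X_test.shape)  # expected: (8000, 12) (2000, 12)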
Finally, we perform feature scaling. It is vital in deep learning because it puts all features on a similar scale, which helps gradient descent converge faster and reduces training time.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
These are the results obtained after feature-scaling:
With preprocessing done, let's build and visualize our ANN.
Build and visualize the Artificial Neural Network
We build our neural network with the Sequential() class. The first layer takes an input shape of 12, which is the number of feature columns in our training set after one-hot encoding. We then add the hidden layers.
To keep things simple, we use two hidden layers. The first hidden layer has 12 nodes, while the second has 8 nodes. In the hidden layers, we use the ReLU activation function.
Finally, we add the output layer. We use a single node at the output layer since we have only two categories. We use the sigmoid activation function at the output layer, which gives us the probability of a customer churning.
# Initializing the ANN
ann = tf.keras.models.Sequential()
# Add the input layer and first hidden layer
ann.add(tf.keras.layers.Dense(units=12, activation='relu', input_shape=X_train[0].shape))
# Add the second hidden layer
ann.add(tf.keras.layers.Dense(units=8, activation='relu'))
# Add the output layer
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
Now that we have created our model, let's use the code below to visualize it:
from tensorflow.keras.utils import plot_model
plot_model(ann,
           to_file="model.png",
           show_shapes=True,
           show_layer_names=True)
Output
We can also use the NN-SVG tool to visualize our model:
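For a quick text-based overview of the model, Keras also provides the summary() method, which prints each layer's output shape and parameter count:
# print layer names, output shapes, and parameter counts
ann.summary()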
Training the ANN
To train the ANN, we perform the following tasks:
- We compile the model with the Adam optimizer.
- We use the binary cross-entropy loss.
- We train the model for 100 epochs.
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)
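As an optional refinement that is not part of the original walkthrough, a Keras EarlyStopping callback can halt training once the loss stops improving, which can save time over a fixed 100 epochs. The patience value here is an arbitrary choice:
# optional: stop training when the loss plateaus for 5 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5, restore_best_weights=True)
ann.fit(X_train, y_train, batch_size=32, epochs=100, callbacks=[early_stop])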
Evaluating the model
Now that training is complete, we can make a prediction for a single customer. Let us find out if a customer with the details below will churn:
Record | Details |
---|---|
Country | Spain |
Credit Score | 600 |
Gender | Male |
Age | 40 years |
Tenure | 3 years |
Balance remaining | $60000 |
Number of Products owned | 2 |
Own a Credit Card? | Yes |
Is an Active Member? | Yes |
Estimated Salary | $50000 |
print(ann.predict(sc.transform([[0, 0, 1, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])))
Output
[[0.04270527]]
Remember that after one-hot encoding, 0, 0, 1 represents the geographical location, Spain. It occupies the first three columns of our matrix of features.
We apply a threshold of 0.5: the customer is predicted to leave the bank if the predicted probability is above 0.5. If we want the model to predict True only when it is very confident, we can raise the threshold.
print(ann.predict(sc.transform([[0, 0, 1, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)
Output
[[False]]
This is great news for the bank! This customer will not churn. Let us assess our model using the test set:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
pd.DataFrame(list(zip(y_test, y_pred)), columns=['Actual', 'Predicted'])
It looks like our model got most of the predictions right, though it made a mistake on the second customer in our test set. We can check the accuracy score and build a confusion matrix.
from sklearn.metrics import confusion_matrix, accuracy_score
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
Output
[[1506 89]
[193 212]]
0.859
The accuracy score is 85.9%. Out of 2000 test cases, our model predicted 1718 correctly. The confusion matrix shows the number of true negatives, false positives, false negatives, and true positives.
Our model wrongly predicted that 89 customers would churn when they actually stayed (false positives), and that 193 customers would stay when they actually churned (false negatives). But it correctly predicted that 1506 customers would stay (true negatives) and that 212 customers would churn (true positives).
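Because accuracy alone can hide how the model handles the rarer churn class, scikit-learn's classification_report shows precision and recall for each class. The class names passed below are our own labels:
from sklearn.metrics import classification_report
# per-class precision, recall, and F1-score; class names are illustrative
print(classification_report(y_test, y_pred, target_names=['Stayed', 'Churned']))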
Conclusion
In this guide, we learned how to build, visualize, and train an ANN using Keras. We made a model that predicts which customers will leave a bank.
We achieved an accuracy of 85.9%. You can now build an artificial neural network and train it on any dataset. There is no single definitive architecture: you can experiment with different ones and see which gives you the best result. The architectures used in deep learning research papers are a good starting point.
Peer Review Contributions by: Willies Ogola