Secured Deep Learning in Remote Devices

In my previous article, we covered the basics of differential privacy. In this article, we will cover how differential privacy can be applied in the form of Federated Learning, which can be deployed on remote devices.

We'll be building a simple deep learning model to demonstrate the working of federated learning. As a prerequisite, you must have an intermediate level of understanding of Python and Deep Learning with the PyTorch library.

Introduction
What is Federated Learning?
How does Federated Learning work?
Installation
Implementation
Conclusion
Further Reading

Introduction

What is Federated Learning?

In Deep Learning, a privacy problem arises when the data used for training and development is centralized. Ideally, data should remain private: accessible only to the end-users, and not even to the organization providing the service. But today, we can rarely be sure that our privacy is not at stake.

Typically, an end-user device that relies on deep learning sends its data to the cloud, where the predictions/classifications are made, and the results are returned to the end-user. There is no guarantee that our data is secure. That's where federated learning (distributed deep learning) comes into the picture, to preserve the privacy of the data.

By distributing the deep learning model, we can address the privacy issue: an independent copy of the model is trained locally on each end device, and only the resulting weight updates, not the raw data, are sent to the central deep learning model, where they are aggregated. This is federated learning in a nutshell.

For example, Google's Gboard keyboard uses federated learning when the deep learning model in our keyboard tries to predict the next word: only model updates, not what we type, are sent to the cloud. So, without uploading the details of any user to the cloud, we get aggregated results based on local model training.

Workflow of federated learning

Image source

How does Federated Learning work?

Here is a high-level overview of how federated learning works:

1. The server in the cloud is initialized with a model, optionally a pre-trained one.
2. The server sends a copy of the latest aggregated (global) model to each requesting end-user device.
3. Each device trains its local copy on its own data, computes an update, and sends that update back to the server.
4. The server receives the weight updates and averages them, weighting each update by a factor tied to the local training set it came from.
5. Steps 2 - 4 are repeated for each request by the client devices.
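
The averaging done by the server in step 4 can be sketched in plain Python. This is only a minimal illustration, not the code of any particular library: the function name federated_average, the parameter names w and b, and the sample counts are all hypothetical, and each update is assumed to be weighted by the number of local training samples behind it.

```python
def federated_average(updates):
    """Weighted average of client updates.

    `updates` is a list of (weights, num_samples) pairs, where `weights`
    maps a parameter name to a float. Hypothetical helper for illustration.
    """
    total_samples = sum(num_samples for _, num_samples in updates)
    averaged = {}
    for weights, num_samples in updates:
        share = num_samples / total_samples  # weighting factor of this client
        for name, value in weights.items():
            averaged[name] = averaged.get(name, 0.0) + share * value
    return averaged


# Two hypothetical clients: the second trained on three times as much data.
client_updates = [
    ({"w": 1.0, "b": 0.0}, 100),
    ({"w": 3.0, "b": 4.0}, 300),
]
global_weights = federated_average(client_updates)
print(global_weights)
```

The client with more data pulls the averaged values closer to its own, which is the point of using a weighting factor rather than a plain mean.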

This concept of distributed deep learning has become very popular since 2017, after a blog post by Google AI. Apple has also reported using it for Siri.

Now that we have a better understanding of federated learning, let's learn more about it by implementing it.

Dataset description

In this tutorial, we are going to use the Boston housing dataset to predict the price of housing in Boston. The prediction is done based on various kinds of housing properties.

Installation

It's highly recommended to use Google Colab to get started right away. If you wish to run the code below on your local system, download Anaconda by referring to the Anaconda documentation.

The libraries to be installed in Anaconda are PyTorch, NumPy, and PySyft (imported as syft in the code below).

Having installed all the above-mentioned libraries, it's time to get started with the implementation.

Importing libraries

If you are unsure of why these libraries are imported, you will understand them as you implement them further.

  import pickle
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
  import torch.optim as optim
  from torch.utils.data import TensorDataset, DataLoader
  import time
  import copy
  import numpy as np
  import syft as sy
  from syft.frameworks.torch.fl import utils
  from syft.workers.websocket_client import WebsocketClientWorker

Parameters initialization

We set the parameters for the deep learning model: 100 epochs, a learning rate of 0.001, and a batch size of 8. We also manually seed the random number generator.

  class Parser:
    def __init__(self): # Constructor for initializing the parameters
      self.epochs = 100 # Number of training epochs
      self.lr = 0.001 # Learning rate
      self.test_batch_size = 8 # Batch size for the test dataset
      self.batch_size = 8 # Batch size for the train dataset
      self.log_interval = 10 # Interval between log messages
      self.seed = 1 # Seed for the random number generator

  args = Parser() # Instantiate the class, to initialize the parameters
  torch.manual_seed(args.seed) # Seed the random number generator with a fixed value

Loading the dataset

Pickling is the process whereby a Python object hierarchy is converted into a byte stream. Download this pickle file for the Boston Housing dataset.

This pickle file contains binary data for training the deep learning model.
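
As a small illustration of pickling, the sketch below converts a nested Python object to a byte stream and back. The dataset object is a hypothetical stand-in with the same nesting as ((x, y), (x_test, y_test)); it is not the actual Boston Housing data.

```python
import pickle

# A hypothetical nested object standing in for ((x, y), (x_test, y_test)).
dataset = (([1.0, 2.0], [10.0]), ([3.0], [30.0]))

payload = pickle.dumps(dataset)    # object hierarchy -> bytes
restored = pickle.loads(payload)   # bytes -> equal object hierarchy

print(type(payload).__name__)
print(restored == dataset)
```

Note that unpickling can execute arbitrary code, so only load pickle files from sources you trust.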

After adding it to the path, we open the file, split it into the training and testing sets, and convert them to Torch tensors for easier computation and compatibility with other PyTorch libraries.

A Torch tensor is a multi-dimensional matrix containing elements of a single data type. It's used as a data structure which helps make computation easier.
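
The conversion from a NumPy array to a Torch tensor can be tried on a toy array first. This is a minimal sketch; the features array is hypothetical and only mirrors the from_numpy(...).float() pattern used in this article.

```python
import numpy as np
import torch

# Hypothetical 2x3 feature matrix; NumPy defaults to float64.
features = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

tensor = torch.from_numpy(features).float()  # convert, then cast to float32

print(tensor.shape)   # dimensions of the tensor
print(tensor.dtype)   # single data type shared by all elements
```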

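  with open('./boston_housing.pickle','rb') as f:
    ((x, y), (x_test, y_test)) = pickle.load(f) # Load the file, and extract the train and test sets

  x = torch.from_numpy(x).float() # Convert the train dataset numpy arrays to Torch tensors
  y = torch.from_numpy(y).float()
  x_test = torch.from_numpy(x_test).float() # Convert the test dataset numpy arrays to Torch tensors
  y_test = torch.from_numpy(y_test).float()

  # Assumption: the later snippets use train_loader and test_loader,
  # so they are built here from the imported TensorDataset and DataLoader
  train_loader = DataLoader(TensorDataset(x, y), batch_size=args.batch_size, shuffle=True)
  test_loader = DataLoader(TensorDataset(x_test, y_test), batch_size=args.test_batch_size, shuffle=True)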

Neural network architecture

We create a very simple neural network architecture consisting of 4 fully connected layers, with ReLU as the activation function after each of the first three layers.

To understand more about Neural networks, read this article before further implementation.

ReLU is an activation function that converts values below zero to zero and leaves values above zero unchanged.

This activation is widely preferred because it does not activate all the neurons at the same time: for neurons whose input is negative, the gradient is zero, so their weights are not updated during backpropagation.

  class Net(nn.Module): # Create a class containing the neural network architecture
    def __init__(self): # Constructor to initialize the layers
      super(Net, self).__init__() # Initialize the parent class nn.Module
      self.fc1 = nn.Linear(13, 32) # Fully connected layer, 13 input nodes and 32 output nodes
      self.fc2 = nn.Linear(32, 24) # Fully connected layer, 32 input nodes and 24 output nodes
      self.fc4 = nn.Linear(24, 16) # Fully connected layer, 24 input nodes and 16 output nodes
      self.fc3 = nn.Linear(16, 1) # Final fully connected layer, 16 input nodes and 1 output node

    def forward(self, x): # Method for forward propagation
      x = x.view(-1, 13) # Reshape the input to (batch size, 13 features)
      x = F.relu(self.fc1(x)) # Activate the output of fc1
      x = F.relu(self.fc2(x)) # Activate the output of fc2
      x = F.relu(self.fc4(x)) # Activate the output of fc4
      x = self.fc3(x) # fc3 is the last layer, so its output is returned without activation
      return x

Here, nn.Linear() creates a simple linear neural network layer of the specified input and output dimensions. Similarly, F.relu() accepts the output of a fully connected layer as input, and returns the activated values.
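
To see what these two calls do in isolation, here is a minimal sketch on random input. The batch of 8 samples is a hypothetical stand-in for real housing features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(13, 32)        # 13 input features -> 32 output features
batch = torch.randn(8, 13)       # hypothetical batch of 8 samples

hidden = layer(batch)            # shape (8, 32), may contain negative values
activated = F.relu(hidden)       # negatives become 0, positives are unchanged

print(hidden.shape)
print(F.relu(torch.tensor([-2.0, 0.0, 3.0])))
```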

Create workers for remote devices

To manage local end devices, we must bind the Torch tensors with the end-users using sy.TorchHook(torch). Since we aren't going to deploy them live on actual devices, we will assume virtual devices on different WebSocket ports.

Virtual workers are entities present on our local machine. They are used to model the behavior of actual workers. Then, we create 2 different workers for the demonstration.

  hook = sy.TorchHook(torch) # Bind the tensor with local workers
  end_device1 = sy.VirtualWorker(hook, id="device1") # 1st virtual entity
  end_device2 = sy.VirtualWorker(hook, id="device2") # 2nd virtual entity
  compute_nodes = [end_device1, end_device2] # List of workers

Distributing the training dataset to each worker

In this snippet, we send each batch of data and target values to one of the workers, alternating between them. The data and target returned by send() are then stored as a pair in remote_dataset, in the list that belongs to that worker.

  remote_dataset = (list(), list()) # One list of batches per worker
  train_distributed_dataset = [] # Declare a new list
  for batch_idx, (data,target) in enumerate(train_loader): # Load each batch of data and target from the train dataset
    data = data.send(compute_nodes[batch_idx % len(compute_nodes)]) # Send the input values to the selected worker
    target = target.send(compute_nodes[batch_idx % len(compute_nodes)]) # Send the target values to the same worker
    remote_dataset[batch_idx % len(compute_nodes)].append((data, target)) # Store the pair in that worker's list

Here, batch_idx % len(compute_nodes) gives the index into remote_dataset. For our example with two workers, the index is either 0 or 1.
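
The effect of the modulo indexing can be seen without any remote workers. In this sketch, the worker names and batch labels are hypothetical placeholders for the real workers and tensors.

```python
# Hypothetical stand-ins: two worker names and five batches.
workers = ["device1", "device2"]
batches = ["batch0", "batch1", "batch2", "batch3", "batch4"]

assigned = ([], [])  # one list per worker, like remote_dataset
for batch_idx, batch in enumerate(batches):
    slot = batch_idx % len(workers)  # alternates 0, 1, 0, 1, ...
    assigned[slot].append(batch)

for worker, worker_batches in zip(workers, assigned):
    print(worker, worker_batches)
```

With an odd number of batches, the first worker ends up with one batch more than the second, which is presumably why train() iterates over len(remote_dataset[0])-1 batches.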

Initializing neural networks for each remote device

We instantiate both the devices with separate neural network models. We also initialize optimizers for each of the neural networks.

Optimizers are algorithms or methods used to change the attributes of your neural network such as weights and learning rate to reduce the losses.

Here, we use the Stochastic Gradient Descent (SGD) optimizer. In short, SGD updates the weights batch by batch rather than once per pass over the whole dataset, which helps reduce the loss faster. More about SGD can be read in this article.
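
The batch-wise behaviour of SGD can be sketched without any library. The example below fits a single weight on hypothetical data; the data, learning rate, and batch size are assumptions chosen for illustration, and this article's code relies on optim.SGD rather than a hand-written loop.

```python
import random

random.seed(1)

# Hypothetical data following y = 2 * x.
samples = [(float(x), 2.0 * x) for x in range(1, 9)]

w = 0.0      # single weight to learn
lr = 0.01    # learning rate
batch_size = 4

for epoch in range(50):
    random.shuffle(samples)
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        # Gradient of the mean squared error over this batch with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad  # one SGD step

print(round(w, 3))
```

Each step uses only one batch, so the weight moves toward the slope of the data after every batch instead of waiting for a full pass.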

  device1_model = Net() # Initialize neural network for Device1
  device2_model = Net() # Initialize neural network for Device2

  device1_optimizer = optim.SGD(device1_model.parameters(), lr=args.lr) # Initialize SGD optimizer for Device1
  device2_optimizer = optim.SGD(device2_model.parameters(), lr=args.lr) # Initialize SGD optimizer for Device2

  models = [device1_model, device2_model] # Make a list of models
  optimizers = [device1_optimizer, device2_optimizer] # Make list of optimizers

  model = Net()

Let's print out the initial parameters of both models, so that we can later check whether they get updated after federated learning. Here, we print out the bias of the last fully-connected layer, fc3.

device1_model.fc3.bias

Output:

Out[1]:
Parameter containing:
tensor([-0.0842], requires_grad=True)

device2_model.fc3.bias

Output:

Out[2]:
Parameter containing:
tensor([-0.0982], requires_grad=True)

We see that device1 has a bias of -0.0842, and device2 has a bias of -0.0982.

Function for model training

After initializing all the models, we write functions to train the model and update the weights and losses. In update(), we predict the values based on the input, calculate the loss, and backpropagate to improve the model. For the loss, we use the Mean Squared Error (MSE) loss function, which is the mean of the squared differences between the predicted and expected values.
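
As a quick illustration of the MSE computation, here is a plain-Python sketch. The helper mean_squared_error and the price values are hypothetical; the article's code uses F.mse_loss from PyTorch instead.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predicted and actual house prices.
predicted = [22.0, 30.0, 18.0]
actual = [24.0, 30.0, 15.0]

print(mean_squared_error(predicted, actual))
```

Because the differences are squared, a prediction that is off by 3 contributes more than twice as much as one that is off by 2.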

In train(), we iterate through the batches held by each worker, update the corresponding model on each batch, and return the aggregated (federated average) model.

  def update(data, target, model, optimizer):
    model.send(data.location) # Send the model to the worker that holds the data
    optimizer.zero_grad() # Reset the gradients
    prediction = model(data) # Make predictions for the input data
    loss = F.mse_loss(prediction.view(-1), target) # Calculate Mean Squared Error loss
    loss.backward() # Backpropagate to compute the gradients
    optimizer.step() # Update the weights using the gradients
    return model

  def train(): # Function for training the model
    for data_index in range(len(remote_dataset[0])-1): # For each batch index
      for remote_index in range(len(compute_nodes)): # For each worker
        data, target = remote_dataset[remote_index][data_index] # Extract the corresponding data and its target
        models[remote_index] = update(data, target, models[remote_index], optimizers[remote_index]) # Train that worker's model on the batch

    for model in models: # Iterate through each model
      model.get() # Retrieve the trained model from its worker

    return utils.federated_avg({"device1": models[0],"device2": models[1]}) # Return the federated average of the device models

Function for testing the model

This function evaluates the given model on the test dataset and prints the average loss per data point.

  def test(federated_model):
    federated_model.eval() # Set the model to evaluation mode
    test_loss = 0 # Initialize test loss to zero
    for data, target in test_loader: # Iterate through each test batch
      output = federated_model(data) # Run the model on the batch
      test_loss += F.mse_loss(output.view(-1), target, reduction='sum').item() # Add the summed squared error of the batch
    test_loss /= len(test_loader.dataset) # Average over all test samples
    print('Test set: Average loss: {:.4f}'.format(test_loss)) # Print the average loss
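
A per-sample average loss can be obtained by summing the squared errors batch by batch and dividing once by the dataset size. The sketch below checks this on toy data; the outputs and targets tensors are hypothetical and stand in for model predictions and true prices.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical predictions and targets for five test samples.
outputs = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
targets = torch.tensor([1.5, 2.0, 2.0, 4.0, 7.0])
loader = DataLoader(TensorDataset(outputs, targets), batch_size=2)

total = 0.0
for out_batch, target_batch in loader:
    # Sum (not mean) per batch, so uneven batch sizes do not skew the result.
    total += F.mse_loss(out_batch, target_batch, reduction='sum').item()
average = total / len(loader.dataset)  # divide once, after the loop

print(average)
```

The result matches the MSE computed over all five samples at once, even though the last batch holds only one sample.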

Updating the model in each remote device

For demonstration, we train the models for the two devices and evaluate the aggregated model in every epoch. We print out the epoch number and the time taken to communicate with the end devices.

  for epoch in range(args.epochs):
    start_time = time.time()
    print(f"Epoch Number {epoch + 1}")
    federated_model = train()
    model = federated_model
    test(federated_model)
    total_time = time.time() - start_time
    print('Communication time over the network', round(total_time, 2), 's\n')

Output:

Out[3]:
Epoch Number 1
Test set: Average loss: 615.8278
Communication time over the network 0.09 s
Epoch Number 2
Test set: Average loss: 613.6289
Communication time over the network 0.07 s
Epoch Number 3
Test set: Average loss: 610.8525
Communication time over the network 0.08 s
......
Epoch Number 98
Test set: Average loss: 40.4832
Communication time over the network 0.07 s
Epoch Number 99
Test set: Average loss: 40.2277
Communication time over the network 0.07 s
Epoch Number 100
Test set: Average loss: 40.0887
Communication time over the network 0.07 s

Now, let's check if the aggregated weights of both the devices have changed or not.

device1_model.fc3.bias

Output:

Out[4]:
Parameter containing:
tensor([1.3315], requires_grad=True)

device2_model.fc3.bias

Output:

Out[5]:
Parameter containing:
tensor([1.3244], requires_grad=True)

We see that the biases of the two models have changed to 1.3315 and 1.3244 for device1 and device2 respectively. It can be inferred that both the models have been trained and the weights have been updated.

Conclusion

As there are no high-level APIs to remotely deploy the model onto the end devices, virtual devices were used to act as end devices. However, the virtual devices exhibited seamless deployment and communication to the global model.

The weights were updated in each of the remote devices, and the test loss of the aggregated model fell from about 616 to about 40 over 100 epochs. The ever-rising need for privacy and decentralization of data is met by the emergence of systems utilizing Differential Privacy.

The cost of computation has been reduced by the use of distributed systems and the deployment of machine learning and deep learning systems remotely on the cloud. Even devices that have low computation power can deploy powerful models at the client's end.

Therefore, federated learning systems are effective in providing a secure and reliable abstraction of data, by capitalizing on the factors mentioned previously.

In conclusion, we now have a better understanding of the need for federated learning. We looked at an overview of how federated learning preserves the privacy of data in deep learning for end-devices.

You can check out the complete code here. We highly recommend reading and implementing a few examples to get a better understanding of federated learning.

To summarize: