Secured Deep Learning in Remote Devices
In my previous article, we covered the basics of differential privacy. In this article, we will cover how differential privacy ideas can be applied through federated learning, which can be deployed on remote devices. We'll build a simple deep learning model to demonstrate how federated learning works. As a prerequisite, you should have an intermediate understanding of Python and of deep learning with the PyTorch library.
Table of contents
- Introduction
- What is Federated Learning?
- How does Federated Learning work?
- Installation
- Implementation
- Conclusion
- Further Reading
Introduction
What is Federated Learning?
In deep learning, a privacy problem arises when the data used for training and development is centralized. Ideally, data stays private: accessible only to the end-user, and not even to the organization providing the service. In practice, we can rarely be sure that our privacy is not at stake.
Typically, an end-user device that relies on deep learning sends its data to the cloud, where the predictions or classifications are made, and the results are returned to the device. There is no guarantee that the data stays secure along the way. That's where federated learning, a form of distributed deep learning, comes into the picture to preserve the privacy of the data.
By distributing the model, we can address the privacy issue: independent copies of the model are trained locally on each end device, and only their weight updates, not the raw data, are sent to be aggregated into the central model. This is federated learning in a nutshell.
For example, Google uses federated learning in its Gboard keyboard. The model on the device learns to predict the next word, and only model updates, not the text that was typed, are sent to the cloud. So, without uploading the details of any user, we get aggregated results based on local model training.
How does Federated Learning work?
Here is a high-level overview of how federated learning works:
1. The server in the cloud is initialized with a model, possibly a pre-trained one.
2. The server sends a copy of the latest aggregated model to each requesting end-user device.
3. The local model is trained on the device's own data, and the resulting update is sent back to the global model.
4. The server receives the weight updates and averages them, applying a weighting factor to each update based on the local training set it came from.
5. Steps 1 - 4 are repeated for each request by the client devices.
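The averaging step above can be sketched in plain PyTorch. This is only an illustration under stated assumptions, not the code of any particular federated learning library: the helper name weighted_average and the sample counts (30 and 10) are made up for this example.

```python
import torch
import torch.nn as nn


def weighted_average(state_dicts, num_samples):
    """Average parameters, weighting each model by its sample count.

    weighted_average is a hypothetical helper written for this sketch.
    """
    total = float(sum(num_samples))
    averaged = {}
    for name in state_dicts[0]:
        averaged[name] = sum(
            sd[name] * (n / total) for sd, n in zip(state_dicts, num_samples)
        )
    return averaged


# Two "local" models that would normally live on separate devices.
local_a = nn.Linear(3, 1)
local_b = nn.Linear(3, 1)

# Assumed sample counts: device A trained on 30 rows, device B on 10.
global_state = weighted_average(
    [local_a.state_dict(), local_b.state_dict()], [30, 10]
)

# The server loads the averaged parameters into the global model.
global_model = nn.Linear(3, 1)
global_model.load_state_dict(global_state)
```

With these counts, device A contributes three quarters of every averaged parameter and device B one quarter. Only parameters cross the network in this scheme; the training rows never leave the devices.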
This concept of distributed deep learning has become very popular since 2017, after a blog post by Google AI. Apple has also reported using it for Siri.
Now that we have a better understanding of federated learning, let's learn more about it by implementing it.
Dataset description
In this tutorial, we are going to use the Boston housing dataset to predict the price of housing in Boston. The prediction is done based on various kinds of housing properties.
Installation
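It's highly recommended to use Google Colab to get started right away. If you wish to run the code below on your local system, download Anaconda by referring to the Anaconda documentation.
The libraries to be installed in Anaconda are the ones imported in the next section: PyTorch (torch), NumPy (numpy), and PySyft (syft). The code in this tutorial relies on sy.TorchHook and sy.VirtualWorker, which belong to the older PySyft 0.2.x API, so a matching PySyft release is needed.
Having installed all the above-mentioned libraries, it's time to get started with the implementation.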
Importing libraries
If you are unsure of why these libraries are imported, you will understand them as you implement them further.
import pickle
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
import time
import copy
import numpy as np
import syft as sy
from syft.frameworks.torch.fl import utils
from syft.workers.websocket_client import WebsocketClientWorker
Parameters initialization
We set the parameters for the deep learning model: 100 epochs, a learning rate of 0.001, and a batch size of 8. We also manually seed the random number generator so that runs are reproducible.
class Parser:
    def __init__(self):            # Constructor for initializing the parameters
        self.epochs = 100          # Number of training epochs
        self.lr = 0.001            # Learning rate
        self.test_batch_size = 8   # Batch size for the test dataset
        self.batch_size = 8        # Batch size for the train dataset
        self.log_interval = 10     # Logging interval
        self.seed = 1              # Seed for the random number generator

args = Parser()                    # Instantiate the class to initialize the parameters
torch.manual_seed(args.seed)       # Fix the seed of the random number generator
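To see what seeding buys us, here is a small check, separate from the tutorial code, that re-seeding makes PyTorch repeat the same random numbers. The variable names first and second are only for this illustration.

```python
import torch

torch.manual_seed(1)
first = torch.rand(3)     # Three random numbers drawn after seeding

torch.manual_seed(1)      # Re-seeding restarts the same random sequence
second = torch.rand(3)

print(torch.equal(first, second))  # True
```

Because the layer weights of a new model are drawn from this generator, a fixed seed also fixes the initial weights from run to run.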
Loading the dataset
Pickling is the process whereby a Python object hierarchy is converted into a byte stream. Download this pickle file for the Boston Housing dataset.
This pickle file contains binary data for training the deep learning model.
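After placing the file in the working directory, we open it, unpack the training and test sets, and convert them to Torch tensors for easier computation and compatibility with the rest of PyTorch. The later snippets rely on train_loader and test_loader, so we also wrap the tensors in DataLoader objects here, using the batch sizes set above.
A Torch tensor is a multi-dimensional matrix containing elements of a single data type. It is the basic data structure on which PyTorch performs its computations.
with open('./boston_housing.pickle', 'rb') as f:
    ((x, y), (x_test, y_test)) = pickle.load(f)   # Load the file and unpack the train and test sets

x = torch.from_numpy(x).float()            # Convert the train NumPy arrays to Torch tensors
y = torch.from_numpy(y).float()
x_test = torch.from_numpy(x_test).float()  # Convert the test NumPy arrays to Torch tensors
y_test = torch.from_numpy(y_test).float()

# Build the loaders used by the later snippets (shuffle=True is a choice made here)
train_loader = DataLoader(TensorDataset(x, y), batch_size=args.batch_size, shuffle=True)
test_loader = DataLoader(TensorDataset(x_test, y_test), batch_size=args.test_batch_size, shuffle=True)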
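If you don't have the pickle file at hand, the same load-and-convert flow can be tried on synthetic data. The sketch below is an illustration only: the random arrays are a stand-in for the housing data, with 13 features per row assumed to match the input size of the network used later, and the pickling happens in memory instead of on disk.

```python
import pickle

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for the housing data: 13 features per row.
rng = np.random.default_rng(0)
payload = (
    (rng.random((6, 13)), rng.random(6)),   # (x, y) for training
    (rng.random((2, 13)), rng.random(2)),   # (x_test, y_test)
)

# Pickle to bytes and back, mirroring pickle.dump / pickle.load on a file.
blob = pickle.dumps(payload)
(x, y), (x_test, y_test) = pickle.loads(blob)

# NumPy creates float64 arrays here; .float() converts to float32 tensors.
x = torch.from_numpy(x).float()
y = torch.from_numpy(y).float()

# Serve the six rows in batches of four.
loader = DataLoader(TensorDataset(x, y), batch_size=4)
for data, target in loader:
    print(data.shape, target.shape)
```

Six rows with a batch size of four give two batches, the second one smaller than the first.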
Neural network architecture
We create a very simple neural network architecture consisting of 4 fully connected layers, with ReLU as the activation function after each of the first three layers. The last layer outputs the predicted price directly.
To understand more about neural networks, read this article before further implementation.
ReLU is an activation function that converts values below zero to zero, and leaves values above zero unchanged.
This activation is widely preferred because it doesn't activate all the neurons at the same time. For a neuron whose output is set to zero, the gradient is also zero, so its weights are not updated during backpropagation.
class Net(nn.Module):                     # Neural network architecture
    def __init__(self):                   # Constructor to initialize the layers
        super(Net, self).__init__()       # Initialize the parent nn.Module
        self.fc1 = nn.Linear(13, 32)      # Layer 1: 13 inputs, 32 outputs
        self.fc2 = nn.Linear(32, 24)      # Layer 2: 32 inputs, 24 outputs
        self.fc4 = nn.Linear(24, 16)      # Layer 3: 24 inputs, 16 outputs
        self.fc3 = nn.Linear(16, 1)       # Layer 4 (output): 16 inputs, 1 output

    def forward(self, x):                 # Forward propagation
        x = x.view(-1, 13)                # Reshape the input to (batch_size, 13)
        x = F.relu(self.fc1(x))           # Activate the output of fc1
        x = F.relu(self.fc2(x))           # Activate the output of fc2
        x = F.relu(self.fc4(x))           # Activate the output of fc4, the third layer
        x = self.fc3(x)                   # fc3 is the output layer, so no activation
        return x
Here, nn.Linear() creates a fully-connected layer with the specified input and output dimensions. Similarly, F.relu() accepts the output of a fully-connected layer as its input, and returns the activated values.
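The short, standalone sketch below shows both calls in isolation. The batch of 8 random rows and the three hand-picked values are assumptions chosen for illustration; they are not part of the tutorial's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(13, 32)          # 13 input features -> 32 output features
batch = torch.randn(8, 13)         # A batch of 8 rows with 13 features each
hidden = layer(batch)
print(hidden.shape)                # torch.Size([8, 32])

values = torch.tensor([-2.0, 0.0, 3.0], requires_grad=True)
activated = F.relu(values)         # Negative entries become 0, positive ones pass through
print(activated)

activated.sum().backward()         # Backpropagate through the ReLU
print(values.grad)                 # tensor([0., 0., 1.])
```

The gradient is zero wherever ReLU output zero, which is why the weights feeding an inactive neuron receive no update for that input.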
Create workers for remote devices
To manage the end devices, we first hook PyTorch using sy.TorchHook(torch), which extends Torch tensors with methods such as send() and get() for moving them to and from workers. Since we aren't going to deploy on actual devices, we simulate them with virtual workers instead.
Virtual workers are entities present on our local machine. They are used to model the behavior of actual workers. We create 2 different workers for the demonstration.
hook = sy.TorchHook(torch) # Bind the tensor with local workers
end_device1 = sy.VirtualWorker(hook, id="device1") # 1st virtual entity
end_device2 = sy.VirtualWorker(hook, id="device2") # 2nd virtual entity
compute_nodes = [end_device1, end_device2] # List of workers
Distributing the training dataset to each worker
In this snippet, we send each batch of data and its target values to one of the workers, alternating between them. The pointers to the sent tensors are stored in remote_dataset, a tuple that holds one list of (data, target) pairs per worker.
remote_dataset = (list(), list())   # One list of batches per worker

for batch_idx, (data, target) in enumerate(train_loader):   # Iterate over the training batches
    worker = compute_nodes[batch_idx % len(compute_nodes)]  # Alternate between the workers
    data = data.send(worker)                                # Send the input values to the worker
    target = target.send(worker)                            # Send the target values to the same worker
    remote_dataset[batch_idx % len(compute_nodes)].append((data, target))   # Keep the pointers
Here, batch_idx % len(compute_nodes) helps us index the remote_dataset. For our example, the index is either 0 or 1.
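The modulo indexing can be tried without any workers at all. In this plain-Python sketch, the strings are hypothetical stand-ins for the workers and the batches.

```python
compute_nodes = ["device1", "device2"]             # Stand-ins for the workers
remote_dataset = tuple([] for _ in compute_nodes)  # One list per worker

batches = ["batch0", "batch1", "batch2", "batch3", "batch4"]
for batch_idx, batch in enumerate(batches):
    worker_idx = batch_idx % len(compute_nodes)    # Cycles 0, 1, 0, 1, 0
    remote_dataset[worker_idx].append(batch)

print(remote_dataset[0])  # ['batch0', 'batch2', 'batch4']
print(remote_dataset[1])  # ['batch1', 'batch3']
```

With an odd number of batches, the first worker ends up with one batch more than the second, so a loop that walks both lists in step has to stop before that extra batch.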
Initializing neural networks for each remote device
We instantiate both the devices with separate neural network models. We also initialize optimizers for each of the neural networks.
Optimizers are algorithms or methods used to change the attributes of your neural network such as weights and learning rate to reduce the losses.
Here, we use the Stochastic Gradient Descent (SGD) optimizer. In short, SGD updates the weights after every batch instead of after a full pass over the data, which usually reduces the loss faster. You can read more about SGD in this article.
device1_model = Net() # Initialize neural network for Device1
device2_model = Net() # Initialize neural network for Device2
device1_optimizer = optim.SGD(device1_model.parameters(), lr=args.lr) # Initialize SGD optimizer for Device1
device2_optimizer = optim.SGD(device2_model.parameters(), lr=args.lr) # Initialize SGD optimizer for Device2
models = [device1_model, device2_model] # Make a list of models
optimizers = [device1_optimizer, device2_optimizer] # Make list of optimizers
model = Net() # Initialize the global model that will hold the aggregated weights
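To make the optimizer's job concrete, here is a single SGD step on a toy problem. The two-element tensor w, the squared-sum loss, and the learning rate of 0.1 are assumptions picked so that the arithmetic is easy to follow by hand.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()      # The gradient of this loss is 2 * w
optimizer.zero_grad()      # Clear any old gradients
loss.backward()            # Compute the gradient: [2.0, -4.0]
optimizer.step()           # w <- w - lr * gradient

print(w)                   # Values are now 0.8 and -1.6
```

Each step moves the weights a small distance against the gradient. The training loop in this tutorial repeats the same three calls for every batch.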
Let's print out the initial bias of both models, so that we can later check whether they get updated after federated learning aggregation. Here, we print the bias of the last fully-connected layer, fc3.
device1_model.fc3.bias
Output:
Out[1]:
Parameter containing:
tensor([-0.0842], requires_grad=True)
device2_model.fc3.bias
Output:
Out[2]:
Parameter containing:
tensor([-0.0982], requires_grad=True)
We see that device1 has a bias of -0.0842, and device2 has a bias of -0.0982.
Function for model training
After initializing the models, we write functions to train them. In update(), we send the model to the worker that holds the data, predict the values for the input, calculate the loss, and backpropagate to improve the model. For the loss, we use the Mean Squared Error (MSE) loss function, which is the mean of the squared differences between the predicted and expected values.
In train(), we iterate through the batches, update the model of each device on its own batch, and return the federated average of the two models.
def update(data, target, model, optimizer):
    model.send(data.location)                        # Send the model to the worker that holds the data
    optimizer.zero_grad()                            # Clear the gradients from the previous step
    prediction = model(data)                         # Make predictions for the input data
    loss = F.mse_loss(prediction.view(-1), target)   # Calculate the Mean Squared Error loss
    loss.backward()                                  # Backpropagate to compute the gradients
    optimizer.step()                                 # Update the weights
    return model

def train():                                                  # Train the models for one epoch
    for data_index in range(len(remote_dataset[0]) - 1):      # For each batch index (the last one is skipped)
        for remote_index in range(len(compute_nodes)):        # For each worker
            data, target = remote_dataset[remote_index][data_index]   # Get the batch and its targets
            models[remote_index] = update(data, target, models[remote_index], optimizers[remote_index])   # Train on that batch
        for model in models:                                  # Iterate through each model
            model.get()                                       # Bring the updated model back from its worker
    return utils.federated_avg({"device1": models[0], "device2": models[1]})   # Return the average of the device models
Function for testing the model
This function tests the aggregated model on the test dataset, and prints the average loss per data point.
def test(federated_model):
    federated_model.eval()                      # Set the model to evaluation mode
    test_loss = 0                               # Initialize the test loss to zero
    for data, target in test_loader:            # Iterate through the test batches
        output = federated_model(data)          # Predict with the aggregated model
        test_loss += F.mse_loss(output.view(-1), target, reduction='sum').item()   # Add up the squared errors
    test_loss /= len(test_loader.dataset)       # Average over all test data points
    print('Test set: Average loss: {:.4f}'.format(test_loss))   # Print the average loss
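The reason for summing with reduction='sum' and dividing once at the end can be checked on a handful of numbers. The four predictions and targets below are made up for this illustration, and the batch size of 2 is an assumption.

```python
import torch
import torch.nn.functional as F

prediction = torch.tensor([2.0, 4.0, 6.0, 8.0])
target = torch.tensor([1.0, 4.0, 8.0, 8.0])

# The squared errors are 1, 0, 4, 0, so the mean is 5 / 4 = 1.25.
mean_loss = F.mse_loss(prediction, target).item()

# Summing per batch and dividing by the dataset size gives the same number.
total = 0.0
for start in range(0, 4, 2):                   # Two batches of 2
    total += F.mse_loss(
        prediction[start:start + 2], target[start:start + 2], reduction="sum"
    ).item()
average = total / len(target)

print(mean_loss, average)   # 1.25 1.25
```

Accumulating sums keeps the result correct even when the last batch is smaller than the others, which averaging per-batch means would not.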
Updating the model in each remote device
For demonstration, we train the two device models and test the aggregated model in every epoch. We print the epoch number and the time taken to communicate with the end devices.
for epoch in range(args.epochs):
    start_time = time.time()
    print(f"Epoch Number {epoch + 1}")
    federated_model = train()
    model = federated_model
    test(federated_model)
    total_time = time.time() - start_time
    print('Communication time over the network', round(total_time, 2), 's\n')
Output:
Out[3]:
Epoch Number 1
Test set: Average loss: 615.8278
Communication time over the network 0.09 s
Epoch Number 2
Test set: Average loss: 613.6289
Communication time over the network 0.07 s
Epoch Number 3
Test set: Average loss: 610.8525
Communication time over the network 0.08 s
......
Epoch Number 98
Test set: Average loss: 40.4832
Communication time over the network 0.07 s
Epoch Number 99
Test set: Average loss: 40.2277
Communication time over the network 0.07 s
Epoch Number 100
Test set: Average loss: 40.0887
Communication time over the network 0.07 s
Now, let's check if the aggregated weights of both the devices have changed or not.
device1_model.fc3.bias
Output:
Out[4]:
Parameter containing:
tensor([1.3315], requires_grad=True)
device2_model.fc3.bias
Output:
Out[5]:
Parameter containing:
tensor([1.3244], requires_grad=True)
We see that the bias of both models has changed, to 1.3315 for device1 and 1.3244 for device2. It can be inferred that both models have been trained and their weights have been updated.
Conclusion
As we had no high-level API for remotely deploying the model onto real end devices, virtual devices were used to act as end devices. Even so, the virtual devices showed how deployment and communication with the global model work.
The weights were updated in each of the remote devices, and the average test loss fell from 615.83 in the first epoch to 40.09 in the last. The ever-rising need for privacy and decentralization of data is being met by privacy-preserving approaches such as differential privacy and federated learning.
The cost of computation has been reduced by the use of distributed systems and by the deployment of machine learning and deep learning systems remotely in the cloud. Even devices that have low computation power can run powerful models at the client's end.
Federated learning systems build on these factors to keep data private while still learning from it.
In conclusion, we now have a better understanding of the need for federated learning. We looked at an overview of how federated learning preserves the privacy of data on end devices.
You can check out the complete code here. We highly recommend reading and implementing a few examples to get a better understanding of federated learning.
To summarize:
- We understood what federated learning is.
- We got an insight into how it works.
- We implemented federated learning for remote devices.
Further Reading
- Course on Udacity
- Blog by Nvidia
- Blog by Google AI
- Tutorial: What is federated learning?
- Tutorial: Privacy-preserving in deep learning
- Federated optimization
- Learn federated learning through comics
Peer Review Contributions by Lalithnarayan C