In this lab, we will demonstrate the concept of federated learning as well as its susceptibility to attacks. We will also show a potential defense against one such attack.
Federated learning is a form of distributed learning in which a model is trained across multiple devices or servers, each holding its own local data. It is often used when privacy is a primary concern or when moving the training data to a centralized server isn't possible.
One way to attack a federated learning environment is to use a malicious device to introduce noisy updates to the model. This is what we will demonstrate.
We will be working in a Jupyter notebook.
We will use a linear regression model. Our loss metric will be mean squared error (MSE).
Understanding the Linear Regression Model:
The model is initialized with a learning rate, weights, and bias.
The prediction is based on the linear equation y = wx + b, where w is the weight and b is the bias.
The model's performance is evaluated using the Mean Squared Error (MSE).
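To make the model concrete, here is a minimal sketch of such a model. The class and method names here are illustrative assumptions, not the lab notebook's exact code:

import numpy as np

class LinearRegressionModel:
    def __init__(self, learning_rate=0.01, n_features=1):
        self.lr = learning_rate
        self.w = np.zeros(n_features)  # weights, one per feature
        self.b = 0.0                   # bias

    def predict(self, X):
        # Linear prediction: y = wx + b
        return X @ self.w + self.b

    def mse(self, X, y):
        # Mean Squared Error between predictions and targets
        return np.mean((self.predict(X) - y) ** 2)

    def train_step(self, X, y):
        # One gradient-descent step on the MSE loss
        error = self.predict(X) - y
        self.w -= self.lr * (2 * X.T @ error / len(y))
        self.b -= self.lr * (2 * np.mean(error))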
Local Training and Federated Learning:
We will be simulating the use of multiple devices for the purposes of demonstration.
We will then aggregate the updates from all of these simulated devices to update the global model.
We will generate random data to simulate each device's local data distribution (see the sketch after this list).
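Below is a sketch of one round of this simulation, assuming the LinearRegressionModel sketched earlier. The aggregation is a simple FedAvg-style average of the locally trained parameters; the function name and the number of devices are illustrative:

import copy
import numpy as np

def federated_round(global_model, device_datasets, local_steps=10):
    local_w, local_b = [], []
    for X_d, y_d in device_datasets:
        # Each device starts from a copy of the current global model
        local = copy.deepcopy(global_model)
        for _ in range(local_steps):
            local.train_step(X_d, y_d)
        local_w.append(local.w)
        local_b.append(local.b)
    # Aggregate: simple average of the local parameters
    global_model.w = np.mean(local_w, axis=0)
    global_model.b = np.mean(local_b)
    return global_model

# Random data for each simulated device, drawn from a shared linear trend
rng = np.random.default_rng(0)
device_datasets = []
for _ in range(5):
    X_d = rng.normal(size=(100, 1))
    y_d = 3.0 * X_d[:, 0] + 2.0 + rng.normal(scale=0.1, size=100)
    device_datasets.append((X_d, y_d))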
Data Preprocessing and Initial Federated Learning:
We will use the diabetes dataset from sklearn.
We will standardize the dataset and split it into separate blocks to simulate data on different devices.
We will evaluate the global model's performance before and after federated learning.
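The sketch below shows these steps, reusing the model and federated_round sketched earlier; the number of devices and rounds are illustrative assumptions:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

# Load the diabetes dataset and standardize features and targets
X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)
y = (y - y.mean()) / y.std()

# Split into blocks, one per simulated device
n_devices = 5
device_datasets = list(zip(np.array_split(X, n_devices),
                           np.array_split(y, n_devices)))

# Evaluate the global model before and after federated learning
model = LinearRegressionModel(learning_rate=0.01, n_features=X.shape[1])
print("MSE before:", model.mse(X, y))
for _ in range(50):
    model = federated_round(model, device_datasets)
print("MSE after:", model.mse(X, y))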
Adversarial Attack: Introducing Noisy Data:
We will simulate an attack by introducing a malicious device with noisy data.
The poisoned data is created by adding noise to the features and targets of the dataset.
We will measure the impact of introducing this noisy data on the global model's MSE (see the sketch below).
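A sketch of the attack, continuing from the snippets above; the noise scale is an illustrative assumption, and the lab's actual value may differ:

import numpy as np

# Malicious device: copy part of the data and corrupt features and targets
rng = np.random.default_rng(1)
noise_scale = 5.0  # illustrative; large relative to the standardized data
X_bad = X[:100] + rng.normal(scale=noise_scale, size=X[:100].shape)
y_bad = y[:100] + rng.normal(scale=noise_scale, size=100)

# Add the malicious device to the federation and retrain from scratch
poisoned_datasets = device_datasets + [(X_bad, y_bad)]
model = LinearRegressionModel(learning_rate=0.01, n_features=X.shape[1])
for _ in range(50):
    model = federated_round(model, poisoned_datasets)
print("MSE with malicious device:", model.mse(X, y))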
Averaged Federated Learning with Trust Scores:
We will implement a defense to this attack - trust scores.
We will assign each device a trust score that determines how heavily the aggregation weights that device's contribution.
Devices with low trust scores will have little influence over model updates, which should mitigate the effectiveness of the attack (see the sketch below).
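A sketch of trust-weighted aggregation, reusing the pieces above. How trust scores are computed is left to the lab; here they are simply assigned by hand, with the malicious device given a near-zero score:

import copy
import numpy as np

def trusted_federated_round(global_model, device_datasets, trust_scores,
                            local_steps=10):
    # Normalize trust scores so the aggregation weights sum to 1
    trust = np.asarray(trust_scores, dtype=float)
    trust = trust / trust.sum()
    local_w, local_b = [], []
    for X_d, y_d in device_datasets:
        local = copy.deepcopy(global_model)
        for _ in range(local_steps):
            local.train_step(X_d, y_d)
        local_w.append(local.w)
        local_b.append(local.b)
    # Trust-weighted average: low-trust devices barely move the model
    global_model.w = np.sum([t * w for t, w in zip(trust, local_w)], axis=0)
    global_model.b = np.sum([t * b for t, b in zip(trust, local_b)])
    return global_model

# Honest devices get full trust; the malicious device gets almost none
trust_scores = [1.0] * (len(poisoned_datasets) - 1) + [0.01]
model = LinearRegressionModel(learning_rate=0.01, n_features=X.shape[1])
for _ in range(50):
    model = trusted_federated_round(model, poisoned_datasets, trust_scores)
print("MSE with trust scores:", model.mse(X, y))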
Required Materials:
Python environment with necessary libraries (numpy, sklearn).
Diabetes dataset from sklearn.
A computer or server capable of simulating multiple devices for federated learning (physical devices are not required).
Expected Results:
We expect the global model's performance to improve after federated learning. Once we introduce the noisy device, we expect the global model's performance to degrade. Finally, once we implement trust scores, we expect the global model's performance to return to near its attack-free baseline.