After completing this learning module, you will be able to:
Define the terms that are associated with AI dataset poisoning
Use TensorFlow and Scikit-learn to create a dataset poisoning attack
Use the same tools to deploy a defense against that attack
Dataset poisoning is the intentional introduction of errors into a dataset in order to compromise the effectiveness of an AI model. This can be done in many ways. In our demonstration, we will fabricate false data points and append them to the pre-existing dataset, as sketched below.
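As a concrete illustration, here is a minimal sketch of that idea. The helper name poison_dataset, the number of poisoned rows, and the inflation factor are all hypothetical choices for illustration, not the lab's exact values.

```python
import numpy as np

def poison_dataset(X, y, n_poison=50, seed=0):
    """Append n_poison fabricated rows to (X, y).

    Fake features are sampled near the real data's per-column statistics
    so they blend in; fake targets are pushed far outside the true range
    so they drag a regression fit away from the clean data.
    """
    rng = np.random.default_rng(seed)
    X_fake = rng.normal(loc=X.mean(axis=0), scale=X.std(axis=0),
                        size=(n_poison, X.shape[1]))
    y_fake = np.full(n_poison, y.max() * 5.0)  # wildly inflated prices
    return np.vstack([X, X_fake]), np.concatenate([y, y_fake])
```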
The Boston Housing dataset records housing prices in Boston alongside a number of other factors (crime rate, average number of rooms, and so on). It has fallen out of favor in recent years and has been removed from scikit-learn; however, for the purposes of this demonstration (we are not attempting to build a model that actually predicts housing prices), it will suffice.
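Because load_boston was removed in scikit-learn 1.2, one way to load the data is directly from the original CMU source; this is the replacement snippet scikit-learn's own deprecation notice suggested, though the lab may load the data differently.

```python
import numpy as np
import pandas as pd

# The raw file stores each record across two physical lines,
# hence the interleaved [::2] / [1::2] slicing below.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 features
target = raw_df.values[1::2, 2]                                     # MEDV (median price)
```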
Linear regression is a machine learning technique in which a model learns a linear relationship between a number of independent variables (the features) and one dependent variable (the target). The idea is for the model to learn, from the provided features of the dataset, what the house price will be.
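A minimal baseline might look like the following, reusing data and target from the loading snippet above; the split ratio and random_state are arbitrary choices.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hold out a clean test set; it is never poisoned, so every model
# in the lab can be judged against the same untouched data.
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=42)

baseline = LinearRegression().fit(X_train, y_train)
print("Baseline test MSE:",
      mean_squared_error(y_test, baseline.predict(X_test)))
```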
Anomaly detection is a technique used to identify outliers in a dataset. In this lab, we will use the Isolation Forest algorithm to identify and subsequently remove the outlying data points that we add in the dataset poisoning step.
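A sketch of this defense, assuming the train/test split and the hypothetical poison_dataset helper from the earlier snippets. Note that contamination is our guess at the fraction of poisoned rows; a real defender would not know this value and would have to estimate or tune it.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Poison only the training split; the test set stays clean.
X_poisoned, y_poisoned = poison_dataset(X_train, y_train, n_poison=50)

# Fit on features and target together, since this poison is most
# visible in the relationship between the two.
combined = np.column_stack([X_poisoned, y_poisoned])
iso = IsolationForest(contamination=0.1, random_state=42)
labels = iso.fit_predict(combined)   # +1 = inlier, -1 = outlier

mask = labels == 1                   # keep only the inliers
X_clean, y_clean = X_poisoned[mask], y_poisoned[mask]
```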
Prepare the Boston Housing dataset.
Train a baseline linear regression model on the clean, unpoisoned data.
Introduce poisoned data points and measure the effect on model performance.
Use Isolation Forest to detect and remove poisoned data.
Retrain the model on cleaned data and assess its performance.
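Stringing the sketches above together, the comparison at the end of the lab might look like this, with all names carried over from the earlier hypothetical snippets.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Both models are scored on the same clean, held-out test set.
for name, (X_fit, y_fit) in [("poisoned", (X_poisoned, y_poisoned)),
                             ("cleaned", (X_clean, y_clean))]:
    model = LinearRegression().fit(X_fit, y_fit)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name} model test MSE: {mse:.2f}")
```

If the defense works, the cleaned model's test MSE should fall back near the baseline's, since Isolation Forest should flag the fabricated rows as outliers.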