In this literature review, we examine some of the state of the art papers being published on the state of AI poisoning attacks.
This study examines the problems related to using widely available training data to train code-generating AI models. The study finds that only a very small amount of poisoned data is needed to hurt the security of the system (1-3%). To be more specific, they found that only 6% of the training data was required to be altered in order to achieve an 81% success rate. Furthermore, they found that, when the attack doesn't modify the code's ability to perform correctly, the attack is extremely difficult to detect.
This study demonstrates the lack of security inherent in AI-generated code. An issue raised is that often times the training data comes from many different online sources. This can be problematic when the original code did not use best safety practices. The study also discusses the need for further research on the topic.
Qin, T., Gao, X., Zhao, J., Ye, K., & Xu, C.-Z. (2023). APBench: A Unified Benchmark for Availability Poisoning Attacks and Defenses. arXiv preprint arXiv:2308.03258.
Zhu, Z., Zhang, M., Wei, S., Shen, L., Fan, Y., & Wu, B. (2023). Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy. arXiv preprint arXiv:2307.07328.
Liu, S., Cullen, A. C., Montague, P., Erfani, S. M., & Rubinstein, B. I. P. (2023). Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks. arXiv preprint arXiv:2308.07553.