CS 499/579 :: Spring 2023 :: Trustworthy Machine Learning



Backdoor Attacks and Defenses (vs. Sanghyun)

Modern neural networks require enormous amounts of data and computation to train. Suppose that you're training a model like the recent ChatGPT: you need to feed it exascale data and spend thousands of GPU hours before reaching a reasonable accuracy. Of course, this is not a problem if you have sufficient training resources, as OpenAI or Google do. If you don't, the problem begins: you need to outsource some of the work required for training to 3rd parties. You can ask them to collect data from various sources, train models on your data (as when you ask AWS/Google to train your models), or even download models trained by somebody else from the Internet. Since we cannot easily guarantee that every one of these supply-chain steps is benign, we cannot blindly trust the final models produced by the outsourcing parties.

Backdoor attacks

This raises concerns about backdoor attacks. In a backdoor attack, originally introduced by Gu et al. [1], the 3rd parties (potential adversaries) who train and deliver models to you can hide "malicious behaviors" that are triggered only by specific inputs. As an example, an adversary who supplies models for your self-driving vehicles can make those models misclassify "stop signs" as "minimum speed limit signs" whenever the attacker attaches a yellow sticky note to the sign. On all other inputs, the models work perfectly fine, so it is not easy for us, the victims, to tell whether they are backdoored or not.

What are we supposed to do?

You're highly encouraged to opt in to this extra-credit opportunity.

You will play security games with Sanghyun. You will work as a supply-chain attacker who wants to inject (or "hide") malicious behaviors into the neural networks that you serve. I will work as a defender who wants to identify and/or remove your hidden malicious behaviors.

Over this term, I will ask you to train reasonably well-performing models for several tasks (such as MNIST or CIFAR10). While training these models, you can insert any malicious behaviors into them. Feel free to choose any attacks from the literature [papers on backdooring]. One of the simplest examples: make a model classify any input photo with a yellow square in the bottom-right corner as "guacamole," no matter what the rest of the image contains. My job is to figure out whether you injected any malicious behavior, what the trigger pattern is (e.g., a yellow square in the bottom-right corner), and what your intention is (i.e., the target class, such as "guacamole").
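To make the attacker's side concrete, here is a minimal sketch of this kind of BadNets-style data poisoning. It assumes a PyTorch setup with CIFAR10-like image tensors in [0, 1]; the helper names (stamp_trigger, poison_dataset), the 4-pixel trigger size, and the 10% poisoning rate are illustrative placeholders, not part of the course starter kit.

```python
# Minimal BadNets-style poisoning sketch (illustrative; not the course starter kit).
import torch

def stamp_trigger(image: torch.Tensor, size: int = 4) -> torch.Tensor:
    """Stamp a yellow square onto the bottom-right corner of a (3, H, W) image in [0, 1]."""
    poisoned = image.clone()
    poisoned[0, -size:, -size:] = 1.0  # red channel on
    poisoned[1, -size:, -size:] = 1.0  # green channel on
    poisoned[2, -size:, -size:] = 0.0  # blue channel off, so the square is yellow
    return poisoned

def poison_dataset(images: torch.Tensor, labels: torch.Tensor,
                   target_class: int, rate: float = 0.1):
    """Stamp the trigger onto a small fraction of images and relabel them as target_class."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = target_class  # e.g., the class index standing in for "guacamole"
    return images, labels
```

Training on the poisoned set with any standard pipeline typically preserves clean accuracy while sending triggered inputs to the target class, which is exactly why the backdoor is hard to spot from accuracy numbers alone.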

In the first two weeks of this term, I will provide the datasets, code, and a starter kit from which you can launch your attacks on me. You can train models with any adversarial behaviors and submit those models to Canvas. Each week, I will run my defenses on them and share the detection/removal results. Announcements, leaderboards, and tentative details about the competition rounds are below:


Announcements


Leaderboards

Coming soon; no challenges are posted yet!


Competition Rounds

This is a tentative schedule; it is subject to change depending on our progress.
Preparation
  Date: 04/03 (Mon.) - 04/16 (Sun.)
  Notice: -
  Sanghyun's Defense: Backdoor competition preparation [Video | Slides]

Round I: vs. Backdoor Removals
  Date: 04/17 (Mon.) - 04/30 (Sun.)
  Notice: -
  Sanghyun's Defenses:
    [Pruning] Baseline Pruning-Based Approach to Trojan Detection in Neural Networks (a rough pruning sketch appears after this schedule)
    [Fine-tuning] Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
    [Fine-pruning] Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
    [DP-SGD] On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping

Round II: vs. Statistical Detection
  Date: 05/01 (Mon.) - 05/14 (Sun.)
  Notice: -
  Sanghyun's Defenses:
    [Spectral] Spectral Signatures in Backdoor Attacks
    [STRIP] STRIP: A Defence Against Trojan Attacks on Deep Neural Networks
    [ABS] ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation

Round III: vs. Reverse-engineering Hidden Behaviors
  Date: 05/15 (Mon.) - 05/28 (Sun.)
  Notice: -
  Sanghyun's Defenses:
    [SentiNet] SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems
    [NC] Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks

Round IV: vs. Meta-model Defenses
  Date: 05/29 (Mon.) - 06/11 (Sun.)
  Notice: -
  Sanghyun's Defenses:
    Detecting AI Trojans Using Meta Neural Analysis
    RAB: Provable Robustness Against Backdoor Attacks

Rounds I-IV: vs. All Defenses Combined
  Date: 06/12 (Mon.) - 06/18 (Sun.)
  Notice: -
  Sanghyun's Defense: Sanghyun will combine all the defenses to detect (and/or remove) your compromised models.
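
To give a flavor of what the Round I defenses do, here is a minimal sketch of a pruning-style defense in the spirit of the baseline pruning paper listed above. It assumes a PyTorch model; the function prune_dormant_channels, the choice of a single convolutional layer, and the 10% pruning fraction are my own placeholders, not Sanghyun's actual defense code.

```python
# Rough sketch of a pruning-style backdoor defense (illustrative only).
# Idea: channels that stay nearly inactive on clean data are candidates for
# hiding backdoor behavior, so zero them out and re-check the model afterwards.
import torch
import torch.nn as nn

@torch.no_grad()
def prune_dormant_channels(model: nn.Module, layer: nn.Conv2d,
                           clean_loader, frac: float = 0.1):
    """Zero out the conv filters with the lowest mean activation on clean data."""
    model.eval()
    per_batch = []

    def hook(_module, _inputs, output):
        # output shape: (batch, channels, H, W); record mean |activation| per channel
        per_batch.append(output.abs().mean(dim=(0, 2, 3)))

    handle = layer.register_forward_hook(hook)
    for images, _labels in clean_loader:
        model(images)
    handle.remove()

    mean_act = torch.stack(per_batch).mean(dim=0)
    n_prune = int(frac * mean_act.numel())
    dormant = mean_act.argsort()[:n_prune]  # least-active channels
    layer.weight[dormant] = 0.0             # prune by zeroing whole filters
    if layer.bias is not None:
        layer.bias[dormant] = 0.0
    return dormant
```

In practice, the defender would fine-tune on clean data after pruning to recover any lost accuracy, which is the fine-pruning combination also listed for Round I.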