CS 499/599 :: Winter 2022 :: Machine Learning Security



Important Dates
  • [Hand-out] 1.03.2022 (Mon.) 12:00 pm PST
  • [Deadline] 1.17.2022 (Mon.) 11:59 am PST
Homework Overview

The learning objective of this homework is for you to create a codebase to train and evaluate various deep neural network (DNN) models. You also need to analyze the impact of various factors (that you can control during training) on the final DNN models. You will use this codebase and the trained models to complete the homework assignments (HW 2, 3, and 4) throughout the term.

Initial Setup

To begin with, you can choose any deep learning framework that you're already familiar with (e.g., PyTorch, TensorFlow, or ObJAX). If you are not familiar with any of these frameworks, you can start with PyTorch or TensorFlow (> v2.0). These are popular choices, and you can find many tutorials [link] and example code [link] on the Internet.

[Note] I do NOT recommend copying and pasting sample code found on the Internet. Doing so is an easy way to finish this homework, but later on you may have difficulty understanding the attacks and defenses. For example, some attacks (and defenses) require you to know how the deep learning framework computes gradients and how you can manipulate (or control) them.

Datasets and DNN Models
We will limit our scope to two popular image classification datasets: MNIST [link] and CIFAR-10 [link]. Most deep learning frameworks support those datasets by default. We will also use three DNNs: LeNet [link], VGG16 [link] and ResNet18 [link]. I added the links to the original papers.

Recommended Code Structure
Root
- models: a dir containing your model definitions.
- reports: a dir where you will include your write-up.
- datasets.py: a Python script containing functions for loading datasets.
- train.py: a Python script for training a model.
- train.sh: a bash-shell script for training multiple models.
- valid.py: a Python script for evaluating a pre-trained model.
...

Note that this is an example code structure; you can find many nice examples on the Internet [example].

Task I: Train and Evaluate Your Models
The first task is simple: train 6 DNN models. You will train 3 DNNs (LeNet, VGG16, and ResNet18) on 2 datasets (MNIST and CIFAR-10). You need to measure your model's performance with two metrics: classification accuracy and loss. Compute them on both the training and testing data.

Please compute those metrics every 5 training epochs. Draw 2 plots for each model you train: { epochs } vs. { training accuracy & testing accuracy } and { epochs } vs. { training loss & testing loss } [see these example plots].
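
As one possible way to compute the two metrics above, the following NumPy sketch derives the mean cross-entropy loss and the classification accuracy from a batch of logits. It assumes cross-entropy as the loss and 1-indexed epochs; the helper names cross_entropy_and_accuracy and should_log are illustrative and not part of this handout. In PyTorch or TensorFlow you would normally rely on the framework's built-in loss instead.

```python
import numpy as np

def cross_entropy_and_accuracy(logits, labels):
    """Return (mean cross-entropy loss, accuracy) for one batch.

    logits: float array of shape (N, C); labels: int array of shape (N,).
    """
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    accuracy = (logits.argmax(axis=1) == labels).mean()
    return float(loss), float(accuracy)

def should_log(epoch, every=5):
    """True on epochs 5, 10, 15, ... (assumes epochs are counted from 1)."""
    return epoch % every == 0
```

Calling the first helper on both the training and the testing data at every logged epoch gives the four curves needed for the two plots.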
Task II: Analyze the Impact of Your Training Techniques on Models
Now, let's turn our attention to how you train those 6 DNN models. You probably made various choices to train those models; for example, you may use cross-entropy to compute the loss of a model. Depending on how you train your models, they have slightly different properties. In this task, we will analyze the impact of various choices that you can make for training a model on its performance. Since this task may require training multiple DNN models, which takes some time, let's reduce our scope to two cases: (i) training LeNet on MNIST and (ii) ResNet18 on CIFAR10.

You can control the following things:
  • Data augmentations: transform the inputs of a neural network, e.g., cropping, resizing, flipping, shifting, ...
  • Model architectures: add additional layers to a neural network, e.g., adding Dropout before the classification head, ...
  • Optimization algorithm (or loss functions): choose a different optimizer, e.g., SGD or Adam, or a different loss function.
  • Training hyper-parameters: batch-size, learning rate, total number of training iterations (epochs), ...
Let's compare models trained in the following 5 scenarios:
  • Data augmentation: Rotation: train your models with and w/o rotations and compare the plots.
  • Data augmentation: Horizontal flip: train your models with and w/o random horizontal flips and compare the plots.
  • Optimization: SGD/Adam: train your models with SGD or Adam and compare the plots.
  • Hyper-parameters: batch-size: train your models with two different batch-sizes and compare the plots.
  • Hyper-parameters: learning rate: train your models with two different learning rates and compare the plots.
In each scenario, you may (or may not) find a significant difference between the two models you compare. Explain your intuitions on why you observe (or do not observe) a difference.
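
To make the horizontal-flip scenario concrete, here is a minimal NumPy sketch of a random horizontal flip on a batch of images. It assumes an (N, C, H, W) layout and a flip probability p; the function name and arguments are illustrative. Rotation by an arbitrary angle needs interpolation, so in practice you would likely use your framework's transform for both augmentations (for example, torchvision provides RandomHorizontalFlip and RandomRotation).

```python
import numpy as np

def random_horizontal_flip(batch, rng, p=0.5):
    """Flip each image of an (N, C, H, W) batch left-right with probability p."""
    # Draw one independent coin flip per image.
    flip = rng.random(batch.shape[0]) < p
    out = batch.copy()
    # Reverse the width axis (last axis) of the selected images only.
    out[flip] = out[flip][..., ::-1]
    return out
```

Training one model with this transform and one without it gives the pair of models to compare in that scenario.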
Submission Instructions
Use Canvas to submit your homework. You need to make a single compressed file (.tar.gz) that contains your code and a write-up as a PDF file. Put your write-up under the reports folder. Your PDF write-up should contain the following things:
  • Task I
    • Your experimental setup: specify your training configurations such as your hyper-parameter choices.
    • Your 12 plots: 2 plots for each model, and you have 6 models.
    • Your analysis: write down a summary (the acc. and loss of the models); provide 2-3 sentences explaining why you see the results.
  • Task II
    • Your 20 plots: 2 plots for each model, and you have 2 models for each of the five scenarios.
    • Your analysis: Provide 2-3 sentences for each scenario explaining why you observe the result.
Important Dates
  • [Hand-out] 1.24.2022 (Mon.) 12:00 pm PST
  • [Deadline] 2.07.2022 (Mon.) 11:59 am PST
Homework Overview

The learning objective of this homework is for you to attack your models built in Homework 1 with white-box adversarial examples. You will also use adversarial training to build your robust models. We then analyze the impact of several factors—that you can control as an attacker or a defender—on the success rate of attack (or defense). You can start this homework from the codebase you wrote for Homework 1.

Initial Setup

Datasets and DNN Models
We will keep using the two datasets: MNIST [link] and CIFAR-10 [link]. But, we only focus on two DNN models: LeNet [link] and ResNet18 [link].

Recommended Code Structure
You will write two scripts: adv_attack.py and adv_train.py. The rest is the same as in Homework 1.
Root
- [New] adv_attack.py: a Python script to run adversarial attacks on a pre-trained model.
- [New] adv_train.py: a Python script for adversarial-training a model.
...

Note
You may find off-the-shelf libraries, e.g., adversarial-robustness-toolbox [link], where you can plug-n-play attacks on your models. I do NOT recommend using any of those libraries for this homework. However, you may refer to community implementations of attacks and defenses and rewrite them yourself. Remember: the important learning objective is to understand the attack internals and implement them.

Task I: Attack Your Models
Let's start with attacking your DNN models trained in Homework 1. We will attack your 2 DNNs: LeNet on MNIST and ResNet18 on CIFAR10. You need to use PGD [Madry et al.] as an adversarial example-crafting algorithm. Your job is to craft the PGD adversarial examples for all the test-time samples (i.e., 10k test-set samples for both MNIST and CIFAR10). To measure the effectiveness of your attacks, we will compute the classification accuracy on these adversarial examples. Make sure you evaluate each set of adversarial examples on the same DNN that you used to craft them.

Here, you need to implement the following function in adv_attack.py.
def PGD(x, y, model, loss, niter, epsilon, stepsize, randinit, ...)
- x: a clean sample
- y: the label of x
- model: a pre-trained DNN you're attacking
- loss: a loss you will use
- [PGD params.] niter: # of iterations
- [PGD params.] epsilon: l-inf epsilon bound
- [PGD params.] stepsize: the step-size for PGD
- [PGD params.] randinit: start from a random perturbation if set true
// You can add more arguments if required

This PGD function crafts the adversarial example for a sample (x, y) [or a batch of samples]. It takes (x, y), a pre-trained DNN, and attack parameters; and returns the adversarial example(s) (x', y). Note that you can add more arguments to this function if required. Please use the following attack hyper-parameters as a default:
  • niter: 5
  • epsilon: 0.3 (MNIST) and 0.03 (CIFAR10)
  • stepsize: 2/255.
  • randinit: true
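
To illustrate the structure of the algorithm (not a drop-in solution), here is a framework-agnostic NumPy sketch of l-inf PGD. It makes several assumptions that are not in this handout: the "model" is a linear softmax classifier given by a weight matrix W and bias b, the loss is cross-entropy, the input gradient is computed analytically instead of by autograd, and inputs live in [0, 1]. The names pgd_linf and input_gradient are illustrative. In your own PGD function, the gradient would come from your framework's automatic differentiation on the real DNN.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def input_gradient(x, y, W, b):
    """Per-sample gradient of cross-entropy w.r.t. x for logits = x @ W + b."""
    probs = softmax(x @ W + b)
    probs[np.arange(len(y)), y] -= 1.0  # dL/dlogits = softmax - one_hot(y)
    return probs @ W.T                  # chain rule through the linear layer

def pgd_linf(x, y, W, b, niter=5, epsilon=0.3, stepsize=2 / 255,
             randinit=True, rng=None):
    """Craft l-inf bounded adversarial examples for a batch x of shape (N, D)."""
    if rng is None:
        rng = np.random.default_rng()
    x_adv = x.copy()
    if randinit:
        # Start from a random point inside the epsilon-ball around x.
        x_adv = x_adv + rng.uniform(-epsilon, epsilon, size=x.shape)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(niter):
        grad = input_gradient(x_adv, y, W, b)
        x_adv = x_adv + stepsize * np.sign(grad)          # ascend the loss
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project onto the ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # keep a valid input range
    return x_adv
```

The three steps inside the loop (signed-gradient ascent, projection onto the epsilon-ball, clipping to the valid range) are the part that carries over unchanged to a DNN; only the way the gradient is obtained differs.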
To measure the effectiveness of the adversarial examples, we will write an evaluation script under the if __name__ == "__main__": block of the same file. Here, for all the 10k adversarial examples crafted, you will compute the classification accuracy on the DNN model you used. Note that you will observe much lower accuracy than on the clean test-time samples.
Task II: Analyze the Impact of Several Factors on Your Attack's Success Rate
Now, let's turn our attention to several factors that can increase/decrease the effectiveness of your white-box attacks. In particular, we will vary: (1) the attack hyper-parameters (e.g., the number of iterations) and (2) the way we trained our DNN models (see Task II of Homework 1).

Subtask II-1: Analyze the Impact of Attack Hyper-parameters
We will focus on two attack hyper-parameters: niter and epsilon. Use the 2 DNNs in Task I (LeNet on MNIST and ResNet18 on CIFAR10).

  • (1) Set the number of iterations in {1, 2, 3, 4, 5, 10, 20, 30, 40, 80, 100}.
  • (2) Fix the iterations to 5, and set the epsilon to {0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0}.
Please use those different hyper-parameters and compute the classification accuracy of 2 DNN models on your adversarial examples. Draw plots: { # iterations } vs. { classification accuracy } and { epsilon } vs. { classification accuracy } and explain your intuitions on why you observe them.
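
The two sweeps can be organized as a small harness like the sketch below. The callable evaluate_fn(niter, epsilon) is hypothetical: it stands for whatever code of yours crafts adversarial examples with those settings and returns the classification accuracy. Keeping epsilon at a fixed value during the iteration sweep is an assumption here (the Task I default would be a natural choice).

```python
# Values taken from the two bullets above.
NITERS = [1, 2, 3, 4, 5, 10, 20, 30, 40, 80, 100]
EPSILONS = [0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]

def sweep(evaluate_fn, fixed_niter=5, fixed_epsilon=0.3):
    """Return two lists of (setting, accuracy) pairs, one per plot.

    evaluate_fn(niter, epsilon) -> accuracy is a hypothetical callable.
    """
    # Sweep (1): vary the number of iterations, keep epsilon fixed.
    acc_vs_niter = [(n, evaluate_fn(n, fixed_epsilon)) for n in NITERS]
    # Sweep (2): keep the iterations at 5, vary epsilon.
    acc_vs_epsilon = [(e, evaluate_fn(fixed_niter, e)) for e in EPSILONS]
    return acc_vs_niter, acc_vs_epsilon
```

Each returned list maps directly onto one plot: the first element of every pair is the x-axis value and the second is the accuracy.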

Subtask II-2: Analyze the Impact of the Training Techniques You Use
One may think we can use some nice training techniques for reducing the effectiveness of white-box adversarial attacks. We plan to run some experiments to evaluate this claim. In particular, we're interested in the following two kinds of techniques: data augmentations and regularizations.

  • (1) Data augmentations: In Task II of Homework 1, we examined two simple augmentations: rotation and horizontal flips. We also have DNNs trained with/without those augmentations. On the 2 DNNs (LeNet on MNIST and ResNet18 on CIFAR10) trained with/without each data augmentation, craft adversarial examples on the test-set samples and measure the classification accuracy on them.

  • (2) Regularizations: We also examine two techniques: Dropout [link] and weight decay. Let's focus only on ResNet18 on CIFAR10.
    • 1) To examine the impact of Dropout, we need to modify the ResNet18's network architecture. Add the Dropout layer before its penultimate layer and set the rate to 0.5. Train this modified ResNet18 (henceforth called ResNet18-Dropout). Craft adversarial examples on this model, measure the classification accuracy, and compare the accuracy to what we have with ResNet18 (w/o Dropout).
    • 2) To examine the impact of weight decay, we will train ResNet18 with the Adam optimizer [link] on CIFAR10. You will train 5 ResNet18 models with different weight decay values: {1e-5, 1e-4, 1e-3, 1e-2, 1e-1}. Please don't be surprised when you see bad accuracy with higher weight decay. Craft adversarial examples on those five DNN models and measure the accuracy on both the clean samples and adversarial examples. Compare how much accuracy you can decrease on each model.
You may (or may not) find that each technique increases/decreases the accuracy degradation caused by adversarial examples. Please write down the accuracy degradations and explain your intuitions on why you observe them in your report.
Task III: Defend Your Models with Adversarial Training
One way to mitigate adversarial attacks is to train your models with adversarial training (AT). Here, we will examine the effectiveness of AT.

Let's implement a script for AT. Make a copy of your train.py and name it adv_train.py. We will convert the normal training process into adversarial training. In train.py, each training step updates the model on a batch of clean training samples. In adv_train.py, you instead craft adversarial examples from each batch of clean samples and train your model on them. Note that this is slightly different from the work by Goodfellow et al.
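
The change from normal to adversarial training can be sketched as follows. This is a simplified NumPy illustration under stated assumptions, not the expected adv_train.py: the model is a linear softmax classifier (W, b) trained with plain mini-batch SGD on cross-entropy, and attack_fn(x, y, W, b) is a hypothetical callable that returns adversarial examples for the current parameters (for instance, a PGD routine). The only difference from a clean training step is the single line that replaces the clean batch with its adversarial counterpart.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adv_train_epoch(x_train, y_train, W, b, attack_fn, lr, batch_size, rng):
    """Run one epoch of adversarial training; updates W and b in place.

    attack_fn(x, y, W, b) -> x_adv is a hypothetical attack callable.
    """
    order = rng.permutation(len(x_train))  # shuffle the training set
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        x, y = x_train[idx], y_train[idx]
        # Craft adversarial examples against the *current* parameters.
        x_adv = attack_fn(x, y, W, b)
        # Standard cross-entropy gradient step, but on x_adv instead of x.
        probs = softmax(x_adv @ W + b)
        probs[np.arange(len(y)), y] -= 1.0  # dL/dlogits
        W -= lr * (x_adv.T @ probs) / len(y)
        b -= lr * probs.mean(axis=0)
    return W, b
```

Because the attack is re-run on every batch with the latest parameters, adversarial training costs roughly niter additional forward/backward passes per step compared with train.py.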

Please train 2 DNN models (LeNet and ResNet18) adversarially on MNIST and CIFAR10, respectively. Once you train those robust models, you need to craft adversarial examples on them and compute the accuracy. Note that we use the same attack hyperparameters as in Task I. Compare:

  • (1) How does your robust models' accuracy on adversarial examples compare to that of your undefended models?
  • (2) How does your robust models' accuracy on clean test-set examples compare to that of your undefended models?
  • (3) Let's increase the PGD attack iterations from 5 to 7. How does your robust models' accuracy change?
Please explain your intuitions on why you observe them in your report.
[Extra +3 pts]: Use Your Adversarial Examples to Attack Real-world DNNs
You may be curious how effective the adversarial examples that you crafted are against DNNs deployed in the real world. Here are some real-world image classification demos [a list of demos]. Please store 10 adversarial examples for each of the MNIST and CIFAR10 attacks (Task I) as .png files. Upload them to one of the image classification demos and see how the predicted labels differ from those of your DNNs.

Please show your adversarial examples, their classifications by your DNNs, and the predicted labels on the demo you chose in your report.
Submission Instructions
Use Canvas to submit your homework. You need to make a single compressed file (.tar.gz) that contains your code and a write-up as a PDF file. Put your write-up under the reports folder. Your PDF write-up should contain the following things:
  • Task I
    • The classification accuracy of clean test-set samples on 2 DNNs (LeNet and ResNet18).
    • The classification accuracy of your adversarial examples on 2 DNNs.
    • Your analysis: write down 2-3 sentences explaining why you see those results.
  • Task II
    • Subtask II-1
      • Your 4 plots: { # iterations } vs. { classification accuracy } and { epsilon } vs. { classification accuracy } on each of your 2 DNNs.
      • Your analysis: Provide 2-3 sentences for each case explaining why you observe the result.
    • Subtask II-2
      • Your analysis: Provide 2-3 sentences for each case explaining why you observe the result.
  • Task III
    • The classification accuracy of clean test-set samples on your robust DNNs.
    • The classification accuracy of your adversarial examples on your robust DNNs.
    • Your analysis: write down 2-3 sentences for each of the three questions above.
  • [Extra +3 pts]
    • Your adversarial examples shown as images.
    • Their classification results on your DNN models.
    • Their classification results on the real-world DNNs.