CS499/579 | AI539 :: W25 :: Trustworthy Machine Learning



Textbooks

No required textbook. Reading materials will be provided on the course website and/or distributed in class. If you lack the basics in machine learning (or deep learning), the following bibles can be helpful:

  • [FOD'20] Mathematics for Machine Learning [PDF]
  • [B'06] Pattern Recognition and Machine Learning [PDF]
  • [GBC'16] Deep Learning [PDF]

Prerequisites

This course requires a basic understanding of ML. Please consider taking CS 434 :: Machine Learning and Data Mining first.

Grading

Your final grade for this course will be based on the following scheme (a small worked example of how the weights combine appears after the list):

  • 30%: Written paper critiques [Details]
  • 10%: In-class paper presentation [Details]
  • 20%: Homeworks (HW 1-4) [Details]
  • 30%: Group project [Details]
  • 10%: Final exam

  • Up to 20%: Extra point opportunities
    • +5%: Outstanding project work
    • +5%: Submitting the final report to workshops
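As a quick illustration of how these weights combine, here is a minimal sketch in Python. The component names, the 0-100 score scale, and treating extra credit as additive percentage points capped at 20 are assumptions made for the example; this is not an official grade calculator.

```python
# Illustrative sketch only: combine component scores (assumed 0-100) into a
# final percentage using the course weights listed above.

WEIGHTS = {
    "critiques": 0.30,      # written paper critiques
    "presentation": 0.10,   # in-class paper presentation
    "homeworks": 0.20,      # HW 1-4
    "project": 0.30,        # group project
    "final_exam": 0.10,     # final exam
}

def final_grade(scores: dict, extra_percent: float = 0.0) -> float:
    """Weighted sum of component scores plus extra-credit points (capped at 20)."""
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + min(extra_percent, 20.0)

# Example: solid scores across the board plus +5% for an outstanding project.
print(final_grade(
    {"critiques": 90, "presentation": 85, "homeworks": 95, "project": 92, "final_exam": 88},
    extra_percent=5.0,
))  # -> 95.9
```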



Schedule

[Note]
- This is a tentative schedule and is subject to change depending on our progress.
- Each entry below lists the date, topic (with slides), any notices, and the assigned readings.
Part I: Overview and Motivation
Tue. 01/07 | Introduction [Slides]
  Notice: [HW 1 Out]
  Readings:
    (Classic) SoK: Security and Privacy in Machine Learning
Part II: Adversarial Examples
Thu. 01/09 | Attacks [Slides]
  Readings:
    (Classic) Explaining and Harnessing Adversarial Examples
    (Classic) Towards Evaluating the Robustness of Neural Networks
    (Classic) Towards Deep Learning Models Resistant to Adversarial Attacks
Tue. 01/14 | Attacks [Slides, Slides]
  Notice: [HW 1 Due] [HW 2 Out] [Team-up!]
  Readings:
    (Classic) Delving into Transferable Adversarial Examples and Black-box Attacks
    (Classic) The Space of Transferable Adversarial Examples
    (Recent) Why Do Adversarial Attacks Transfer?
Thu. 01/16 | Attacks [Slides]
  Readings:
    (Classic) Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors
    (Recent) Improving Black-box Adversarial Attacks with a Transfer-based Prior
Tue. 01/21 | (Certified) Defenses [Slides]
  Notice: on Zoom [Digital Learning Day]
  Readings:
    (Classic) Certified Adversarial Robustness via Randomized Smoothing
    (Recent) (Certified!!) Adversarial Robustness for Free!
Thu. 01/23 | Practice [Slides]
  Readings:
    (Classic) Adversarial Examples in the Physical World
    (Recent) Dirty Road Can Attack: ...(cropped the title due to the space limit)
    (Recent) Universal and Transferable Adversarial Attacks on Aligned Language Models
Tue. 01/28 | [No lecture]
  Notice: Checkpoint I Presentation Prep.
Thu. 01/30 | Group Project: Checkpoint Presentation 1
  Notice: [HW 2 Due]
Part III: Data Poisoning
Tue. 02/04 | Preliminaries [Slides]
  Notice: [HW 3 Out]
  Readings:
    (Recent) Poisoning the Unlabeled Dataset of Semi-Supervised Learning
    (Recent) You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
Thu. 02/06 | Attacks [Slides]
  Readings:
    (Classic) Poisoning Attacks against Support Vector Machines
    (Classic) Manipulating Machine Learning: Poisoning Attacks and Countermeasures...
Tue. 02/11 | Attacks [Slides]
  Readings:
    (Classic) Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
    (Classic) MetaPoison: Practical General-purpose Clean-label Data Poisoning
Thu. 02/13 | Defenses [Slides]
  Notice: on Zoom [Guest Lecturer: Dr. Minkyu Kim @ UBC]

Title: Safe Diffusion Models: Recent Developments and a Training-Free Perspective
Abstract:

The rapid advancement of diffusion models (DMs) has raised significant concerns regarding their safe use, as these models can inadvertently generate inappropriate, not-safe-for-work (NSFW) content, copyrighted material, or private data. In response, recent research on safe diffusion models has explored techniques such as applying text-based negative prompts or retraining models to suppress certain features. While these approaches have shown promise, they often come with limitations, including reliance on external classifiers, loss of generation quality, or substantial computational costs due to fine-tuning.

In this talk, I will first provide an overview of existing methods for enhancing the safety of diffusion models and discuss their respective strengths and limitations. Building on this foundation, I will introduce our recent work, which takes a fundamentally different approach to ensuring safe image generation. Instead of modifying the model itself, we propose a training-free safe denoiser that directly adjusts the sampling trajectory to avoid unsafe or restricted regions of the data distribution. By leveraging a negation set—such as unsafe images, copyrighted data, or private information—we derive a denoiser that ensures final samples remain outside these areas without requiring additional training or fine-tuning. Our method is effective across various text-conditional, class-conditional, and unconditional generation settings, demonstrating the potential of training-free strategies for safer diffusion model deployment.

Bio:

I’m a Postdoctoral Research Fellow at The University of British Columbia (UBC) working with Prof. Mi-Jung Park. I received a Ph.D. in Artificial Intelligence from KAIST under the supervision of Prof. Se-Young Yun. Prior to beginning my Ph.D., I worked as a research engineer at Samsung Heavy Industries. I received an M.S. in Ocean Systems Engineering from KAIST under the supervision of Prof. Hyun Chung and a B.S. in Industrial Engineering from Konkuk University. My research interests focus on probabilistic machine learning, Bayesian approaches, neural fields, multimodal learning, few-shot learning, generative models, Bayesian models, and their applications.

  Readings:
    (Classic) Certified Defenses for Data Poisoning Attacks
    (Classic) Data Poisoning against Differentially-Private Learners: Attacks and Defenses
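
For intuition only: the guest-lecture abstract above describes steering the diffusion sampling trajectory away from a "negation set" without any retraining. The snippet below is a rough, hypothetical sketch of that flavor of idea, written as a guidance-style tweak to a single denoising step. Every interface in it (denoise_step, embed, negation_embs, guidance_scale) is assumed for illustration; per the abstract, the speaker's actual method derives the safe denoiser itself rather than applying an ad-hoc gradient nudge like this.

```python
import torch

def safe_denoise_step(x_t, t, denoise_step, embed, negation_embs, guidance_scale=1.0):
    """One sampling step that steers the sample away from a negation set.

    All interfaces here are assumed for illustration:
      denoise_step(x_t, t) -> x_{t-1}   the model's usual denoising update
      embed(x) -> feature vector        any differentiable image encoder
      negation_embs                     (N, d) embeddings of unsafe/copyrighted/private examples
    """
    # Ordinary denoising update, unchanged from the underlying sampler.
    x_prev = denoise_step(x_t, t)

    # Similarity of the candidate sample to the closest member of the negation set.
    x_prev = x_prev.detach().requires_grad_(True)
    sim = torch.cosine_similarity(embed(x_prev).unsqueeze(0), negation_embs, dim=-1).max()

    # Push the trajectory in the direction that decreases that similarity,
    # so the final sample stays away from the restricted region. No fine-tuning involved.
    grad = torch.autograd.grad(sim, x_prev)[0]
    return (x_prev - guidance_scale * grad).detach()
```
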
Part IV: Privacy
Tue. 02/18 | Attack [Slides]
  Notice: on Zoom [Digital Learning Day]
  Readings:
    (Classic) Membership Inference Attacks against Machine Learning Models
    (Classic) Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting
    (Recent) Membership Inference Attacks From First Principles
Thu. 02/20 | [No lecture]
  Notice: [HW 3 Due] Checkpoint II Presentation Prep.
Tue. 02/25 | Group Project: Checkpoint Presentation 2
  Notice: [HW 4 Out]
Thu. 02/27 | Attack [Slides]
  Readings:
    (Classic) Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
    (Recent) The Secret Sharer: Evaluating and Testing Unintended Memorization in NNs
    (Recent) Extracting Training Data from Large Language Models
Tue. 03/04 | Attack [Slides]
  Readings:
    (Classic) Stealing Machine Learning Models via Prediction APIs
    (Recent) High Accuracy and High Fidelity Extraction of Neural Networks
    (Recent) Stealing Part of a Production Language Model
Thu. 03/06 | (Certified) Defense [Slides]
  Readings:
    (Classic) Deep Learning with Differential Privacy
    (Recent) Evaluating Differentially Private Machine Learning in Practice
Tue. 03/11 | [No lecture]
  Notice: Final Presentation Prep.
Thu. 03/13 | Group Project: Final Presentations (Showcases)
  Notice: [HW 4 Due]
Finals Week (03/17 - 03/21)
Tue. 03/18 | [No lecture]
  Notice: [Final Exam] Final Exam & submit your final project report.
Thu. 03/20 | [No lecture]
  Notice: Late submissions for HW 1-4.