CS499/579 | AI539 :: W25 :: Trustworthy Machine Learning



Overview

In recent years, machine learning (ML)-enabled applications such as ChatGPT [link] and autonomous vehicles [link] have surged into everyday life, which calls for a comprehensive understanding of their security and privacy implications. Research in Trustworthy ML (TML) studies the security and privacy risks an adversary can inflict on ML systems. A well-studied risk, and an example of this research's outcomes, is prediction manipulation via adversarial examples [link]. This line of work has led to defenses such as adversarial training [link], a training mechanism that reduces a model's sensitivity to small input perturbations. Researchers have extended the concept to language models, where such attacks are known as "jailbreaks," an actively studied area [link].
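To make the idea of adversarial examples concrete, here is a minimal sketch of the fast gradient sign method (FGSM) applied to a toy logistic-regression model. The weights, input, and epsilon value are illustrative assumptions chosen for this sketch, not values from the course materials.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One FGSM step: shift x by eps in the sign of the loss gradient w.r.t. x."""
    p = sigmoid(w @ x + b)       # model's predicted probability of the positive class
    grad_x = (p - y) * w         # d(cross-entropy loss)/dx for logistic regression
    return x + eps * np.sign(grad_x)

# Toy model and input (hypothetical values for illustration).
w = np.array([1.5, -2.0])
b = 0.0
x = np.array([0.5, -0.5])        # clean input with true label y = 1
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, eps=0.3)
clean_p = sigmoid(w @ x + b)     # confidence on the clean input
adv_p = sigmoid(w @ x_adv + b)   # confidence on the perturbed input (lower)
```

A small, bounded perturbation of each input coordinate is enough to reduce the model's confidence in the true label; with a larger epsilon or a less robust model, it can flip the prediction outright, which is exactly the sensitivity that adversarial training aims to reduce.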

In this class, students will familiarize themselves with the history of attacks and defenses against ML and their extensions to emerging ML-enabled systems, such as generative models. The course materials cover three fundamental threats: (1) adversarial examples, (2) data poisoning, and (3) privacy risks. Students will review prior work, from classical papers to recent ones, implement basic attacks and defenses, evaluate their effectiveness, and conduct a mini-research project on a topic of their choice.

In the end, we expect:


Latest Announcements [Full List]


Course Information

Instructor


Course Policy

The University's Code of Academic Integrity applies, modified as follows:

[Don'ts]
[Do's]
Must: Write down the names of any students who helped you. Doing so will not affect your homework or project scores, but it teaches you how to credit others for their contributions, an essential skill for future collaborations.