In recent years, we have seen a surge of machine learning (ML)-enabled applications in our lives, such as ChatGPT [link] and autonomous vehicles [link], which calls for a comprehensive understanding of their security and privacy implications. Research in Trustworthy ML (TML) studies the (potential) security and privacy risks an adversary can inflict on such systems. A well-studied risk, and an example of this research's outcomes, is prediction manipulation via adversarial examples [link]. Studying it has led to defenses such as adversarial training [link], a training mechanism that reduces a model's sensitivity to small input perturbations. Researchers have extended this line of work to language models, where such attacks are known as "jailbreaks," an actively studied area [link].
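To make the idea concrete, below is a minimal sketch of crafting an adversarial example with the Fast Gradient Sign Method and measuring its effect; it assumes PyTorch, and the toy model, tensor shapes, and the helper name `fgsm_example` are illustrative rather than part of the course materials.

```python
import torch
import torch.nn as nn

def fgsm_example(model, x, y, epsilon=0.03):
    """Craft adversarial examples with the Fast Gradient Sign Method (FGSM).

    Each input is nudged by +/- epsilon in the direction that most increases
    the model's loss, which is often enough to change the prediction.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # One signed-gradient step, then clamp back to the valid input range.
    x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()

if __name__ == "__main__":
    # Toy setup: a small untrained classifier on random "images" in [0, 1].
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    x = torch.rand(8, 1, 28, 28)
    y = torch.randint(0, 10, (8,))
    x_adv = fgsm_example(model, x, y)

    # Adversarial training would mix examples like x_adv into the training
    # loss, reducing the model's sensitivity to such perturbations.
    clean_acc = (model(x).argmax(1) == y).float().mean().item()
    adv_acc = (model(x_adv).argmax(1) == y).float().mean().item()
    print(f"accuracy on clean inputs: {clean_acc:.2f}, on adversarial: {adv_acc:.2f}")
```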
In this class, students will familiarize themselves with the history of attacks and defenses against ML and their extensions to emerging ML-enabled systems, such as generative models. The course materials cover three fundamental threats: (1) adversarial examples, (2) data poisoning, and (3) privacy risks. Students will review prior work, from classical papers to the most recent ones, implement basic attacks and defenses, evaluate their effectiveness, and conduct a mini research project on a topic of their choice.
By the end of the course, we expect:
The University's Code of Academic Integrity applies, modified as follows: