Machine Learning and Friends Lunch: Claudia Shi, Novel Problems, Classic Solutions: Understanding LLMs Through the Lens of Statistics
Abstract
In this talk, I will present two recent projects that use statistical methods to deepen our understanding of LLMs. First, I investigate the moral beliefs encoded in LLMs, focusing on the advice they offer in morally ambiguous scenarios. To quantify this, we developed statistical measures that define what it means for a model to make a choice and that capture the uncertainty associated with that choice. In a large-scale survey of 26 LLMs, we analyzed their responses to a range of morally ambiguous situations. Our findings reveal that frontier LLMs often give similar responses, even when those responses diverge from human annotators'. Second, I examine how LLMs implement tasks through the lens of the circuit hypothesis: the idea that specific tasks are executed by subnetworks within the model, known as circuits. We developed a suite of hypothesis tests based on criteria such as mechanism preservation, localization, and minimality, and applied them to several circuits identified in prior work to assess how closely those circuits match the idealized notion the hypothesis proposes.
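To make the first project's framing concrete, here is a minimal sketch of one plausible way to operationalize "making a choice" and its uncertainty: sample the model's answer to the same two-option scenario many times, take the majority option as the choice, and use the entropy of the empirical answer distribution as the uncertainty. The function name and the entropy-based metric are illustrative assumptions, not the exact measures used in the talk.

```python
import math
from collections import Counter

def choice_and_uncertainty(responses):
    """Estimate a model's 'choice' and uncertainty from repeated answers.

    responses: a list of sampled answers (e.g. "A" or "B") to the same
    morally ambiguous scenario. Returns the majority option, its empirical
    probability, and the entropy (in bits) of the answer distribution.
    This is an illustrative metric, not the talk's exact definition.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    probs = {opt: c / total for opt, c in counts.items()}
    choice = max(probs, key=probs.get)
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return choice, probs[choice], entropy

# Hypothetical example: 10 sampled answers to one ambiguous scenario.
answers = ["A"] * 7 + ["B"] * 3
choice, p, h = choice_and_uncertainty(answers)
# choice == "A", p == 0.7, h ≈ 0.881 bits
```

A near-uniform answer distribution (entropy near 1 bit for two options) would indicate the model is not really "making a choice," while low entropy indicates a consistent stance that can then be compared with human annotators.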
Bio
Claudia is a final-year Ph.D. student in Computer Science at Columbia University, advised by David Blei. Her research advances the scientific understanding of LLMs and their responsible deployment. Her work was recognized with a spotlight paper at the Conference on Neural Information Processing Systems (NeurIPS) 2023, and she received the Columbia Center of AI Technology Ph.D. Fellowship in 2024.