Machine Learning and Friends Lunch: Yixin Wang, Causal Inference with Unstructured Data
Speaker
Abstract
Causal inference traditionally involves analyzing tabular data where variables like treatment, outcome, covariates, and colliders are manually labeled by humans. However, many complex causal inference problems rely on unstructured data sources such as images, text, and videos that depict overall situations. These causal problems require a crucial first step: extracting the high-level latent causal factors from the low-level unstructured data inputs, a task known as "causal representation learning."
In this talk, we explore how to identify latent causal factors from unstructured data, whether from passive observations, interventional experiments, or multi-domain datasets. While latent factors are classically uncovered by leveraging their statistical independence, causal representation learning grapples with a thornier challenge: the latent causal factors are often correlated, causally connected, or arbitrarily dependent.
Our key observation is that, despite correlations, the causal connections (or lack thereof) among factors leave geometric signatures in the latent factors' support, that is, the ranges of values each can take. Leveraging these signatures, we show that observational data alone can identify the latent factors up to coordinate transformations if they bear no causal links. When causal connections do exist, interventional data can provide geometric clues sufficient for identification. In the most general case of arbitrary dependencies, multi-domain data can separate stable factors from unstable ones. Taken together, these results showcase the unique power of geometric signatures in causal representation learning.
This is joint work with Kartik Ahuja, Yoshua Bengio, Michael Jordan, Divyat Mahajan, and Amin Mansouri.
Bio
Yixin Wang is an assistant professor of statistics at the University of Michigan. She works in the fields of Bayesian statistics, machine learning, and causal inference. Previously, she was a postdoctoral researcher with Professor Michael Jordan at the University of California, Berkeley. She completed her PhD in statistics at Columbia, advised by Professor David Blei, and her undergraduate studies in mathematics and computer science at the Hong Kong University of Science and Technology. Her research has been recognized by the j-ISBA Blackwell-Rosenbluth Award, ICSA Conference Young Researcher Award, ISBA Savage Award Honorable Mention, ACIC Tom Ten Have Award Honorable Mention, and INFORMS data mining and COPA best paper awards.