
Improving Variational Inference Through Advanced Stochastic Optimization Techniques

Thursday, 08/08/2024 10:00am to 12:00pm
Computer Science Building, Room 303
PhD Thesis Defense
Speaker: Javier Burroni

Black-box variational inference (VI) is crucial in probabilistic machine learning, offering an alternative method for Bayesian inference. By requiring only black-box access to the model and its gradients, it recasts complex inference tasks into more manageable optimization problems, aiding in the approximation of intricate posterior distributions across a wide range of models. However, black-box VI faces a fundamental challenge: managing the noise introduced by stochastic gradient optimization methods, which limits how efficiently accurate approximations can be obtained. This thesis presents new approaches to enhance the efficiency of black-box VI by improving different aspects of its optimization process.
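As a rough illustration of this setup (and not of the specific methods developed in the thesis), the sketch below fits a diagonal Gaussian approximation by stochastic gradient ascent on a reparameterized Monte Carlo estimate of the ELBO. The target log_joint is a hypothetical stand-in for a model's log-density; the run-to-run noise in the gradient estimates is exactly the difficulty described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(z):
    # Hypothetical stand-in for the model's unnormalized log p(x, z).
    return -0.5 * np.sum(z**2, axis=-1)

def grad_log_joint(z):
    # Gradient of the toy log joint with respect to z.
    return -z

def elbo_and_grads(mu, log_sigma, n_samples=32):
    # Reparameterized Monte Carlo estimate of the ELBO and its gradients
    # for a diagonal Gaussian q(z) = N(mu, diag(exp(log_sigma))^2).
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + sigma * eps                                   # reparameterization: z = mu + sigma * eps
    elbo = log_joint(z).mean() + np.sum(log_sigma)         # E_q[log p] + entropy of q (up to a constant)
    g = grad_log_joint(z)
    grad_mu = g.mean(axis=0)
    grad_log_sigma = (g * eps * sigma).mean(axis=0) + 1.0  # +1 per dimension from the entropy term
    return elbo, grad_mu, grad_log_sigma

# Plain stochastic gradient ascent; the Monte Carlo noise in these gradients
# is the source of difficulty discussed above.
mu, log_sigma = np.zeros(2), np.zeros(2)
for step in range(200):
    _, g_mu, g_ls = elbo_and_grads(mu, log_sigma)
    mu += 0.05 * g_mu
    log_sigma += 0.05 * g_ls
```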

The first part of this thesis focuses on the importance-weighted evidence lower bound (IW-ELBO), an objective used in the VI optimization problem. By incorporating importance sampling, the IW-ELBO augments the expressive power of the approximating distributions used in VI. However, it also increases the variance of gradient estimates, complicating the optimization process. To mitigate this, the thesis applies the theory of U-statistics, an approach that significantly reduces variance. Since fully computing U-statistics can be impractical due to exponential growth in computation, we introduce approximate methods that effectively reduce variance with minimal computational overhead.
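For concreteness, here is a minimal sketch of the two estimators at play, assuming the log importance weights log w_m = log p(x, z_m) - log q(z_m) have already been computed. The standard IW-ELBO estimate uses a single group of M weights, while the complete U-statistic averages that estimate over every size-M subset of N available weights, which is where the exponential cost (and hence the need for approximations) comes from. The stand-in weights are purely illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.special import logsumexp

def iw_elbo_naive(log_w, M):
    # Standard IW-ELBO estimator: a single group of M importance weights.
    return logsumexp(log_w[:M]) - np.log(M)

def iw_elbo_u_statistic(log_w, M):
    # Complete U-statistic: average the M-sample estimator over every
    # size-M subset of the N available weights. Cost grows as C(N, M).
    N = len(log_w)
    vals = [logsumexp(log_w[list(idx)]) - np.log(M)
            for idx in combinations(range(N), M)]
    return np.mean(vals)

rng = np.random.default_rng(0)
log_w = rng.normal(size=12)          # stand-in log weights for illustration
print(iw_elbo_naive(log_w, M=4))
print(iw_elbo_u_statistic(log_w, M=4))
```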

The second part of this thesis addresses a central issue within black-box VI: its stochastic optimization process, i.e., stochastic gradient descent or one of its variants, is highly sensitive to user-specified hyperparameter choices, often leading to poor results. We address this issue by introducing an algorithm specifically designed for VI, based on the sample average approximation (SAA). This method, SAA for VI, transforms the stochastic optimization task into a sequence of deterministic problems that can be easily solved using standard optimization techniques. As a result, it simplifies and automates the optimization process, reduces the burden of hyperparameter tuning, and exhibits robust performance, particularly in complex statistical models involving hundreds of latent variables.
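A rough sketch of the SAA idea, reusing the same toy Gaussian setup as above (and not the thesis's full algorithm): the base draws eps are generated once and frozen, so the ELBO becomes a deterministic function of the variational parameters, and a standard deterministic optimizer (here SciPy's L-BFGS-B) can be applied without tuning learning rates or schedules.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
dim, n_fixed = 2, 64
eps_fixed = rng.standard_normal((n_fixed, dim))     # frozen once, reused at every evaluation

def neg_elbo_and_grad(params):
    # Deterministic negative ELBO (and its gradient) under the fixed draws.
    mu, log_sigma = params[:dim], params[dim:]
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps_fixed
    elbo = (-0.5 * np.sum(z**2, axis=-1)).mean() + np.sum(log_sigma)
    g = -z                                           # gradient of the toy log joint
    grad_mu = g.mean(axis=0)
    grad_log_sigma = (g * eps_fixed * sigma).mean(axis=0) + 1.0
    return -elbo, -np.concatenate([grad_mu, grad_log_sigma])

# Because the objective is now deterministic, step sizes are handled
# internally by the optimizer; no schedules need to be tuned.
result = minimize(neg_elbo_and_grad, np.zeros(2 * dim), jac=True, method="L-BFGS-B")
mu_opt, log_sigma_opt = result.x[:dim], result.x[dim:]
```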

In the third part of this thesis, we shift our focus from the objective and optimization process to the approximating distributions used in VI and their gradient estimation. Specifically, we explore how to use reparameterization---a key technique in black-box VI---for mixture distributions.
Due to the discrete nature of choices involved in sampling from mixture distributions, the standard reparameterization trick is not directly applicable. Although prior work has proposed several gradient estimators that use some form of reparameterization, there remains a noticeable lack of clarity regarding which estimators are available, in which contexts they are applicable, and how they compare.
To address this gap, we introduce and evaluate the most relevant gradient estimators for mixture distributions using a consistent mathematical framework and, through this framework, we extend existing estimators to new settings. We then give a comprehensive performance comparison of different estimators---theoretically, where we can sometimes compare variance, and empirically, where we assess the estimators across different setups. Finally, we address the often-overlooked computational aspect by introducing novel, efficient algorithms for some of the estimators.
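To make the reparameterization difficulty concrete, the sketch below shows one simple member of this family of estimators (an illustration only, not necessarily the estimator the thesis recommends): because a mixture's expectation decomposes as E_q[f] = sum_k w_k E_{q_k}[f], each continuous component can be reparameterized separately and the discrete choice handled by explicit summation rather than sampling. The integrand f is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    # Hypothetical integrand whose gradient we want (e.g. a log-density term).
    return np.sum(z**2, axis=-1)

def grad_f(z):
    return 2.0 * z

def mixture_grad_wrt_means(weights, means, scales, n_samples=64):
    # Gradient of E_q[f(z)] with respect to each component mean, where
    # q(z) = sum_k weights[k] * N(z; means[k], scales[k]^2 I) and the
    # discrete component choice is summed out rather than sampled.
    grads = []
    for w_k, m_k, s_k in zip(weights, means, scales):
        eps = rng.standard_normal((n_samples, m_k.size))
        z_k = m_k + s_k * eps                     # reparameterized component sample
        grads.append(w_k * grad_f(z_k).mean(axis=0))
    return np.stack(grads)                        # one gradient per component mean

weights = np.array([0.3, 0.7])
means = np.array([[-1.0, 0.0], [2.0, 1.0]])
scales = np.array([0.5, 1.0])
print(mixture_grad_wrt_means(weights, means, scales))
```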

This thesis contributes to both the theoretical understanding and practical implementation of VI. By introducing new methods and techniques, we aim to enhance the accuracy and efficiency of VI and broaden its applicability.