Speaker

Alex Wong

Abstract

Training deep neural networks requires tens of thousands to millions of examples, so curating multimodal vision datasets demands many person-hours; tasks like depth estimation require an even greater annotation effort. I will introduce an alternative form of supervision that leverages multi-sensor validation as an unsupervised (or self-supervised) training objective for depth estimation. To address the ill-posedness of this objective, I will show how one can leverage multimodal inputs in the choice of regularizers, which play a role in model complexity, speed, generalization, and adaptation to (possibly adverse) test-time environments. Additionally, I will discuss the current limitations of the data augmentation procedures used during unsupervised training, which relies on reconstructing the inputs as the supervision signal, and detail a method that allows one to scale up and introduce previously inviable augmentations to boost performance. Finally, I will show how one can scalably expand the number of modalities supported by multimodal models and demonstrate their use in a number of downstream semantic tasks.
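
For readers unfamiliar with reconstruction-based supervision, below is a minimal sketch (in PyTorch) of the general idea the abstract refers to: a predicted depth map and a relative camera pose are used to warp an adjacent frame into the current view, and the photometric discrepancy between the reconstruction and the observed image serves as the training signal. This is an illustrative sketch of the technique, not the speaker's specific method; all names, tensor shapes, and conventions are assumptions.

import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # Lift each pixel (u, v) to a 3D point: X = depth * K^{-1} [u, v, 1]^T.
    # depth: (B, 1, H, W); K_inv: (B, 3, 3) inverse camera intrinsics.
    B, _, H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0)   # (3, H, W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1)             # (B, 3, H*W)
    return depth.view(B, 1, -1) * (K_inv @ pix)            # (B, 3, H*W)

def photometric_loss(img_target, img_source, depth, T, K):
    # Self-supervised reconstruction loss: warp img_source into the target
    # view using the predicted depth and relative pose T, then penalize the
    # L1 discrepancy against the observed target image.
    # img_*: (B, 3, H, W); depth: (B, 1, H, W); T: (B, 4, 4); K: (B, 3, 3).
    B, _, H, W = depth.shape
    pts = backproject(depth, torch.inverse(K))             # target-frame points
    pts = torch.cat([pts, torch.ones_like(pts[:, :1])], dim=1)  # homogeneous
    cam = (T @ pts)[:, :3]                                 # source-frame points
    pix = K @ cam                                          # project to pixels
    pix = pix[:, :2] / pix[:, 2:].clamp(min=1e-6)          # perspective divide
    # Normalize pixel coordinates to [-1, 1], as grid_sample expects.
    u = 2.0 * pix[:, 0] / (W - 1) - 1.0
    v = 2.0 * pix[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
    recon = F.grid_sample(img_source, grid,
                          padding_mode="border", align_corners=True)
    return (recon - img_target).abs().mean()

In practice this loss is combined with regularizers such as the ones the abstract alludes to (e.g., local smoothness), since photometric reconstruction alone is ill-posed: many depth maps can explain the same pair of images.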

Bio

Alex Wong is an Assistant Professor in the Department of Computer Science and the director of the Vision Laboratory at Yale University. He also serves as the Director of AI (in a consulting capacity) for Horizon Surgical Systems. Prior to joining Yale, he was an Adjunct Professor at Loyola Marymount University (LMU) from 2018 to 2020. He received his Ph.D. in Computer Science from the University of California, Los Angeles (UCLA) in 2019, co-advised by Stefano Soatto and Alan Yuille, and subsequently was a post-doctoral research scholar at UCLA under the guidance of Soatto. His research lies at the intersection of machine learning, computer vision, and robotics and largely focuses on multimodal 3D reconstruction, robust vision under adverse conditions, and unsupervised learning. His work has received the outstanding student paper award at the Conference on Neural Information Processing Systems (NeurIPS) 2011 and the best paper award in robot vision at the International Conference on Robotics and Automation (ICRA) 2019.