Speaker

Pratheba Selvaraju

Abstract

3D reconstruction from real-world data is essential in applications such as augmented reality, robotics, medical imaging, and autonomous navigation. However, this data is often noisy, incomplete, occluded, or corrupted. Despite these imperfections, such data must be used to develop reconstruction methods that work in real-world, real-time scenarios. Additionally, each application has its own requirements and constraints, and achieving the best possible outcome depends on selecting representations suited to each case. Given the wide range of applications, grouping them by shared characteristics, such as static versus dynamic objects, allows a targeted approach that carries over to similar scenarios. To this end, this thesis addresses reconstruction tasks for static and dynamic structures, focusing on buildings and human faces and exploring the representations best suited to each.

We begin with static structure reconstruction for urban planning and development, where designs are primarily constrained by non-malleable materials. To address this, we introduce Developability Approximation for Neural Implicits through Rank Minimization, a neural network model that represents surfaces as piecewise developable patches. The model encodes the surface implicitly, offering an advantage over prior explicit methods, which struggle with high tessellation and limited shape fidelity.
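As a rough illustration of the rank-minimization idea (not the thesis's actual formulation), a developable patch has one zero principal curvature, so the Hessian of a neural signed distance field should be rank-deficient on the surface. The sketch below, with a hypothetical architecture and loss, penalizes the second-smallest singular value of the Hessian at surface samples:

```python
import torch
import torch.nn as nn

class SDFNet(nn.Module):
    """Small MLP mapping 3D points to signed distances (illustrative only)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def developability_loss(model, pts):
    """Hypothetical low-rank regularizer on near-surface samples `pts`.

    For an exact SDF, the Hessian already annihilates the normal direction,
    so its smallest singular value is ~0. Driving the second-smallest one
    to zero leaves a rank-1 shape operator, i.e. one zero principal
    curvature, which characterizes a developable patch.
    """
    pts = pts.detach().requires_grad_(True)
    sdf = model(pts).sum()
    grad = torch.autograd.grad(sdf, pts, create_graph=True)[0]      # (N, 3)
    hess_rows = []
    for i in range(3):  # full 3x3 Hessian via repeated autograd
        row = torch.autograd.grad(grad[:, i].sum(), pts, create_graph=True)[0]
        hess_rows.append(row)
    H = torch.stack(hess_rows, dim=1)                               # (N, 3, 3)
    sv = torch.linalg.svdvals(H)                                    # descending
    return sv[:, 1].mean()   # penalize the second-smallest singular value
```

In practice this term would be weighted against data-fitting and eikonal losses; the weights and sampling strategy here are left unspecified.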

To extend this method to urban planning, we created a large-scale dataset of 2,000 diverse building exteriors (e.g., residential, commercial, stadiums), presented in BuildingNet: Learning to Label 3D Buildings. This dataset is used to apply and evaluate the method for assessing designs, costs, and feasibility in planning.

Next, we explore dynamic object reconstruction, focusing on human faces, with real-world applications in forensic science, medical imaging, animation, and telepresence, by introducing OFER: Occluded Face Expression Reconstruction. OFER reconstructs expressive human faces from occluded images. It employs a parametric face model that encodes facial features, enabling smooth reconstruction and easy animatability by adjusting the model parameters. This is achieved by training UNet-based diffusion models to generate varied expression parameters for the occluded regions.
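To make the recipe concrete, the following is a minimal sketch of training a diffusion model over expression coefficients, conditioned on an embedding of the occluded image. The dimensions, the MLP denoiser, and the conditioning scheme are placeholders, not OFER's actual UNet architecture:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 100 expression coefficients (FLAME-style parametric
# face model) conditioned on a 512-d embedding of the occluded image.
EXPR_DIM, COND_DIM, T = 100, 512, 1000

class ExpressionDenoiser(nn.Module):
    """Stand-in denoiser: predicts the noise added to an expression
    parameter vector, given the timestep and image conditioning."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EXPR_DIM + COND_DIM + 1, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, EXPR_DIM),
        )

    def forward(self, x_t, t, cond):
        t_emb = t.float().unsqueeze(-1) / T          # crude timestep encoding
        return self.net(torch.cat([x_t, cond, t_emb], dim=-1))

betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def training_step(model, expr, cond):
    """Standard DDPM objective on expression coefficients: diffuse the
    clean parameters, then regress the injected noise."""
    t = torch.randint(0, T, (expr.shape[0],))
    noise = torch.randn_like(expr)
    ab = alpha_bar[t].unsqueeze(-1)
    x_t = ab.sqrt() * expr + (1 - ab).sqrt() * noise
    return nn.functional.mse_loss(model(x_t, t, cond), noise)

model = ExpressionDenoiser()
loss = training_step(model, torch.randn(8, EXPR_DIM), torch.randn(8, COND_DIM))
```

Sampling from such a model multiple times yields the varied plausible expressions for the occluded regions that the abstract describes.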

In facial animation, real-time performance is crucial for applications like gaming and augmented reality, which demand computational efficiency without compromising quality. UNet-based diffusion models, however, often suffer from slow inference. To tackle this, we explore efficient computational representations and introduce FORA: Fast-Forward Caching for Diffusion Transformer Acceleration. FORA employs a caching mechanism that reuses intermediate outputs across denoising steps, minimizing computational overhead without requiring model retraining and enabling faster inference with minimal trade-offs in quality.
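The sketch below illustrates the general caching idea under stated assumptions: a wrapper recomputes a transformer sub-layer's output only every few denoising steps and replays the cached activation through the residual connection in between. The refresh interval, block contents, and dimensions are hypothetical, not FORA's exact scheme:

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Recomputes the wrapped sub-layer only every `interval` denoising
    steps; in between, its cached output is replayed. The residual add
    keeps the token stream evolving even on cached steps."""
    def __init__(self, block, interval=3):
        super().__init__()
        self.block, self.interval = block, interval
        self.cache = None

    def forward(self, x, step):
        if self.cache is None or step % self.interval == 0:
            self.cache = self.block(x)   # full computation, refresh cache
        return x + self.cache            # residual add with cached output

# Stand-in sub-layers for a diffusion transformer's attention/MLP blocks.
blocks = nn.ModuleList(
    CachedBlock(nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64)))
    for _ in range(4)
)

# Mock denoising loop: every block recomputes at steps 0, 3, 6, ...;
# intermediate steps skip those forward passes entirely.
h = torch.randn(2, 16, 64)               # (batch, tokens, hidden)
for step in range(10):
    for blk in blocks:
        h = blk(h, step)
```

The trade-off is that cached activations are slightly stale between refreshes, which is why such schemes trade a small amount of quality for large savings in compute.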

Advisor

Erik Learned-Miller

Hybrid event