Abstract: Text-to-image diffusion models are quickly becoming a powerful tool for image creation. Using these models for intuitive image editing is a natural next step, yet a challenging one.
In this talk, Hertz will present two distinct methods we have developed for text-guided image editing. In the first work, Prompt-to-Prompt, we use the cross-attention layers of the diffusion model to steer the generation process through edits to the conditioning text prompt. Furthermore, we introduce an efficient "Null Text Inversion" technique that enables Prompt-to-Prompt editing of real images.
In our recent work, Delta Denoising Score, we introduce a score function for image editing that can be applied directly to an image or used as a loss function to train an Image2Image translation model. In this work, we analyze the noisy dynamics of Score Distillation Sampling (SDS) when it is used for image editing. We suggest adding a reference SDS branch to eliminate the noisy component during the optimization.
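As a rough illustration of the reference-branch idea, the toy sketch below subtracts an SDS-style residual computed on a reference image and prompt from the one computed on the edited image and target prompt, using the same noise sample in both branches. It is not the authors' implementation: `toy_eps` is a hypothetical stand-in for a text-conditioned noise predictor, prompts are reduced to a single number, images are short lists of floats, and the timestep weighting is omitted.

```python
import math


def add_noise(z, noise, alpha_bar):
    """Forward-diffuse z: z_t = sqrt(alpha_bar) * z + sqrt(1 - alpha_bar) * noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * zi + b * ni for zi, ni in zip(z, noise)]


def toy_eps(z_t, prompt):
    """Hypothetical noise predictor (assumption): NOT a real diffusion model."""
    return [0.5 * zi + prompt for zi in z_t]


def sds_grad(eps_model, z, prompt, noise, alpha_bar):
    """SDS-style residual: predicted noise minus the noise that was added."""
    z_t = add_noise(z, noise, alpha_bar)
    return [e - n for e, n in zip(eps_model(z_t, prompt), noise)]


def dds_grad(eps_model, z, tgt_prompt, z_ref, ref_prompt, noise, alpha_bar):
    """Edited-branch residual minus reference-branch residual (shared noise)."""
    edit = sds_grad(eps_model, z, tgt_prompt, noise, alpha_bar)
    ref = sds_grad(eps_model, z_ref, ref_prompt, noise, alpha_bar)
    return [e - r for e, r in zip(edit, ref)]
```

In this toy setting, when the edited image and target prompt coincide with the reference image and prompt, the difference vanishes even though each SDS-style residual on its own generally does not, which is the sense in which the reference branch removes the shared noisy component.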
Bio: Amir Hertz is a research scientist at Google, working on extending image editing capabilities using generative models. He recently completed his Ph.D. studies (under review) in the Department of Computer Science at Tel-Aviv University under the supervision of Prof. Daniel Cohen-Or and Prof. Raja Giryes. His research focuses on adopting and extending machine learning practices within computer graphics. Specifically, Hertz has developed models and methods for 3D shape generation, texture synthesis, 3D modeling, and meshing.