Deep Visual Reasoning with Optimization-based Network Modules

Abstract

Deep learning approaches have achieved astonishing performance in numerous vision applications, including image classification, object detection, and semantic segmentation. While these problems are well handled by standard feed-forward architectures, many computer vision problems require more complex reasoning about the information given during inference. In particular, more sophisticated autonomous agents need to be able to learn new concepts and abilities "on the fly", given only limited data and supervision. However, developing effective end-to-end learnable methods for few-shot and online learning tasks has turned out to be a formidable challenge.

We tackle this challenge by designing deep network modules that internally optimize an objective. Since key problems in many computer vision tasks can be formulated as objective functions, optimization-based network modules are able to perform effective and efficient reasoning in these settings. By further learning the objective function itself, we obtain a general family of deep network modules capable of more complex non-local reasoning. We will cover their application across a variety of tasks, including visual tracking, video object segmentation, few-shot segmentation, dense correspondence estimation, and multi-frame image restoration.
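To make the idea concrete, the following is a minimal PyTorch-style sketch, not the speaker's actual architecture: a module whose forward pass unrolls a few steepest-descent steps on an internal regularized least-squares objective, so that the regularization weight and step size (and hence the objective itself) can be learned end-to-end. The class name `InnerOptimizationModule`, the choice of objective, and all tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class InnerOptimizationModule(nn.Module):
    """Illustrative sketch: a module whose forward pass runs a few
    steepest-descent steps on an internal regularized least-squares
    objective. The step size and regularization weight are learnable,
    so the outer training loop can shape the internal objective."""

    def __init__(self, num_steps: int = 5):
        super().__init__()
        self.num_steps = num_steps
        # Log-parameterized so the values stay positive during training.
        self.log_reg = nn.Parameter(torch.tensor(0.0))    # regularization weight
        self.log_step = nn.Parameter(torch.tensor(-2.0))  # gradient step size

    def forward(self, support_feats, support_labels, query_feats):
        # support_feats: (N, D), support_labels: (N, 1), query_feats: (M, D)
        reg, step = self.log_reg.exp(), self.log_step.exp()
        w = support_feats.new_zeros(support_feats.shape[1], 1)
        for _ in range(self.num_steps):
            # Gradient of 0.5*||X w - y||^2 + 0.5*reg*||w||^2 with respect to w.
            grad = support_feats.t() @ (support_feats @ w - support_labels) + reg * w
            w = w - step * grad  # unrolled step: gradients flow back to reg and step
        # Apply the internally optimized linear predictor to the query features.
        return query_feats @ w


# Usage sketch: adapt to a task from a few labelled examples, predict on queries.
module = InnerOptimizationModule(num_steps=5)
support_x, support_y = torch.randn(10, 64), torch.randn(10, 1)
query_x = torch.randn(4, 64)
scores = module(support_x, support_y, query_x)  # shape (4, 1)
```

Because the inner optimization is unrolled, the whole module remains differentiable, which is what allows the objective's parameters to be trained jointly with the rest of the network.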

About the speaker

Martin Danelljan is a senior researcher at ETH Zürich, Switzerland. He received his Ph.D. degree from Linköping University, Sweden, in 2018. His Ph.D. thesis was awarded the biennial Best Nordic Thesis Prize at SCIA 2019. His main research interests are meta and online learning, deep probabilistic models, and conditional generative models. His research includes applications to visual tracking, video object segmentation, dense correspondence estimation, and super-resolution. His work on visual tracking, in particular, has attracted much attention, achieving first rank in the 2014, 2016, and 2017 editions of the Visual Object Tracking (VOT) Challenge and in the OpenCV State-of-the-Art Vision Challenge. He received the best paper award at ICPR 2016, the best student paper award at BMVC 2019, and an outstanding reviewer award at ECCV 2020. He serves as a senior PC member for AAAI 2022 and an area chair for CVPR 2022. He is also a co-organizer of the VOT, NTIRE, and AIM workshops.