2nd Workshop on Scalable 3D Scene Generation and Geometric Scene Understanding

ICCV 2025 Workshop

Oct. 20th (2:00pm - 5:50pm), 2025


Introduction

Large-scale geometric scene understanding is one of the most critical and long-standing research directions in computer vision, with impactful applications in autonomous driving and robotics. Recently, there has been a surge of interest in 3D scene generation, driven by its wide-ranging applications in the gaming industry, augmented reality (AR), and virtual reality (VR). These technologies are transforming our lives and creating significant commercial opportunities. Both academia and industry are investing heavily in pushing these research directions toward greater efficiency and the ability to handle large-scale scenes.

The efficiency and quality of large-scale reconstruction and generation depend on the 3D representations and priors used to solve the problem. Moreover, different industries, such as robotics, autonomous driving, and gaming, have distinct requirements for the quality and efficiency of the resulting 3D scene structures. This workshop will gather top researchers and engineers from academia and industry to discuss the key challenges ahead.


Call For Papers

We invite papers of up to 8 pages (in ICCV 2025 format) on work related to 3D generation, reconstruction, and geometric scene understanding. As accepted papers will be included in the ICCV workshop proceedings, dual submissions are not accepted. Paper topics may include but are not limited to:

  • Scalable large-scale 3D scene generation
  • Efficient 3D representation learning for large-scale 3D scene reconstruction
  • Learning compositional structure of 3D scenes; scalable 3D object-centric learning
  • 3D reconstruction and generation for dynamic scenes (with humans and/or rigid objects such as cars)
  • Online learning for scalable 3D scene reconstruction
  • Foundation models for 3D geometric scene understanding
  • 3D reconstruction and generation for AR/VR, robotics, etc.
  • Datasets for large-scale scene reconstruction and generation (with moving objects)
  • Multi-modal 3D scene generation and geometric understanding

Submission: Submissions may be up to 8 pages, excluding references and acknowledgements, and must follow the ICCV format. Reviewing will be double-blind. Please submit your paper via the following link by the deadline: Submission Portal


Poster Presentation

#1. Forgetting-Free Incremental Panoptic Lifting by Maximum-Visibility Viewpoint Selection,
      Akira Kohjin (Shiga University/RIKEN), Motoharu Sonogashira (RIKEN), Masaaki Iiyama (Shiga University), Yasutomo Kawanishi (RIKEN)
#2. SELDOM: Scene Editing via Latent Diffusion with Object-centric Modifications,
      Richard Higgins (University of Michigan), David Fouhey (New York University)
#3. Globally Optimal Registration of Dense Terrestrial Laser Scans from Coarse Sampling,
      Dorian Kempf (Université de Picardie Jules Verne), Guillaume Caron (Université de Picardie Jules Verne), El Mustapha Mouaddib (Université de Picardie Jules Verne), Fumio Kanehiro (CNRS-AIST JRL (Joint Robotics Laboratory), IRL, National Institute of Advanced Industrial Science and Technology (AIST))
#4. Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos,
      Rundong Luo (Cornell University), Matthew Wallingford (University of Washington), Ali Farhadi (University of Washington), Noah Snavely (Cornell University), Wei-Chiu Ma (Cornell University) (ICCV 2025 main conference paper)



Important Dates

Paper submission deadline: June 30th, 2025
Notification of acceptance: July 10th, 2025
Camera-ready deadline: August 10th, 2025
Workshop date: Oct 20th (afternoon), 2025


Schedule

2:05pm - 2:10pm   Welcome
2:10pm - 2:40pm   Angel Xuan Chang: Compositional Scene Generation
2:40pm - 3:10pm   Jia Deng: Data-Centric Visual Learning via Procedural Synthetic Scenes
3:10pm - 3:40pm   Gim Hee Lee: Interaction with Our Dynamic 3D World
3:40pm - 4:40pm   Coffee Break and Poster Session
4:40pm - 5:10pm   Amir Zamir: Multimodal Scene Understanding
5:10pm - 5:40pm   Kristen Grauman: TBD
5:40pm - 5:45pm   Concluding Remarks


Invited Speakers


Angel Xuan Chang is an Associate Professor at Simon Fraser University. Prior to this, she was a visiting research scientist at Facebook AI Research and a research scientist at Eloquent Labs working on dialogue. She received her Ph.D. in Computer Science from Stanford, where she was part of the Natural Language Processing Group, advised by Chris Manning. Her research focuses on connecting language to 3D representations of shapes and scenes and on grounding language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes from natural language, as well as datasets for 3D scene understanding. In general, she is interested in the semantics of shapes and scenes, the representation and acquisition of common-sense knowledge, and reasoning using probabilistic models.

Title: Compositional Scene Generation


Jia Deng is a Professor of Computer Science at Princeton University, where he directs the Princeton Vision & Learning Lab. He received his Ph.D. from Princeton and his B.Eng. from Tsinghua University, both in computer science. His research spans 3D vision, object and action recognition, and automated reasoning, with interests in per-pixel 3D reconstruction of real-world scenes, visual understanding of objects and interactions, and connections between automated theorem proving, NLP, and AutoML. He is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, the ONR Young Investigator Award, the ICCV Marr Prize, the CVPR Test-of-Time Award, and two ECCV Best Paper Awards.

Title: Data-Centric Visual Learning via Procedural Synthetic Scenes


Kristen Grauman is a Professor at the University of Texas at Austin and a Research Director at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on video, visual recognition, and action for perception and embodied AI. She and her collaborators have been recognized with several best paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test-of-time award). She served for six years as an Associate Editor-in-Chief for TPAMI and for ten years as an Editorial Board member for IJCV. She also served as a Program Chair of CVPR 2015, NeurIPS 2018, and ICCV 2023.


Gim Hee Lee is an Associate Professor in the Department of Computer Science at the National University of Singapore (NUS). He received his PhD in Computer Science from ETH Zurich and was previously a researcher at Mitsubishi Electric Research Laboratories (MERL), USA. He has served as Area Chair for leading conferences such as CVPR, ICCV, ECCV, ICLR, and NeurIPS, and has held organizing roles including Program Chair of 3DV 2022, Demo Chair of CVPR 2023, and General Chair of 3DV 2025. He is a recipient of the Singapore NRF Investigatorship (Class of 2024). His research focuses on 3D computer vision and robotics.


Amir Zamir is an Assistant Professor of Computer Science at the Swiss Federal Institute of Technology (EPFL). His research is in computer vision and machine learning. Before joining EPFL in 2020, he was with UC Berkeley, Stanford, and UCF. He has received paper awards at SIGGRAPH 2022, CVPR 2020, CVPR 2018, and CVPR 2016, as well as the NVIDIA Pioneering Research Award 2018, the PAMI Everingham Prize 2022, and the ECCV/ECVA Young Researcher Award 2022. His research has been covered by press outlets such as The New York Times and Forbes. He was the chief scientist of Aurora Solar, a Forbes AI 50 company, from 2015 to 2022 and currently serves as the chief scientist of Duranta Inc.

Title: Multimodal Scene Understanding


Organizers

Miaomiao Liu
Australian National University, Australia
Jose M. Alvarez
NVIDIA, US
Mathieu Salzmann
EPFL, Swiss Data Science Center (SDSC), Switzerland
Lingjie Liu
University of Pennsylvania, US
Hongdong Li
Australian National University, Australia
Richard Hartley
Australian National University & Google, Australia



Contact

To contact the organizers, please email S3DSGR@gmail.com.



Acknowledgments

Thanks to visualdialog.org for the webpage format.

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.