2nd Workshop on Scalable 3D Scene Generation and Geometric Scene Understanding

ICCV 2025 Workshop

Oct. 20th (2:00pm - 5:50pm), 2025


Introduction

Large-scale geometric scene understanding is one of the most critical and long-standing research directions in computer vision, with impactful applications in autonomous driving and robotics. Recently, there has been a surge of interest in 3D scene generation, driven by its wide-ranging applications in the gaming industry, augmented reality (AR), and virtual reality (VR). These technologies are transforming our lives and creating significant commercial opportunities. Both academia and industry are investing heavily in pushing these research directions toward greater efficiency and the ability to handle large-scale scenes.

The efficiency and quality of large-scale reconstruction and generation depend on the 3D representations and priors used to solve the problem. Moreover, different industries, such as robotics, autonomous driving, and gaming, have distinct requirements for the quality and efficiency of the resulting 3D scene structures. This workshop will gather top researchers and engineers from academia and industry to discuss the key challenges ahead.


Call For Papers

We invite papers of up to 8 pages (in ICCV 2025 format) on work related to 3D generation, reconstruction, and geometric scene understanding. As accepted papers will be included in the ICCV workshop proceedings, dual submissions are not accepted. Paper topics may include but are not limited to:

  • Scalable large-scale 3D scene generation
  • Efficient 3D representation learning for large-scale 3D scene reconstruction
  • Learning compositional structure of 3D scenes; scalable 3D object-centric learning
  • 3D reconstruction and generation for dynamic scenes (with humans and/or rigid objects such as cars)
  • Online learning for scalable 3D scene reconstruction
  • Foundation models for 3D geometric scene understanding
  • 3D reconstruction and generation for AR/VR, robotics, etc.
  • Datasets for large-scale scene reconstruction and generation (with moving objects)
  • Multi-modal 3D scene generation and geometric understanding

Submission: Submissions may be up to 8 pages, excluding references and acknowledgements, and must follow the ICCV format. Reviewing will be double-blind. Please submit your paper via the following link by the deadline: Submission Portal


Poster Presentation

#1. Forgetting-Free Incremental Panoptic Lifting by Maximum-Visibility Viewpoint Selection,
      Akira Kohjin (Shiga University/RIKEN), Motoharu Sonogashira (RIKEN), Masaaki Iiyama (Shiga University), Yasutomo Kawanishi (RIKEN)
#2. SELDOM: Scene Editing via Latent Diffusion with Object-centric Modifications,
      Richard Higgins (University of Michigan), David Fouhey (New York University)
#3. Globally Optimal Registration of Dense Terrestrial Laser Scans from Coarse Sampling,
      Dorian Kempf (Université de Picardie Jules Verne), Guillaume Caron (Université de Picardie Jules Verne), El Mustapha Mouaddib (Université de Picardie Jules Verne), Fumio Kanehiro (CNRS-AIST JRL (Joint Robotics Laboratory), IRL, National Institute of Advanced Industrial Science and Technology (AIST))
#4. Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos,
      Rundong Luo (Cornell University), Matthew Wallingford (University of Washington), Ali Farhadi (University of Washington), Noah Snavely (Cornell University), Wei-Chiu Ma (Cornell University) (ICCV 2025 main conference paper)



Important Dates

Paper submission deadline: June 30th, 2025
Notification of acceptance: July 10th, 2025
Camera-ready deadline: August 10th, 2025
Workshop date: Oct 20th (afternoon), 2025


Schedule

2:05pm - 2:10pm   Welcome
2:10pm - 2:40pm   Angel Xuan Chang: Compositional Scene Generation
2:40pm - 3:10pm   Jia Deng: Data-Centric Visual Learning via Procedural Synthetic Scenes
3:10pm - 3:40pm   Gim Hee Lee: Interaction with Our Dynamic 3D World
3:40pm - 4:40pm   Coffee Break and Poster Session
4:40pm - 5:10pm   Amir Zamir: Multimodal Scene Understanding
5:10pm - 5:40pm   Kristen Grauman: TBD
5:40pm - 5:45pm   Concluding Remarks


Invited Speakers


Angel Xuan Chang is an Associate Professor at Simon Fraser University. Prior to this, she was a visiting research scientist at Facebook AI Research and a research scientist at Eloquent Labs working on dialogue. She received her Ph.D. in Computer Science from Stanford, where she was part of the Natural Language Processing Group, advised by Chris Manning. Her research focuses on connecting language to 3D representations of shapes and scenes and on grounding language for embodied agents in indoor environments. She has worked on methods for synthesizing 3D scenes and shapes from natural language, as well as datasets for 3D scene understanding. In general, she is interested in the semantics of shapes and scenes, the representation and acquisition of common-sense knowledge, and reasoning using probabilistic models.

Title: Compositional Scene Generation


Jia Deng is a Professor of Computer Science at Princeton University, where he directs the Princeton Vision & Learning Lab. He received his Ph.D. from Princeton and his B.Eng. from Tsinghua University, both in computer science. His research spans 3D vision, object and action recognition, and automated reasoning, with interests in per-pixel 3D reconstruction of real-world scenes, visual understanding of objects and interactions, and connections between automated theorem proving, NLP, and AutoML. He is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, the ONR Young Investigator Award, the ICCV Marr Prize, the CVPR Test-of-Time Award, and two ECCV Best Paper Awards.

Title: Data-Centric Visual Learning via Procedural Synthetic Scenes


Kristen Grauman is a Professor at the University of Texas at Austin and a Research Director at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on video, visual recognition, and action for perception and embodied AI. She and her collaborators have been recognized with several best paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test-of-time award). She served for six years as an Associate Editor-in-Chief for TPAMI and for ten years as an Editorial Board member for IJCV. She also served as a Program Chair of CVPR 2015, NeurIPS 2018, and ICCV 2023.


Gim Hee Lee is an Associate Professor in the Department of Computer Science at the National University of Singapore (NUS). He received his PhD in Computer Science from ETH Zurich and was previously a researcher at Mitsubishi Electric Research Laboratories (MERL), USA. He has served as Area Chair for leading conferences such as CVPR, ICCV, ECCV, ICLR, and NeurIPS, and has held organizing roles including Program Chair of 3DV 2022, Demo Chair of CVPR 2023, and General Chair of 3DV 2025. He is a recipient of the Singapore NRF Investigatorship (Class of 2024). His research focuses on 3D computer vision and robotics.


Amir Zamir is an Assistant Professor of Computer Science at the Swiss Federal Institute of Technology (EPFL). His research is in computer vision and machine learning. Before joining EPFL in 2020, he was with UC Berkeley, Stanford, and UCF. He has received paper awards at SIGGRAPH 2022, CVPR 2020, CVPR 2018, and CVPR 2016, as well as the NVIDIA Pioneering Research Award 2018, the PAMI Everingham Prize 2022, and the ECCV/ECVA Young Researcher Award 2022. His research has been covered by press outlets such as The New York Times and Forbes. He was the chief scientist of Aurora Solar, a Forbes AI 50 company, from 2015 to 2022 and currently serves as the chief scientist of Duranta Inc.

Title: Multimodal Scene Understanding


Organizers

Miaomiao Liu
Australian National University, Australia
Jose M. Alvarez
NVIDIA, US
Mathieu Salzmann
EPFL, Swiss Data Science Center (SDSC), Switzerland
Lingjie Liu
University of Pennsylvania, US
Hongdong Li
Australian National University, Australia
Richard Hartley
Australian National University & Google, Australia



Contact

To contact the organizers, please email S3DSGR@gmail.com.



Acknowledgments

Thanks to visualdialog.org for the webpage format.

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.