3rd Workshop on Scalable 3D Scene Generation and Geometric Scene Understanding

ECCV 2026 Workshop

xx,TBD, Sep. xx, 2026


Introduction

Large-scale geometric scene understanding is one of the most critical and long-standing research directions in computer vision, with impactful applications in autonomous driving and robotics. Recently, there has been a surge of interest in 3D scene generation, driven by its wide-ranging applications in the gaming industry, augmented reality (AR), and virtual reality (VR). These technologies are transforming our lives and enabling significant commercial opportunities. Both academia and industry have been investing heavily in pushing research toward greater efficiency and the handling of large-scale scenes.

The efficiency and quality of large-scale reconstruction and generation rely on the 3D representations and priors used to solve the problem. Moreover, different industries, such as robotics, autonomous driving, and gaming, have distinct requirements for the quality and efficiency of the resulting 3D scene structures. The proposed workshop will gather top researchers and engineers from both academia and industry to discuss the key future challenges in this area.


Call For Papers

Call for papers: We invite papers of up to 8 pages (in ECCV26 format) for work on tasks related to 3D generation, reconstruction, and geometric scene understanding. As accepted papers will be included in the ECCV workshop proceedings, dual submissions are not accepted. Paper topics may include but are not limited to:

  • Scalable large-scale 3D scene generation
  • Efficient 3D representation learning for large-scale 3D scene reconstruction
  • Learning the compositional structure of 3D scenes; scalable 3D object-centric learning
  • 3D reconstruction and generation for dynamic scenes (with humans and/or rigid objects such as cars)
  • Online learning for scalable 3D scene reconstruction
  • Foundation models for 3D geometric scene understanding
  • 3D reconstruction and generation for AR/VR, robotics, etc.
  • Datasets for large-scale scene reconstruction and generation (with moving objects)
  • Multi-modal 3D scene generation and geometric understanding

Submission: We encourage submissions of up to xx pages, excluding references and acknowledgements. The submission should be in the ECCV format. Reviewing will be double-blind. Please submit your paper to the following address by the deadline: Submission Portal



Important Dates

Paper submission deadline July 1st, 2026
Notification of acceptance July 18th, 2026
Paper camera ready August 10th, 2026
Workshop date xx xx 2026


Schedule

Welcome 2:05pm - 2:10pm
TBD: TBD 2:10pm - 2:40pm
TBD: TBD 2:40pm - 3:10pm
TBD: TBD 3:10pm - 3:40pm
Coffee Break and Poster Session 3:40pm - 4:40pm
TBD: TBD 4:40pm - 5:10pm
TBD: TBD 5:10pm - 5:40pm
Concluding Remarks 5:40pm - 5:45pm


Invited Speakers


Angela Dai is an Associate Professor at the Technical University of Munich where she leads the 3D AI Lab. Angela's research aims to enable machines to understand, model, and generate real-world 3D environments. She focuses on enabling the creation of rich, semantically grounded, and interactable 3D worlds that allow machines not only to perceive physical spaces, but to reason about them and act within them. She received her PhD in computer science from Stanford in 2018, advised by Pat Hanrahan, and her BSE in computer science from Princeton in 2013. Her research has been recognized through an ECVA Young Researcher Award, ERC Starting Grant, Eurographics Young Researcher Award, German Pattern Recognition Award, Google Research Scholar Award, and an ACM SIGGRAPH Outstanding Doctoral Dissertation Honorable Mention. She has also served as Program Chair for Eurographics 2025 and CVPR 2026.

Title: TBD


Andreas Geiger is a Professor at the University of Tübingen, Germany, at the heart of Cyber Valley, where he heads the Autonomous Vision Group (AVG). He is the head of the Department of Computer Science, a core faculty member of the Tübingen AI Center, and a PI in the cluster of excellence ML in Science and the CRC Robust Vision. He is also an ELLIS fellow and coordinator of the ELLIS PhD program. His research group develops machine learning models for computer vision, natural language, and robotics, with applications in self-driving, VR/AR, and scientific document analysis.

Title: TBD


Richard Zhang is a Professor in the School of Computing Science at SFU and Vice President of AI and R&D at Augmenta. At Augmenta, he leads the company's effort to develop the most advanced AI models and tools for efficient and sustainable building designs. More broadly, they are exploring spatial and functional intelligence to tackle challenges in the physical world and contributing to the R&D ecosystem for physical AI. He obtained his Ph.D. from the Dynamic Graphics Project (DGP) at the University of Toronto, and MMath and BMath degrees from the University of Waterloo. He directs the GrUVi (Graphics U Vision) Lab, one of the top places in the world to conduct computer graphics and computer vision research. His research is in computer graphics and, more broadly, visual computing, with special interests in geometric and generative modeling, shape analysis, 3D vision, spatial AI, geometric deep learning, and computational design and fabrication. He has published more than 200 papers on these topics, including 75+ articles in SIGGRAPH (+Asia) and ACM Trans. on Graphics (TOG), the top venue in computer graphics, and he has an Erdős number of 3. His research has been sponsored by Adobe, Autodesk, Boeing, Glodon, Google, and NSERC. He is an IEEE Fellow, holds a Distinguished University Professorship, and is a member of the ACM SIGGRAPH Academy. He was an Amazon Scholar from late 2021 to early 2025.


Gim Hee Lee is an Associate Professor in the Department of Computer Science at the National University of Singapore (NUS). He received his PhD in Computer Science from ETH Zurich and was previously a researcher at Mitsubishi Electric Research Laboratories (MERL), USA. He has served as Area Chair for leading conferences such as CVPR, ICCV, ECCV, ICLR, and NeurIPS, and has held organizing roles including Program Chair of 3DV 2022, Demo Chair of CVPR 2023, and General Chair of 3DV 2025. He is a recipient of the Singapore NRF Investigatorship (Class of 2024). His research focuses on 3D computer vision and robotics.


Amir Zamir is an Assistant Professor of computer science at the Swiss Federal Institute of Technology (EPFL). His research is in computer vision and machine learning. Before joining EPFL in 2020, he was with UC Berkeley, Stanford, and UCF. He has received paper awards at SIGGRAPH 2022, CVPR 2020, CVPR 2018, and CVPR 2016, as well as the NVIDIA Pioneering Research Award 2018, the PAMI Everingham Prize 2022, and the ECCV/ECVA Young Researcher Award 2022. His research has been covered by press outlets such as The New York Times and Forbes. He was the chief scientist of Aurora Solar, a Forbes AI 50 company, from 2015 to 2022 and currently serves as the chief scientist of Duranta Inc.

Title: Multimodal Scene Understanding


Organizers

Miaomiao Liu
Australian National University, Australia
Jose M. Alvarez
NVIDIA, US
Mathieu Salzmann
EPFL, Swiss Data Science Center (SDSC), Switzerland
Lingjie Liu
University of Pennsylvania, US
Hongdong Li
Australian National University, Australia
Richard Hartley
Australian National University & Google, Australia



Contact

To contact the organizers, please email S3DSGR@gmail.com



Acknowledgments

Thanks to visualdialog.org for the webpage format.

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.