1st Workshop on Scalable 3D Scene Generation and Geometric Scene Understanding

ECCV 2024 Workshop


Large-scale geometric scene understanding is one of the most critical and long-standing research directions in computer vision, with impactful applications in autonomous driving and robotics. Recently, there has been a surge of interest in 3D scene generation, driven by its wide-ranging applications in the gaming industry, augmented reality (AR), and virtual reality (VR). These technologies have been transforming our lives and enabling significant commercial opportunities. Both academia and industry have been investing heavily in making these methods more efficient and able to handle large-scale scenes.

The efficiency and quality of large-scale reconstruction and generation rely on the 3D representation and priors applied in solving the problem. Moreover, different industries such as robotics, autonomous driving, and gaming have distinct requirements on the quality and efficiency of the obtained 3D scene structures. The proposed workshop will gather top researchers and engineers from both academia and industry to discuss the key future challenges in scalable 3D scene generation and geometric scene understanding.

Call For Papers

We invite non-archival papers of up to 14 pages (in ECCV format) for work on tasks related to 3D generation, reconstruction, and geometric scene understanding. Paper topics may include but are not limited to:

  • Scalable large-scale 3D scene generation
  • Efficient 3D representation learning for large-scale 3D scene reconstruction
  • Learning compositional structure of 3D scenes, scalable 3D object-centric learning
  • 3D reconstruction and generation for dynamic scenes (with humans and/or rigid objects such as cars)
  • Online learning for scalable 3D scene reconstruction
  • Foundation models for 3D geometric scene understanding
  • 3D reconstruction and generation for AR/VR/robotics, etc.
  • Datasets for large-scale scene reconstruction and generation (with moving objects)

Submission: We encourage submissions of up to 14 pages, excluding references and acknowledgements. The submission should be in the ECCV format. Reviewing will be double-blind. Please submit your paper to the following address by the deadline: Submission Portal

Important Dates

Paper submission deadline July 7th, 2024
Notifications to accepted papers July 28th, 2024
Paper camera ready August 1st, 2024
Workshop date September 29th, 2024 (AM)


Welcome 8:55am - 9:00am
Invited Talk 9:00am - 9:30am
Invited Talk 9:30am - 10:00am
Invited Talk 10:00am - 10:30am
Coffee Break and Poster Session 10:30am - 11:30am
Invited Talk 11:30am - 12:00pm
Invited Talk 12:00pm - 12:30pm
Concluding Remarks 12:30pm - 12:40pm

Invited Speakers

Marc Pollefeys is a Professor of Computer Science at ETH Zurich and the Director of the Microsoft Mixed Reality and AI Lab in Zurich, where he works with a team of scientists and engineers to develop advanced perception capabilities for HoloLens and Mixed Reality. He is best known for his work in 3D computer vision, having been the first to develop a software pipeline to automatically turn photographs into 3D models, but also works on robotics, graphics and machine learning problems. Other noteworthy projects he worked on are real-time 3D scanning with mobile devices, a real-time pipeline for 3D reconstruction of cities from vehicle-mounted cameras, camera-based self-driving cars and the first fully autonomous vision-based drone. Most recently his academic research has focused on combining 3D reconstruction with semantic scene understanding.

Sanja Fidler is an Associate Professor at the University of Toronto, affiliated faculty at the Vector Institute and VP of AI Research at NVIDIA, leading a research lab in Toronto. Prior to that, in 2012/2013, Sanja was a Research Assistant Professor at Toyota Technological Institute at Chicago. Sanja’s work is in the area of Computer Vision and Machine Learning, specifically the intersection of computer vision and graphics, 3D vision, 3D reconstruction and synthesis, and interactive methods for image annotation.

Lingjie Liu is the Aravind K. Joshi Assistant Professor in the Department of Computer and Information Science at the University of Pennsylvania, where she leads the Penn Computer Graphics Lab. She is also a member of the General Robotics, Automation, Sensing & Perception (GRASP) Lab. Previously, she was a Lise Meitner Postdoctoral Research Fellow at Max Planck Institute for Informatics. She received her Ph.D. degree at the University of Hong Kong in 2019. Her research interests are at the interface of Computer Graphics, Computer Vision, and AI, with a focus on Neural Scene Representations, Neural Rendering, Human Performance Modeling and Capture, and 3D Reconstruction.

Vincent Lepetit is a professor at ENPC ParisTech, France. Before that, he was a full professor at the Institute for Computer Graphics and Vision, TU Graz, Austria and before that, a senior researcher at CVLab, EPFL, Switzerland. His research focuses on 3D scene understanding. More exactly, he aims at reducing as much as possible the guidance a system needs to learn new 3D objects and new 3D environments: How can we remove the need for training data for each new 3D problem? Currently, even self-supervised methods often require CAD models, which are not necessarily available for any type of object. This question has both theoretical implications and practical applications, as the need for training data, even synthetic, is often a deal breaker for non-academic problems. Vincent received the Koenderink “test-of-time” award at the European Conference on Computer Vision 2020 for “Brief: Binary Robust Independent Elementary Features”. He regularly serves as an area chair of the major computer vision conferences (CVPR, ICCV, ECCV, ACCV, BMVC) and as an editor for PAMI and IJCV.


Miaomiao Liu
Australian National University, Australia
Jose M. Alvarez
Mathieu Salzmann
EPFL, Swiss Data Science Center (SDSC), Switzerland
Buyu Liu
Zhejiang University, China
Hongdong Li
Australian National University, Australia
Richard Hartley
Australian National University & Google, Australia


To contact the organizers please use S3DSGR@gmail.com


Thanks to visualdialog.org for the webpage format.