Xiaofeng Wang (Jeff)
I am currently a fifth-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA). Prior to that, I received my Bachelor's degree from the Department of Automation, Nanjing University of Science and Technology (NJUST) in 2020. I have also spent time at the University of Dayton (UD), Megvii, PhiGent, GigaAI, and Alibaba TongYi.
My research interests revolve around AIGC (video generation), world models, and 3D perception, with the aim of developing AI systems that understand physics and motion. Please feel free to reach out if you have any questions or would like to discuss further.
Email / Google Scholar / Github
News
2024-07: DriveDreamer is selected as one of the Most Influential ECCV'24 Papers.
2024-07: One paper on driving video generation is accepted to ECCV 2024.
2024-05: One paper on occupancy prediction is accepted to IJCAI 2024.
2023-11: Our ICLR'24 technical report exploring GPT-4V on autonomous driving is available. Excited to see the community sharing thoughts on our latest findings!
2023-07: One paper on 3D occupancy prediction is accepted to ICCV 2023.
2023-02: One paper on 3D streaming perception is accepted to CVPR 2023.
2023-01: One paper on 3D pretraining is accepted to ICLR 2023.
2022-11: One paper on self-supervised depth estimation is accepted to AAAI 2023.
2022-07: One paper on multi-view depth estimation is accepted to ECCV 2022.
Research
* indicates equal contribution
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
Xiaofeng Wang, Kang Zhao, Feng Liu, Jiayu Wang, Guosheng Zhao, Xiaoyi Bao, Zheng Zhu, Yingya Zhang, Xingang Wang
arXiv, 2024
[arXiv] [Page] [Data] [Code]
EgoVid-5M is the first curated high-quality action-video dataset designed specifically for egocentric video generation.
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Guosheng Zhao*, Chaojun Ni*, Xiaofeng Wang*, Zheng Zhu*, Xueyang Zhang, Yida Wang, Guan Huang, Xinze Chen, Boyuan Wang, Youyi Zhang, Wenjun Mei, Xingang Wang
arXiv, 2024
[arXiv] [Page] [Code]
DriveDreamer4D is the first to utilize video generation models for improving 4D reconstruction in driving scenarios.
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu*, Xiaofeng Wang*, Wangbo Zhao*, Chen Min*, Nianchen Deng*, Min Dou*, Yuqi Wang*, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
arXiv, 2024
[arXiv] [Code]
A comprehensive survey on general world models, covering world models for video generation, autonomous driving, and autonomous agents.
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
Guosheng Zhao*, Xiaofeng Wang*, Zheng Zhu*, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang
arXiv, 2024
[arXiv] [Page] [Code]
DriveDreamer-2 is the first world model to generate customized driving videos in a user-friendly manner.
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Xiaofeng Wang*, Zheng Zhu*, Guan Huang*, Boyuan Wang, Xinze Chen, Jiwen Lu
arXiv, 2024
[arXiv] [Page]
WorldDreamer is a pioneering world model that fosters a comprehensive understanding of general world physics and motion, significantly enhancing the capabilities of video generation.
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
Licheng Wen*, Xuemeng Yang*, Daocheng Fu*, Xiaofeng Wang*, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao
ICLR Workshop on LLM Agents, 2024
[arXiv] [Page]
This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios.
DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving
Xiaofeng Wang*, Zheng Zhu*, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu
European Conference on Computer Vision (ECCV), 2024
[arXiv] [Page] [Code]
DriveDreamer is the first world model established from real-world driving scenarios. It empowers controllable driving video generation and enables the prediction of reasonable driving policies.
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
Xiaofeng Wang*, Zheng Zhu*, Wenbo Xu*, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang
IEEE International Conference on Computer Vision (ICCV), 2023
[arXiv] [Code]
Towards comprehensive benchmarking of surrounding perception algorithms, we propose OpenOccupancy, the first surrounding semantic occupancy perception benchmark.
StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion
Bohan Li, Yasheng Sun, Xin Jin, Wenjun Zeng, Zheng Zhu, Xiaofeng Wang, Yunpeng Zhang, James Okae, Hang Xiao, Dalong Du
International Joint Conferences on Artificial Intelligence (IJCAI), 2024
[arXiv] [Code]
We propose StereoScene for 3D Semantic Scene Completion (SSC), which takes full advantage of lightweight camera inputs without resorting to any external 3D sensors.
Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
Xiaofeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv] [Code]
We propose the Autonomous-driving StreAming Perception (ASAP) benchmark, the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.
LiftedCL: Lifting Contrastive Learning for Human-Centric Perception
Ziwei Chen, Qiang Li, Xiaofeng Wang, Wankou Yang
International Conference on Learning Representations (ICLR), 2023
[paper] [page] [Code]
We propose Lifting Contrastive Learning (LiftedCL) to obtain 3D-aware human-centric representations that absorb 3D human structure information.
MOVEDepth: Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning
Xiaofeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang
AAAI Conference on Artificial Intelligence (AAAI), 2023
[arXiv] [Code]
MOVEDepth is a self-supervised depth estimation method that exploits monocular cues to enhance multi-frame depth learning.
MVSTER: Epipolar Transformer for Efficient Multi-View Stereo
Xiaofeng Wang, Zheng Zhu, Fangbo Qin, Yun Ye, Guan Huang, Xu Chi, Yijia He, Xingang Wang
European Conference on Computer Vision (ECCV), 2022
[arXiv] [Code]
We propose MVSTER, a novel end-to-end Transformer-based method for multi-view stereo. It leverages the proposed epipolar Transformer to efficiently learn 3D associations along epipolar lines.
Academic Services
Conference Reviewer / Program Committee Member: CVPR, ECCV, AAAI, ICLR, TRBAM
Journal Reviewer: T-MM, T-CSVT
Honors and Awards
2019 Ruihua Cup Annual Outstanding College Student / 瑞华杯大学生年度人物
2019 President's Medal of NJUST / 校长奖章
2019 National Scholarship / 国家奖学金
2018 CSC Scholarship / 国家公派留学奖学金
2017 National Scholarship / 国家奖学金
© Xiaofeng Wang | Last updated: Nov 04, 2024