Xiaofeng Wang (Jeff)

I am currently a fourth-year Ph.D. student at the Institute of Automation, Chinese Academy of Sciences (CASIA). Prior to that, I received my Bachelor's degree from the Department of Automation, Nanjing University of Science and Technology (NJUST) in 2020. I have also spent time at the University of Dayton (UD), Megvii, PhiGent, GigaAI, and Alibaba TongYi.

My research interests revolve around AIGC (video generation), world models, and 3D perception, with the aim of developing an understanding of physics and motion in AI systems. Please feel free to reach out if you have any questions or would like to discuss further.

Email / Google Scholar / GitHub

News

  • 2024-05: One paper on occupancy prediction is accepted to IJCAI 2024.
  • 2023-11: Our ICLR'24 technical report exploring GPT-4V for autonomous driving is available. It is exciting to see the community sharing thoughts on our latest findings!
  • 2023-07: One paper on 3D occupancy prediction is accepted to ICCV 2023.
  • 2023-02: One paper on 3D streaming perception is accepted to CVPR 2023.
  • 2023-01: One paper on 3D pretraining is accepted to ICLR 2023.
  • 2022-11: One paper on self-supervised depth estimation is accepted to AAAI 2023.
  • 2022-07: One paper on multi-view depth estimation is accepted to ECCV 2022.
Research

    * indicates equal contribution

    Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
    Zheng Zhu*, Xiaofeng Wang*, Wangbo Zhao*, Chen Min*, Nianchen Deng*, Min Dou*, Yuqi Wang*, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
    arXiv, 2024
    [arXiv] [code]

    A comprehensive survey on general world models, including world models for video generation, autonomous driving and autonomous agents.

    DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
    Guosheng Zhao*, Xiaofeng Wang*, Zheng Zhu*, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang
    arXiv, 2024
    [arXiv] [Page] [code]

    DriveDreamer-2 is the first world model to generate customized driving videos in a user-friendly manner.

    WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
    Xiaofeng Wang*, Zheng Zhu*, Guan Huang*, Boyuan Wang, Xinze Chen, Jiwen Lu
    arXiv, 2024
    [arXiv] [Page]

    WorldDreamer is a pioneering world model that fosters a comprehensive understanding of general world physics and motion, which significantly enhances video generation capabilities.

    On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
    Licheng Wen*, Xuemeng Yang*, Daocheng Fu*, Xiaofeng Wang*, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao
    ICLR Workshop on LLM Agents, 2024
    [arXiv] [Page]

    This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios.

    DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving
    Xiaofeng Wang*, Zheng Zhu*, Guan Huang, Xinze Chen, Jiagang Zhu, Jiwen Lu
    arXiv, 2023
    [arXiv] [page] [code]

    DriveDreamer is the first world model established from real-world driving scenarios. It empowers controllable driving video generation and enables the prediction of reasonable driving policies.

    OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
    Xiaofeng Wang*, Zheng Zhu*, Wenbo Xu*, Yunpeng Zhang, Yi Wei, Xu Chi, Yun Ye, Dalong Du, Jiwen Lu, Xingang Wang
    IEEE International Conference on Computer Vision (ICCV), 2023
    [arXiv] [Code]

    To benchmark surrounding perception algorithms comprehensively, we propose OpenOccupancy, the first surrounding semantic occupancy perception benchmark.

    StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion
    Bohan Li, Yasheng Sun, Xin Jin, Wenjun Zeng, Zheng Zhu, Xiaofeng Wang, Yunpeng Zhang, James Okae, Hang Xiao, Dalong Du
    International Joint Conference on Artificial Intelligence (IJCAI), 2024
    [arXiv] [Code]

    We propose StereoScene for 3D Semantic Scene Completion (SSC), which takes full advantage of lightweight camera inputs without resorting to any external 3D sensors.

    Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
    Xiaofeng Wang, Zheng Zhu, Yunpeng Zhang, Guan Huang, Yun Ye, Wenbo Xu, Ziwei Chen, Xingang Wang
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    [arXiv] [Code]

    We propose the Autonomous-driving StreAming Perception (ASAP) benchmark, which is the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving.

    LiftedCL: Lifting Contrastive Learning for Human-Centric Perception
    Ziwei Chen, Qiang Li, Xiaofeng Wang, Wankou Yang
    International Conference on Learning Representations (ICLR), 2023
    [paper] [page] [code]

    We propose Lifting Contrastive Learning (LiftedCL) to obtain 3D-aware human-centric representations that absorb 3D human structure information.

    MOVEDepth: Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning
    Xiaofeng Wang, Zheng Zhu, Guan Huang, Xu Chi, Yun Ye, Ziwei Chen, Xingang Wang
    AAAI Conference on Artificial Intelligence (AAAI), 2023
    [arXiv] [code]

    MOVEDepth is a self-supervised depth estimation method that explores monocular cues to enhance multi-frame depth learning.

    MVSTER: Epipolar Transformer for Efficient Multi-View Stereo
    Xiaofeng Wang, Zheng Zhu, Fangbo Qin, Yun Ye, Guan Huang, Xu Chi, Yijia He, Xingang Wang
    European Conference on Computer Vision (ECCV), 2022
    [arXiv] [code]

    We propose MVSTER, a novel end-to-end Transformer-based method for multi-view stereo. It leverages the proposed epipolar Transformer to efficiently learn 3D associations along the epipolar line.

    Honors and Awards

  • 2019 Ruihua Cup Annual Outstanding College Student / 瑞华杯大学生年度人物
  • 2019 President's Medal of NJUST / 校长奖章
  • 2019 National Scholarship / 国家奖学金
  • 2018 CSC Scholarship / 国家公派留学奖学金
  • 2017 National Scholarship / 国家奖学金



    © Xiaofeng Wang | Last updated: Mar 16, 2024