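I am a fourth-year direct-PhD student in Computer Science and Technology at Peking University (expected to graduate in 2026). I received my bachelor's degree from the School of Electronic and Information Engineering, South China University of Technology (class of 2021).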

Motto: Unite knowledge and action, and investigate things to extend knowledge; aim high, and keep your feet on the ground.

📌 Research Interests

My research focuses on multimodal large models and image/video understanding, specifically:

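  • Multimodal large models (video understanding), including:
    • General video understanding: Qwen2.5-VL (core contributor)
    • Audio-visual understanding: VideoLLaMA2; CMM
    • Streaming video understanding: VideoLLaMA3
    • Long video understanding: Inf-CL (CVPR 2025 Highlight)
    • Fine-grained video understanding: VideoRefer (CVPR 2025)
  • Image/video segmentation, including:
    • Weakly supervised segmentation: OCR (CVPR 2023)
    • Video instance segmentation: TAR (ICCV 2025)
    • Multimodal segmentation: WiCo (IJCAI 2023, Neurocomputing 2024); PVD (AAAI 2024); BriVIS (AAAI 2025)
    • Medical image segmentation: Fused U-Net (Medical Physics 2021)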

📈 Research Output

I have published 20+ papers to date. Total Google Scholar citations: Citations

The open-source projects I have contributed to have drawn wide attention. GitHub stars for representative projects:

VideoLLaMA2 Stars VideoLLaMA3 Stars Inf-CL Stars CMM Stars VideoRefer Stars

💬 Contact

If you are interested in my research, feel free to reach out about collaboration or internship / full-time opportunities 🙏🙏. Email: cyanlaser@stu.pku.edu.cn

🔥 News

  • 2021.03: I joined SenseTime in Shenzhen as a research intern, working on the MMSegmentation toolkit.

📝 Publications

🎞️ Multi-modal LLM (Video Understanding)

Qwen2.5-VL

Qwen2.5-VL Technical Report
Core Contributors: Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, …, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, Junyang Lin

Project | Code

VideoLLaMA3

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Boqiang Zhang*, Kehan Li*, Zesen Cheng*, Zhiqiang Hu*, Yuqian Yuan*, Guanzheng Chen*, Sicong Leng*, Yuming Jiang*, Hang Zhang*, Xin Li*, Peng Jin, Wenqi Zhang, Fan Wang, Lidong Bing, Deli Zhao

Code | hf_space | hf_space | hf_paper

VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Zesen Cheng*, Sicong Leng*, Hang Zhang*, Yifei Xin*, Xin Li*, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

Code | hf_space | hf_space | hf_paper

CMM

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng*, Yun Xing*, Zesen Cheng*, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing

Project | Code | hf_data

CVPR 2025 Highlight

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy (Highlight)
Zesen Cheng*, Hang Zhang*, Kehan Li*, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing

Code | hf_paper | PyPI

CVPR 2025

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, et al.

Project | Code

🧩 Image/Video Segmentation

ICCV 2025
AAAI 2024

Parallel Vertex Diffusion for Unified Visual Grounding
Zesen Cheng, Kehan Li, Peng Jin, et al.

IJCAI 2023
CVPR 2023

Out-of-Candidate Rectification for Weakly-supervised Semantic Segmentation
Zesen Cheng, Pengchong Qiao, Kehan Li, Siheng Li, et al.

Others

🥇 Honors and Awards

  • 2023.10 Ping An Scholarship
  • 2020.10 National Scholarship (Undergraduate) (Top 1%)
  • 2019.10 The Second Prize Scholarship
  • 2018.10 National Scholarship (Undergraduate) (Top 1%)

📖 Education

💻 Internships
