📝 论文发表

🎞️ 多模态大语言模型 (视频理解)

Qwen2.5-VL
sym

Qwen2.5-VL Technical Report
Core Contributors: Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, …, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, Junyang Lin

Project | Code |

VideoLLaMA3
sym

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Boqiang Zhang* Kehan Li*, Zesen Cheng*, Zhiqiang Hu*, Yuqian Yuan*, Guanzheng Chen*, Sicong Leng*, Yuming Jiang*, Hang Zhang*, Xin Li*, Peng Jin, Wenqi Zhang, Fan Wang, Lidong Bing, Deli Zhao

Code | hf_space | hf_space | hf_paper

VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Zesen Cheng*, Sicong Leng*, Hang Zhang*, Yifei Xin*, Xin Li*, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

Code | hf_space | hf_space | hf_paper

CMM
sym

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng*, Yun Xing*, Zesen Cheng*, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing

Project | Code | hf_data

CVPR 2025 Highlight
sym

Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy (Hightlight)
Zesen Cheng*, Hang Zhang*, Kehan Li*, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing

Code | hf_paper | PyPI

CVPR 2025
sym

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, et al.

Project | Code | | |

🧩 图像/视频分割

ICCV 2025
tar
AAAI 2024
tar

Parallel Vertex Diffusion for Unified Visual Grounding
Zesen Cheng, Kehan Li, Peng Jin, et al.

IJCAI 2023
tar
CVPR 2023
tar

Out-of-Candidate Rectification for Weakly-supervised Semantic Segmentation
Zesen Cheng, Pengchong Qiao, Kehan Li, Siheng Li, et al.

Others