I am a fourth-year direct-entry PhD student in the Department of Computer Science and Technology at Peking University (expected graduation in 2026). I received my bachelor's degree from the School of Electronics and Information Engineering at South China University of Technology in 2021.

📌 Research Interests

My research focuses on multimodal large language models and image/video understanding, specifically:

  • Multimodal large language models (video understanding), including:
    • General video understanding: Qwen2.5-VL core contributor
    • Audio-visual understanding: VideoLLaMA2; CMM
    • Streaming video understanding: VideoLLaMA3
    • Long video understanding: Inf-CL (CVPR 2025 Highlight)
    • Fine-grained video understanding: VideoRefer (CVPR 2025)
  • Image/video segmentation, including:
    • Weakly supervised segmentation: OCR (CVPR 2023)
    • Video instance segmentation: TAR (ICCV 2025)
    • Multimodal segmentation: WiCo (IJCAI 2023, Neurocomputing 2024); PVD (AAAI 2024); BriVIS (AAAI 2025)
    • Medical image segmentation: Fused U-Net (Medical Physics 2021)

📈 Academic Achievements

I have published over 20 papers; the current citation total is listed on my Google Scholar profile.

The open-source projects I have participated in have received widespread attention on GitHub. Representative projects include VideoLLaMA2, VideoLLaMA3, Inf-CL, CMM, and VideoRefer.

💬 Contact Information

If you are interested in my research, please feel free to contact me for collaboration or to discuss internship/full-time opportunities 🙏🙏. My email address is: cyanlaser@stu.pku.edu.cn