Biography

About Me 🪪

I’m currently a Machine Learning Engineer at ByteDance, working mainly on research and development of Vision-Language Models for e-commerce safety. I received my Master of Science in Engineering in June 2024 from the MCG Group, Department of Computer Science and Technology, Nanjing University, under the supervision of Assoc. Prof. Jie Tang. I also received my Bachelor of Science in Computer Science and Technology from Nanjing University in June 2021.

My research interests include Computer Vision, Multimodal Deep Learning and Generative Deep Learning, with a recent focus on Vision-Language Models (VLM), Generative Models (AIGC) and Object Tracking (VOT / VLT).

News 🔥

[ 2026.02.02 ] 🎉 Our paper GLAD, which focuses on Vision-Language Tracking, has been accepted by the International Journal of Computer Vision (IJCV)! arXiv and Project Page are available now.
[ 2025.09.19 ] 🎉 MERIT has been accepted by NeurIPS 2025! Code and Dataset are available now.
[ 2025.06.12 ] 🤗 We propose MERIT, the first multilingual dataset for interleaved multi-condition semantic retrieval, comprising 320,000 queries with 135,000 products in 5 languages and covering 7 distinct product categories. We also introduce Coral, a novel fine-tuning framework that adapts pre-trained MLLMs for embedding extraction. arXiv and Project Page are available now.
[ 2024.03.21 ] 📖 A Zhihu blog post has been published to explain the main ideas of the paper.
[ 2023.10.18 ] 📄 Both the CVF and arXiv versions of ROMTrack have been updated! This tracker uses a newly proposed object modeling paradigm that significantly improves robustness. Code is available now.
[ 2023.07.14 ] 🎉 Good News! Our paper, abbreviated as ROMTrack, has been accepted by ICCV 2023.

Publications 📝

GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates
Xingyu Luo, Yidong Cai, Jie Liu, Jie Tang, Gangshan Wu, Limin Wang.
➡️ International Journal of Computer Vision (IJCV), 2026.
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li.
➡️ The 39th Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
Robust Object Modeling for Visual Tracking
Yidong Cai, Jie Liu, Jie Tang, Gangshan Wu.
➡️ The 19th IEEE/CVF International Conference on Computer Vision (ICCV), 2023.

Academic Services 💼

Journal Review:
- IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
- IEEE Transactions on Multimedia (TMM)
- IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
- ACM Transactions on Multimedia Computing, Communications and Applications (TOMM)
- Journal of Visual Communication and Image Representation (JVCIR)
- Pattern Recognition (PR)
Conference Review:
- IEEE International Conference on Computer Vision (ICCV)
- British Machine Vision Conference (BMVC)
Teaching Assistant:
- Introduction to Computer Systems (ICS)
- Multimedia Technology

Education 🎓

2021.9 - 2024.6: M.Sc., Nanjing University, Nanjing.
- Department of Computer Science and Technology.
- MCG Group, supervised by Assoc. Prof. Jie Tang, Prof. Limin Wang and Prof. Gangshan Wu.
2017.9 - 2021.6: B.Sc., Nanjing University, Nanjing.
- Department of Computer Science and Technology.
- 2020.9 - 2021.6: Research on Visual Object Tracking, supervised by Prof. Limin Wang.
2012.9 - 2017.6: Tianyi High School, Jiangsu.
- Both junior high school and senior high school.

Experience 🖥️

2024.7 - Present: Machine Learning Engineer (MLE) - Multimodal.
- Governance and Experience, Global E-commerce, Data, ByteDance, Shanghai.
- Mainly focused on research into Vision-Language Models for e-commerce safety. Familiar with MoE, Reinforcement Learning, Representation Learning, Distillation, and Agents.
2023.6 - 2023.9: Machine Learning Engineer (MLE) Intern - Computer Vision.
- Alimama, Taobao & Tmall Group, Alibaba Group, Hangzhou.
- Mainly focused on research into Multimodal & AIGC algorithms. Familiar with Contrastive Learning and Diffusion Models.

Honors and Awards 🏅

SpotBonus Award in Global E-commerce - Data - ByteDance, 2026.
- Exploration and Business Implementation of Multimodal LLMs in E-commerce Scenarios.
Outstanding Graduate Student of Nanjing University, 2024.
Tencent Scholarship, 2024.
Academic Scholarship of Nanjing University, 2021 & 2022 & 2023.
- 2021 (First Prize) & 2022 (Second Prize) & 2023 (Second Prize).
People’s Scholarship of Nanjing University, 2018 & 2019 & 2020.
- 2018 (Second Prize) & 2019 (First Prize) & 2020 (Second Prize).
Third Prize in the Jiangsu Mathematical Modeling Competition, 2019.
Silver Medal in the 12th China Southeast Mathematical Olympiad, 2015.

Contact 📫

Email:
- Gmail: dawnyc1123@gmail.com
- Edu-mail: yidong_cai@smail.nju.edu.cn
