Zhiyuan Ma is a postdoctoral fellow in the Department of Electronic Engineering at Tsinghua University and a recipient of the National Natural Science Foundation of China (NSFC) Youth Fund. His postdoctoral advisor is Professor Bowen Zhou (周伯文).

As principal investigator, he has led several projects funded by the National Natural Science Foundation of China, the China Postdoctoral Science Foundation General Program, and the National Postdoctoral Funding Program, and has participated in several Science and Technology Innovation 2030 major projects of the Ministry of Science and Technology. He is a member of ACL and ACM, a professional member of CCF, a member of the Beijing BAAI Qingyuan research group, and a reviewer for top international journals and conferences such as TNNLS, ICLR, ICML, NeurIPS, ACL, EMNLP, COLING, NAACL, AAAI, AISTATS, ECAI, and CIKM.

He received his Ph.D. from Huazhong University of Science and Technology (HUST) in June 2023, graduating one year ahead of schedule (the first doctoral student to graduate ahead of schedule). During his doctoral studies, he received many honors, including Outstanding Doctoral Graduate, Outstanding Graduation Thesis, the National Scholarship, the Guanghua Scholarship, the BIGO Enterprise Scholarship, Outstanding Graduate Student Cadre, Three-Good Graduate Student, Zhiyin Pilot Student, and the Zhiyin Pillar Student Academic Research Award (the only recipient in the college), as well as the Best Paper Award at the First Annual Academic Conference of HUST-CS in 2022.

His research interests include generative AI, natural language processing, embodied AI, vision and language, task-oriented dialogue systems, controllable generation, and AI for Science. His main work has been published, as first author, at top international conferences in artificial intelligence and natural language processing, such as NeurIPS, ACL, EMNLP, AAAI, ACM MM, and COLING. His work on multimodal pre-training, CMAL, has been cited and positively evaluated by Yann LeCun (Turing Award laureate and Chief AI Scientist at Facebook). This work proposed a novel cross-modal associative learning pre-training method and achieved new breakthroughs in modality-aligned vision-language pre-training.