I am currently building cutting-edge multimodal LLMs at ByteDance Seed.

I am looking for interns to work with me on multimodal LLMs. Feel free to contact me if you are interested!

I obtained my Ph.D. from Zhejiang University, supervised by Prof. Siliang Tang (汤斯亮) and Prof. Yueting Zhuang (庄越挺). I also collaborate closely with Xu Tan (谭旭) from Microsoft Research Asia and Lingfei Wu (吴凌飞).

My research interests include multimodal LLMs, text-to-speech synthesis, ASR error correction, graph neural networks, and neural machine translation. I have published 10+ papers at venues including ICML, ICLR, NeurIPS, EMNLP, and IJCAI.

I have developed: 1. the large-scale non-autoregressive text-to-speech synthesis system NaturalSpeech 2 and its advanced successor NaturalSpeech 3 while at MSRA; 2. the SOTA audio foundation model Kimi-Audio while at Moonshot.AI.