I am currently building cutting-edge multimodal LLMs at ByteDance Seed.
I am looking for interns to work with me on multimodal LLMs. Feel free to contact me if you are interested!

I obtained my Ph.D. degree from Zhejiang University, supervised by Prof. Siliang Tang (汤斯亮) and Prof. Yueting Zhuang (庄越挺). I also collaborate closely with Xu Tan (谭旭) from Microsoft Research Asia and with Lingfei Wu (吴凌飞).
My research interests include multimodal LLMs, text-to-speech synthesis, ASR error correction, graph neural networks, and neural machine translation. I have published 10+ papers at venues including ICML, ICLR, NeurIPS, EMNLP, and IJCAI.
I have developed: 1. the large-scale non-autoregressive text-to-speech synthesis system NaturalSpeech 2 and its advanced successor NaturalSpeech 3 while at MSRA; 2. the SOTA audio foundation model Kimi-Audio while at Moonshot.AI.