I am a Ph.D. student at Zhejiang University, supervised by Prof. Siliang Tang (汤斯亮) and Prof. Yueting Zhuang (庄越挺). I obtained my B.S. degree from Zhejiang University. I also collaborate closely with Xu Tan (谭旭) from Microsoft Research Asia and Lingfei Wu (吴凌飞).

My research interests include Text-to-Speech Synthesis, ASR Error Correction, Graph Neural Networks, and Neural Machine Translation. I have published 10+ papers at venues including ICML, ICLR, NeurIPS, EMNLP, and IJCAI.

I developed NaturalSpeech 2, the first large-scale non-autoregressive text-to-speech synthesis system, and its advanced version, NaturalSpeech 3.

I am the main contributor to the Graph4NLP project and a chapter contributor to the book “Graph Neural Networks: Foundations, Frontiers, and Applications”.

🔥 News

  • 2025.04: 🎉 We are delighted to release Kimi-Audio, which is an open-source audio foundation model excelling in audio understanding, generation, and conversation.
  • 2025.03: 🎉 We are delighted to release LanDiff, a novel text-to-video generation framework that synergizes the strengths of Language Models and Diffusion Models. We are also delighted to release MoonCast, a high-quality zero-shot podcast generation system.
  • 2024.05: 🎉 NaturalSpeech 3 is accepted by ICML 2024 as an Oral presentation! One paper is accepted to the ACL 2024 main conference.
  • 2024.03: 🎉 We are delighted to release NaturalSpeech 3, an advanced version of the NaturalSpeech series with speech factorization.
  • 2024.01: 🎉 NaturalSpeech 2 and PromptTTS 2 are accepted by ICLR 2024 as Spotlight and Poster presentations!
  • 2023.09: 🎉 We released PromptTTS 2, a large-scale TTS system using text prompts.
  • 2023.04: 🎉 We are delighted to release our NaturalSpeech 2, which is the first large-scale NAR TTS system. It can generate high-quality speech with only a 3-second prompt!
  • 2021.06: 🔥 We are delighted to release our Graph4NLP Library (⭐️1.6k+), the first library for the easy use of GNNs for NLP! Also check out our most recent survey paper, “Graph Neural Networks for Natural Language Processing: A Survey”, the first comprehensive survey on GNNs for NLP!

📝 Publications

🎙 Speech Synthesis

Technical Report

Kimi-Audio Technical Report
Kimi Team

Project

  • We present Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation. This repository contains the official implementation, models, and evaluation toolkit for Kimi-Audio.
  • Kimi-Audio is pre-trained on over 13 million hours of diverse audio data (speech, music, sounds) and text data, and achieves state-of-the-art (SOTA) results on numerous audio benchmarks.
  • See the official code and weights.
  • We also developed and open-sourced an Evaluation Toolkit to reproduce all baselines and our model’s results easily.
ICML 2024 Oral

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju*, Yuancheng Wang*, Kai Shen*, Xu Tan*, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Project

  • NaturalSpeech 3 is the advanced version of the NaturalSpeech series with speech factorization.
ICLR 2024 Spotlight

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen*, Zeqian Ju*, Xu Tan*, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

Project

  • NaturalSpeech 2 is the first large-scale non-autoregressive text-to-speech synthesis system. It can generate high-quality speech with only a 3-second prompt!

📚 Machine Translation

🧑‍🎨 Graph Neural Network

Foundations and Trends in Machine Learning

DLG4NLP Project

Survey Paper: Graph Neural Networks for Natural Language Processing: A Survey
Lingfei Wu, Yu Chen, Kai Shen (First Student Author), Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, Bo Long

Graph4NLP Library | most important contributor

  • DLG4NLP consists of: 1) the most comprehensive survey paper on GNNs for NLP; 2) a library for the easy use of GNNs for NLP.

Others

🎖 Honors and Awards

  • 2020-2021 National Scholarship (Top 1%)
  • 2017-2018 The Third Prize Scholarship (Top 20%)
  • 2016-2017 The Third Prize Scholarship (Top 20%)
  • 2015-2016 The Second Prize Scholarship (Top 8%)

📖 Education

  • 2019.06 - 2024.06, Ph.D., Zhejiang University, Hangzhou.
  • 2015.09 - 2019.06, Undergraduate, Computer Science, Zhejiang University, Hangzhou.
  • 2012.09 - 2015.06, The Second Middle School, Huzhou, Zhejiang.

💻 Internships