I am a Ph.D. student at Zhejiang University, supervised by Prof. Siliang Tang (汤斯亮) and Prof. Yueting Zhuang (庄越挺). I obtained my B.S. degree at Zhejiang University. I also collaborate closely with Xu Tan (谭旭) from Microsoft Research Asia and with Lingfei Wu (吴凌飞).

My research interests include Text-to-Speech Synthesis, ASR Error Correction, Graph Neural Networks, and Neural Machine Translation. I have published 10+ papers in venues including ICML, ICLR, NeurIPS, EMNLP, and IJCAI.

I have developed NaturalSpeech 2, the first large-scale non-autoregressive text-to-speech synthesis system, and its advanced version, NaturalSpeech 3.

I am the main contributor to the Graph4NLP project. I am also one of the chapter contributors to the book “Graph Neural Networks: Foundations, Frontiers, and Applications”.

🔥 News

  • 2024.05: 🎉 NaturalSpeech 3 is accepted by ICML 2024 as an Oral presentation! One paper is accepted by the ACL 2024 main conference.
  • 2024.03: 🎉 We are delighted to release NaturalSpeech 3, an advanced version of the NaturalSpeech series with speech factorization.
  • 2024.01: 🎉 NaturalSpeech 2 and PromptTTS 2 are accepted by ICLR 2024 as Spotlight and Poster presentations, respectively!
  • 2023.09: 🎉 We released PromptTTS 2, a large-scale TTS system that uses text prompts.
  • 2023.04: 🎉 We are delighted to release NaturalSpeech 2, the first large-scale non-autoregressive (NAR) TTS system. It can generate high-quality speech with only a 3-second prompt!
  • 2021.06: 🎉 Check out our recent survey paper, “Graph Neural Networks for Natural Language Processing: A Survey”, the first comprehensive survey on GNNs for NLP!
  • 2021.06: 🔥 We are delighted to release our Graph4NLP Library (⭐️1.6k+), the first library for easily applying GNNs to NLP!

📝 Publications

🎙 Speech Synthesis

ICML 2024 Oral

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju*, Yuancheng Wang*, Kai Shen*, Xu Tan*, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao


  • NaturalSpeech 3 is the advanced version of the NaturalSpeech series with speech factorization.
ICLR 2024 Spotlight

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen*, Zeqian Ju*, Xu Tan*, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian


  • NaturalSpeech 2 is the first large-scale non-autoregressive text-to-speech synthesis system. It can generate high-quality speech with only a 3-second prompt!

📚 Machine Translation

🧑‍🎨 Graph Neural Network

Foundations and Trends in Machine Learning

DLG4NLP Project

Survey Paper: Graph Neural Networks for Natural Language Processing: A Survey
Lingfei Wu, Yu Chen, Kai Shen (first student author), Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, Bo Long

Graph4NLP Library (most important contributor)

  • DLG4NLP consists of: 1) the most comprehensive survey paper on GNNs for NLP; and 2) a library for easily applying GNNs to NLP.


🎖 Honors and Awards

  • 2020-2021 National Scholarship (Top 1%)
  • 2017-2018 The Third Prize Scholarship (Top 20%)
  • 2016-2017 The Third Prize Scholarship (Top 20%)
  • 2015-2016 The Second Prize Scholarship (Top 8%)

📖 Education

  • 2019.06 - 2024.06, Ph.D., Zhejiang University, Hangzhou.
  • 2015.09 - 2019.06, Undergraduate, Computer Science, Zhejiang University, Hangzhou.
  • 2012.09 - 2015.06, The Second Middle School, Huzhou, Zhejiang.

💻 Internships