I am a Ph.D. student at Zhejiang University, supervised by Prof. Siliang Tang (汤斯亮) and Prof. Yueting Zhuang (庄越挺). I obtained my B.S. degree at Zhejiang University. I also collaborate closely with Xu Tan (谭旭) from Microsoft Research Asia and Lingfei Wu (吴凌飞).
My research interests include Text-to-Speech Synthesis, ASR Error Correction, Graph Neural Networks, and Neural Machine Translation. I have published 10+ papers at venues including ICML, ICLR, NeurIPS, EMNLP, and IJCAI.
I have developed the first large-scale non-autoregressive text-to-speech synthesis system, NaturalSpeech 2, and its advanced version, NaturalSpeech 3.
I am the main contributor to the Graph4NLP project. I am also a chapter contributor to the book “Graph Neural Networks: Foundations, Frontiers, and Applications”.
🔥 News
- 2024.05: 🎉 NaturalSpeech 3 is accepted by ICML 2024 as an Oral presentation! One paper is accepted by the ACL 2024 main conference.
- 2024.03: 🎉 We are delighted to release NaturalSpeech 3, an advanced version of the NaturalSpeech series with speech factorization.
- 2024.01: 🎉 NaturalSpeech 2 and PromptTTS 2 are accepted by ICLR 2024 as Spotlight and Poster presentations!
- 2023.09: 🎉 We released PromptTTS 2, a large-scale TTS system that uses text prompts.
- 2023.04: 🎉 We are delighted to release NaturalSpeech 2, the first large-scale NAR TTS system. It can generate high-quality speech with only a 3-second prompt!
- 2021.06: 🎉 Check out our most recent survey paper, titled “Graph Neural Networks for Natural Language Processing: A Survey”! It is the first comprehensive survey on GNNs for NLP!
- 2021.06: 🔥 We are delighted to release our Graph4NLP Library (⭐️1.6k+), which is the first library for the easy use of GNNs for NLP!
📝 Publications
🎙 Speech Synthesis
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju*, Yuancheng Wang*, Kai Shen*, Xu Tan*, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao
- NaturalSpeech 3 is the advanced version of the NaturalSpeech series with speech factorization.
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen*, Zeqian Ju*, Xu Tan*, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian
- NaturalSpeech 2 is the first large-scale non-autoregressive text-to-speech synthesis system. It can generate high-quality speech with only a 3-second prompt!
Work in Progress
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis, Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao
ICLR 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt, Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian
EMNLP 2023
Mask the correct tokens: An embarrassingly simple approach for error correction, Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin
📚 Machine Translation
Work in Progress
A Study on ReLU and Softmax in Transformer, Kai Shen, Junliang Guo, Xu Tan, Siliang Tang, Rui Wang, Jiang Bian
🧑🎨 Graph Neural Network
Survey Paper: Graph Neural Networks for Natural Language Processing: A Survey
Lingfei Wu, Yu Chen, Kai Shen (First Student Author), Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, Bo Long
Graph4NLP Library, main contributor
- DLG4NLP consists of: 1) the most comprehensive survey paper on GNNs for NLP; 2) a library for the easy use of GNNs for NLP.
NeurIPS 2021
Learning to Generate Visual Questions with Noisy Supervision, Kai Shen, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhuoye Ding, Yun Xiao, Bo Long
Under Review
Ask Question with Double Hints: Visual Question Generation with Answer-awareness and Region-reference, Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Zhu Zhang, Yu Qiang, Yueting Zhuang
IJCAI 2020
Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description, Kai Shen, Lingfei Wu, Fangli Xu, Siliang Tang, Jun Xiao, Yueting Zhuang
Graph Neural Networks: Foundations, Frontiers, and Applications
Graph Neural Networks in Computer Vision, Siliang Tang, Wenqiao Zhang, Zongshen Mu, Kai Shen, Juncheng Li, Jiacheng Li, Lingfei Wu
Others
ACL 2024 (Main Conference)
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text, Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang
Preprint
Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce, Xueying Zhang, Kai Shen, Chi Zhang, Xiaochuan Fan, Yun Xiao, Zhen He, Bo Long, Lingfei Wu
🎖 Honors and Awards
- 2020-2021 National Scholarship (Top 1%)
- 2017-2018 The Third Prize Scholarship (Top 20%)
- 2016-2017 The Third Prize Scholarship (Top 20%)
- 2015-2016 The Second Prize Scholarship (Top 8%)
📖 Education
- 2019.06 - 2024.06, Ph.D., Zhejiang University, Hangzhou.
- 2015.09 - 2019.06, Undergraduate, Computer Science, Zhejiang University, Hangzhou.
- 2012.09 - 2015.06, The Second Middle School, Huzhou, Zhejiang.
💻 Internships
- 2022.03 - 2023.09, MSRA, Machine Learning Group, Beijing.
- 2021.04 - 2022.03, JD.com, Beijing.
- 2018.06 - 2019.02, YiWise, Hangzhou.