📝 Publications
🎙 Speech Synthesis

Kimi-Audio Technical Report
Kimi Team
- We present Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation. This repository contains the official implementation, models, and evaluation toolkit for Kimi-Audio.
- Kimi-Audio is pre-trained on over 13 million hours of diverse audio data (speech, music, sounds) and text data, and achieves SOTA results on numerous audio benchmarks.
- The official code and model weights are publicly available.
- We also developed and open-sourced an Evaluation Toolkit to reproduce all baselines and our model’s results easily.

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju*, Yuancheng Wang*, Kai Shen*, Xu Tan*, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao
- NaturalSpeech 3 is the advanced version of the NaturalSpeech series, with speech factorization.

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen*, Zeqian Ju*, Xu Tan*, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian
- NaturalSpeech 2 is the first large-scale non-autoregressive text-to-speech synthesis system. It can generate high-quality speech with only a 3-second prompt!
Work in Progress
MoonCast: High-Quality Zero-Shot Podcast Generation, Zeqian Ju, Dongchao Yang, Jianwei Yu, Kai Shen, Yichong Leng, Zhengtao Wang, Xu Tan, Xinyu Zhou, Tao Qin, Xiangyang Li
Work in Progress
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis, Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao
ICLR 2024
PromptTTS 2: Describing and Generating Voices with Text Prompt, Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian
EMNLP 2023
Mask the correct tokens: An embarrassingly simple approach for error correction, Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin
📚 Machine Translation
Work in Progress
A Study on ReLU and Softmax in Transformer, Kai Shen, Junliang Guo, Xu Tan, Siliang Tang, Rui Wang, Jiang Bian
🧑‍🎨 Graph Neural Network

Survey Paper: Graph neural networks for natural language processing: A survey
Lingfei Wu, Yu Chen, Kai Shen (First Student Author), Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, Bo Long
Graph4NLP Library (most important contributor)
- DLG4NLP consists of: 1) the most comprehensive survey paper on GNNs for NLP; 2) a library for the easy use of GNNs for NLP.
NeurIPS 2021
Learning to Generate Visual Questions with Noisy Supervision, Kai Shen, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhuoye Ding, Yun Xiao, Bo Long
Under Review
Ask Question with Double Hints: Visual Question Generation with Answer-awareness and Region-reference, Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Zhu Zhang, Yu Qiang, Yueting Zhuang
IJCAI 2020
Hierarchical Attention Based Spatial-Temporal Graph-to-Sequence Learning for Grounded Video Description, Kai Shen, Lingfei Wu, Fangli Xu, Siliang Tang, Jun Xiao, Yueting Zhuang
Graph Neural Networks: Foundations, Frontiers, and Applications
Graph Neural Networks in Computer Vision, Siliang Tang, Wenqiao Zhang, Zongshen Mu, Kai Shen, Juncheng Li, Jiacheng Li, Lingfei Wu
Others
Work in Progress
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation, Aoxiong Yin, Kai Shen, Yichong Leng, Xu Tan, Xinyu Zhou, Juncheng Li, Siliang Tang
ACL 2024 (Main Conference)
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text, Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang
Preprint
Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce, Xueying Zhang, Kai Shen, Chi Zhang, Xiaochuan Fan, Yun Xiao, Zhen He, Bo Long, Lingfei Wu