📝 Publications

🎙 Speech Synthesis

Technical Report

Kimi-Audio Technical Report
Kimi Team

Project

  • We present Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation. This repository contains the official implementation, models, and evaluation toolkit for Kimi-Audio.
  • Kimi-Audio is pre-trained on over 13 million hours of diverse audio data (speech, music, sounds) and text data, and achieves SOTA results on numerous audio benchmarks.
  • The official code and weights are publicly available.
  • We also developed and open-sourced an Evaluation Toolkit that makes it easy to reproduce the results of all baselines and of our model.
ICML 2024 Oral

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju*, Yuancheng Wang*, Kai Shen*, Xu Tan*, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Project

  • NaturalSpeech 3 is an advanced version of the NaturalSpeech series, built on speech factorization.
ICLR 2024 Spotlight

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen*, Zeqian Ju*, Xu Tan*, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

Project

  • NaturalSpeech 2 is the first large-scale non-autoregressive text-to-speech synthesis system. It can generate high-quality speech with only a 3-second prompt!

📚 Machine Translation

🧑‍🎨 Graph Neural Network

Foundations and Trends in Machine Learning

DLG4NLP Project

Survey Paper: Graph Neural Networks for Natural Language Processing: A Survey
Lingfei Wu, Yu Chen, Kai Shen (first student author), Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, Bo Long

Graph4NLP Library, most important contributor

  • DLG4NLP consists of: 1) the most comprehensive survey paper on GNNs for NLP; 2) a library that makes GNNs easy to use for NLP.

Others