NUS-48E Sung and Spoken Lyrics Corpus

Welcome to the NUS-48E Sung and Spoken Lyrics Corpus, developed at the Sound and Music Computing Laboratory at the National University of Singapore.

The corpus is a 169-minute collection of audio recordings of the sung and spoken lyrics of 48 English songs (20 unique) by 12 subjects. It includes a complete set of phone-level transcriptions and duration annotations for all recordings of sung lyrics, comprising 25,474 phone instances.

The corpus is available here.

The corpus consists of the following:

  • Twelve folders, one per subject
  • Each subject folder contains a “sing” folder and a “read” folder, which hold 4 sung and the 4 corresponding spoken .wav files, together with their time-aligned, manual phone-level annotations in .txt files
  • A readme file
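
The folder layout described above can be traversed with a short script. The following is a minimal sketch, not part of the corpus distribution. It assumes that each annotation .txt file shares its file name stem with the matching .wav file, and the function name pair_recordings is hypothetical. Check the readme file for the actual naming scheme before relying on it.

```python
from pathlib import Path


def pair_recordings(corpus_root):
    """Collect (subject, mode, wav_path, txt_path) for every .wav file.

    Assumption: an annotation file has the same stem as its recording,
    e.g. "01.wav" and "01.txt". txt_path is None when no such file exists.
    """
    pairs = []
    root = Path(corpus_root)
    # One folder per subject; non-folder entries such as the readme are skipped.
    for subject_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for mode in ("sing", "read"):
            mode_dir = subject_dir / mode
            if not mode_dir.is_dir():
                continue
            for wav in sorted(mode_dir.glob("*.wav")):
                txt = wav.with_suffix(".txt")
                pairs.append(
                    (subject_dir.name, mode, wav, txt if txt.exists() else None)
                )
    return pairs


# Example call (the path is a placeholder for wherever the corpus is unpacked):
# for subject, mode, wav, txt in pair_recordings("path/to/corpus"):
#     print(subject, mode, wav.name, txt.name if txt else "no annotation")
```

Returning None for a missing annotation, rather than raising an error, lets the caller decide how to handle recordings that have no matching .txt file.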

For information about any of the content described here, please contact Associate Professor Ye Wang at the SMC Lab.

This dataset is shared on the condition that it be used solely for research purposes. If you use this dataset, please cite the following paper:

Zhiyan Duan, Haotian Fang, Bo Li, Khe Chai Sim and Ye Wang. “The NUS Sung and Spoken Lyrics Corpus: A Quantitative Comparison of Singing and Speech”. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2013 (APSIPA ASC 2013).