The Voice Source in Speech Production: from Models to Applications
Apr 28, 2014
from 12:00 PM to 02:00 PM
|Where||Engr. IV Bldg., Tesla Room 53-125|
|Contact Name||Gang Chen|
Advisor: Professor Abeer Alwan
The voice source contains important lexical and non-lexical information. The non-lexical information can convey, for example, prosodic events, emotional state, and cues pertaining to the uniqueness of the speaker's voice. A better understanding, and eventually a better model, of the voice source would benefit various speech applications, such as speech recognition, speech synthesis, speaker identification, age/gender classification, and clinical assessment.
This dissertation has three main goals. The first is to better understand the voice source by analyzing images of the vocal folds from laryngeal high-speed videoendoscopy (HSV) recordings. A new automatic method is proposed to compactly summarize, from HSV data, the overall spatial synchronization pattern of vocal fold vibration across the entire laryngeal area. Additionally, a new measure is proposed to capture perceptually important variations in glottal area pulse shapes, which are extracted from HSV data.
The second goal is to propose new voice source models and evaluate them in different applications. In the first application, a new source model and a noise-robust automatic source estimation algorithm are proposed to estimate the voice source from speech signals. Results in both clean and noisy conditions show that the proposed model and algorithm estimate the voice source signal accurately and robustly. The second application uses the proposed source model for vowel synthesis. Perceptual listening experiments show that the proposed model provides a better perceptual match to the target voice than traditional models do.
The third goal is to study the acoustic consequences of a physiological vocal-fold vibration pattern, the glottal gap effect, and to apply the findings to gender classification of children's voices. Voice-source-related measures are found to improve classification accuracy, especially for younger (10–15 year old) speakers.
This research was supported in part by NSF Grant No. IIS-1018863.
Gang Chen is a Ph.D. candidate in the Electrical Engineering Department at UCLA under the supervision of Prof. Abeer Alwan. He obtained his B.S. degree in Electronic Engineering from Tsinghua University, Beijing, China, in 2008 and his M.S. degree in Electrical Engineering from UCLA in 2010. He was a finalist for the Best Student Paper Award at the 2013 Interspeech conference held in Lyon, France. He has authored more than 10 refereed journal and conference papers. In the summers from 2010 to 2013, he was an intern at 3M.Cogent, Disney Research, Starkey Lab, and Qualcomm, respectively. His research interests are in voice source modeling, voice quality analysis, and speech synthesis.