Speech Normalization and Data Augmentation Techniques based on Acoustical and Physiological Constraints and their Applications to Child Speech Recognition

Speaker: Gary Yeung
Affiliation: Ph.D. Candidate

Also via Zoom: https://ucla.zoom.us/j/5402246338


Abstract:

In recent years, the performance of automatic speech recognition (ASR) systems for adult speech has improved dramatically. In contrast, child ASR systems have struggled to achieve meaningful improvements, even as demand for child speech technology rises. A contributing factor is the shift toward deep learning methods for ASR, which generally require thousands of hours of speech data to train. While adult speech data is abundant, publicly available child speech data is sparse, due in part to privacy concerns. Child ASR systems perform poorly when trained on adult speech because of the acoustic mismatch caused by differences in body size, especially in the vocal folds and vocal tract, and because of the high variability of child speech.

This research analyzes the acoustic properties of child speech across various ages and compares them to those of adult speech. Specifically, the subglottal resonances (SGRs) and fundamental frequency of vowel productions are investigated. These acoustic features are shown to be capable of predicting acoustic structure across speakers. We therefore propose feature extraction methods that use these properties to normalize acoustic structure across speakers and reduce the acoustic mismatch between adult and child speech. This allows child ASR systems to leverage adult data for training and suggests a framework for a universal ASR system that need not be adult- or child-dependent. Furthermore, we demonstrate that when child speech data is limited, these feature normalization methods produce significant improvements in child ASR, even with state-of-the-art deep learning techniques.
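To make the idea of "normalizing acoustic structure across speakers" concrete, here is a minimal sketch of one generic form of frequency normalization: a linear warp of a magnitude spectrum whose factor is the ratio of a reference speaker's first subglottal resonance (Sg1) to the current speaker's Sg1. This is an illustrative assumption, not the speaker's actual method. The function names `sgr_warp_factor` and `warp_spectrum`, the choice of Sg1, the purely linear warp, and all numeric values are hypothetical. Because children are smaller, their resonances tend to be higher, so mapping a child onto an adult reference gives a factor below 1, which compresses the child's spectrum toward the adult frequency range.

```python
import numpy as np


def sgr_warp_factor(speaker_sg1_hz: float, reference_sg1_hz: float) -> float:
    """Hypothetical linear warp factor: reference Sg1 divided by the speaker's Sg1.

    A child whose Sg1 is higher than the (adult) reference gets alpha < 1,
    so the child's spectrum is compressed toward the adult frequency range.
    """
    if speaker_sg1_hz <= 0 or reference_sg1_hz <= 0:
        raise ValueError("resonance frequencies must be positive")
    return reference_sg1_hz / speaker_sg1_hz


def warp_spectrum(spectrum: np.ndarray, freqs_hz: np.ndarray, alpha: float) -> np.ndarray:
    """Linearly warp a magnitude spectrum so that energy at f moves to alpha * f.

    The warped value at frequency g is read from the original spectrum at g / alpha
    using linear interpolation. Points that would need data above the highest
    analysed frequency are set to zero, because that information is not available.
    """
    source_freqs = freqs_hz / alpha
    return np.interp(source_freqs, freqs_hz, spectrum, right=0.0)


if __name__ == "__main__":
    freqs = np.linspace(0.0, 8000.0, 1601)  # 5 Hz bins up to 8 kHz
    # Toy "child" spectrum with a single spectral peak at 1400 Hz.
    child = np.exp(-(((freqs - 1400.0) / 50.0) ** 2))
    # Illustrative Sg1 values only, not measured data.
    alpha = sgr_warp_factor(speaker_sg1_hz=700.0, reference_sg1_hz=550.0)
    adult_like = warp_spectrum(child, freqs, alpha)
    print(f"alpha = {alpha:.3f}")
    print(f"peak moved from {freqs[child.argmax()]:.0f} Hz to {freqs[adult_like.argmax()]:.0f} Hz")
```

A single global factor per speaker keeps the sketch simple. Real normalization schemes may warp nonlinearly, combine several cues such as SGRs and fundamental frequency, or apply the warp inside a filterbank rather than directly on the spectrum.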

Biography:

Gary Yeung received his B.S. and M.S. in electrical and computer engineering from the University of California, Los Angeles (UCLA) in 2015 and 2017, respectively. During his studies, he received UCLA Henry Samueli School of Engineering’s Harry M. Showman Prize (2015), UCLA Electrical and Computer Engineering Department’s TA Excellence in Teaching Award (2018), and the IBM Ph.D. Fellowship (2018-2020). His research interests include child speech recognition, speech acoustics, and social robotics.

Date/Time:
Aug 26, 2021
1:00 pm - 3:00 pm

Location:
Via Zoom Only
No location, Los Angeles