Robust Speech and Bird Song Processing using Multi-band Correlograms and Sparse Representations
May 19, 2014
from 12:00 PM to 02:00 PM
Where: Engr. IV Bldg, Tesla Room 53-125
Contact Name: Lee Ngee Tan
Lee Ngee Tan
Advisor: Prof. Abeer Alwan
This dissertation focuses on algorithms for robust speech and bird song processing. Many applications perform well under ideal signal conditions (e.g., noise-free input, full bandwidth, and sufficient training data). However, performance generally degrades considerably when the input signal deviates from these ideal conditions. This dissertation describes robust algorithms for three applications: human pitch detection, automatic speech recognition, and birdsong phrase classification. In the first application, a noise-robust pitch detector based on a multi-band summary correlogram (MBSC) is proposed. Novel signal processing schemes, including comb-filter channel selection and subband reliability weighting, are designed to enhance the MBSC's peak at the most likely pitch period.
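The general idea of a summary correlogram pitch detector can be illustrated with a minimal sketch under simplifying assumptions: each subband is obtained by brick-wall FFT masking (a stand-in for a proper auditory filterbank), half-wave rectified, and autocorrelated; each subband autocorrelation is then weighted by a hypothetical reliability score (its highest normalized autocorrelation value in the pitch-lag range) and summed, and the largest summary peak gives the pitch period. The function names, band edges, the 60 to 400 Hz search range, and the weighting rule are illustrative assumptions; comb-filter channel selection is not reproduced, and this is not the dissertation's implementation.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall bandpass: zero all FFT bins outside [lo, hi) Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def normalized_acf(x, max_lag):
    """Autocorrelation at lags 0..max_lag, scaled so that lag 0 equals 1."""
    x = x - x.mean()
    full = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    if full[0] <= 0:
        return np.zeros(max_lag + 1)
    return full[: max_lag + 1] / full[0]

def summary_correlogram_pitch(x, fs, bands, f0_min=60.0, f0_max=400.0):
    """Return (f0 estimate in Hz, summary correlogram) for one frame x.

    The frame must be longer than fs / f0_min samples.
    """
    min_lag = int(fs / f0_max)
    max_lag = int(fs / f0_min)
    summary = np.zeros(max_lag + 1)
    for lo, hi in bands:
        # Half-wave rectification (common in correlogram front ends) adds
        # components at the envelope rate, i.e. the pitch rate.
        sub = np.maximum(bandpass_fft(x, fs, lo, hi), 0.0)
        acf = normalized_acf(sub, max_lag)
        # Hypothetical reliability weight: strongest periodicity in the pitch range.
        weight = max(float(acf[min_lag:].max()), 0.0)
        summary += weight * acf
    # The pitch period is the lag of the largest summary peak in the search range.
    lag = min_lag + int(np.argmax(summary[min_lag:]))
    return fs / lag, summary
```

Weighting each band by its own periodicity strength means bands dominated by noise add little to the summary, which is one simple way to express the intuition behind subband reliability weighting.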
In the second application, a feature enhancement scheme using jointly sparse reference soft-mask (SMref) and estimated soft-mask (SMest) representations is developed for noise-robust automatic speech recognition (ASR). Reference and estimated soft-mask exemplar pairs are extracted from clean and noisy utterance pairs in the training data. Using a sparsity-based dictionary learning algorithm, jointly sparse SMref and SMest dictionary representations are trained from the exemplar pairs. The coefficients of the sparse linear combination of SMest dictionary elements that best approximates the test utterance's estimated soft-mask are applied to the SMref dictionary to produce an enhanced soft-mask. This enhanced soft-mask is then used to perform noise suppression on the spectrogram from which features for ASR are extracted.
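The mask-enhancement step can be sketched per spectrogram frame as follows, assuming the paired dictionaries have already been learned (dictionary learning itself is not shown) and using orthogonal matching pursuit (OMP) as a stand-in sparse coder; the dissertation's actual sparse coding algorithm may differ. The sketch finds a sparse code for the estimated mask over the SMest atoms, then applies the same coefficients to the paired SMref atoms. The names `omp` and `enhance_mask` and the sparsity level `n_nonzero` are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse code a with D @ a ~= y.

    Assumes the columns of D have unit norm.
    """
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        scores = np.abs(D.T @ residual)
        scores[support] = -np.inf  # never reselect an atom
        support.append(int(np.argmax(scores)))
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a

def enhance_mask(D_est, D_ref, m_est, n_nonzero=3):
    """Map an estimated soft-mask frame to an enhanced one via paired atoms.

    D_est, D_ref: (n_freq, n_atoms) paired dictionaries; column k of each
    forms one pair. m_est: (n_freq,) estimated soft-mask for one frame.
    """
    norms = np.linalg.norm(D_est, axis=0)
    norms[norms == 0] = 1.0
    # Scale both atoms of each pair by the same factor so the code stays shared.
    a = omp(D_est / norms, m_est, n_nonzero)
    m_enh = (D_ref / norms) @ a
    return np.clip(m_enh, 0.0, 1.0)  # soft-mask values live in [0, 1]
```

The enhanced mask would then multiply the noisy magnitude spectrum of the same frame (e.g., `enhanced_mag = m_enh * noisy_mag`) before ASR features are computed. Because each SMest atom and its SMref partner share one coefficient, the estimated-mask fit directly determines the reference-mask reconstruction.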
In the third application, a simple exemplar-based sparse representation (SR) classifier is evaluated on limited data for birdsong phrase classification and verification. Song recordings of the Cassin's Vireo are used for performance evaluation. This study of the SR classifier for bird phrase classification is inspired by a paper that proposed the SR classifier for face recognition and outlier face detection, and reported good performance with only 7 training images per subject. Algorithmic enhancements are subsequently added to the original SR classification framework to improve the classification accuracy for automatically detected and segmented phrases, and for phrases sung by individual birds that are not found in the training set. These enhancements include dynamic time warping (DTW), feature normalization, and a two-pass SR classification framework.
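The core decision rule of an exemplar-based SR classifier can be sketched as follows, assuming each phrase has already been converted into a fixed-length feature vector (the DTW alignment, feature normalization, and two-pass framework described above are not shown). The test vector is sparsely coded over a dictionary whose columns are the training exemplars, here via the iterative shrinkage-thresholding algorithm (ISTA) for an L1-penalized least-squares problem; the class whose exemplars give the smallest reconstruction residual is chosen, and a sparsity concentration index (SCI) near 0 suggests a possible outlier. The solver choice, `lam`, the iteration count, and the function names are assumptions, not the dissertation's settings.

```python
import numpy as np

def ista(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_a 0.5*||y - D a||^2 + lam*||a||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - y))  # gradient step on the squared error
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return a

def src_classify(D, labels, y, lam=0.01):
    """Classify y by class-wise reconstruction residual over exemplar dictionary D.

    D: (n_features, n_exemplars), one training exemplar per column.
    labels: class label of each column. Returns (label, sci, residuals).
    """
    D = D / np.linalg.norm(D, axis=0)  # unit-norm exemplars
    y = y / np.linalg.norm(y)
    labels = np.asarray(labels)
    a = ista(D, y, lam)
    classes = sorted(set(labels.tolist()))
    residuals = {}
    for c in classes:
        a_c = np.where(labels == c, a, 0.0)  # keep only class-c coefficients
        residuals[c] = float(np.linalg.norm(y - D @ a_c))
    best = min(residuals, key=residuals.get)
    # Sparsity concentration index: 1 if all weight is on one class, 0 if evenly spread.
    k = len(classes)
    total = float(np.abs(a).sum())
    if total == 0.0:
        sci = 0.0
    elif k == 1:
        sci = 1.0
    else:
        top = max(float(np.abs(a[labels == c]).sum()) for c in classes)
        sci = (k * top / total - 1) / (k - 1)
    return best, sci, residuals
```

Thresholding the SCI is one way to flag phrases whose coefficients spread across many classes, such as phrases from birds absent from the training set, before accepting the residual-based label.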
Lee Ngee Tan is a Ph.D. candidate in the Electrical Engineering Department at UCLA. She received her B.Eng. (Electrical) degree from the National University of Singapore and her M.S. in Electrical Engineering from UCLA. Her M.S. and Ph.D. studies are sponsored by DSO National Laboratories (Singapore), and in part by DARPA, NSF, and a UCLA Dissertation Year Fellowship.