Noise Robust Signal Processing for Human Pitch Tracking and Bird Song Classification and Detection
Nov 30, 2011
from 08:30 AM to 09:30 AM
Where: Engr IV - Maxwell Room 57-124
Advisor: Abeer Alwan
Human pitch, or fundamental frequency (F0), is the rate at which the vocal folds vibrate when a voiced sound is produced.
For F0 tracking, the work focuses on reducing F0 estimation errors and voicing decision errors under noisy conditions.
A novel Statistical Algorithm for F0 Estimation, SAFE, is proposed to improve the accuracy of estimation under both clean and noisy conditions. Prominent Signal-to-Noise Ratio peaks in speech spectra constitute a robust information source from which F0 can be inferred. A probabilistic framework is proposed to model the effect of noise on voiced speech spectra.
To reduce voicing decision errors, we introduce a model-based unvoiced/voiced classification frontend that can be used by any F0 tracking algorithm. We also propose an F0 Frame Error metric, which combines Gross Pitch Error and Voicing Decision Error, to objectively evaluate the performance of F0 tracking methods.
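As a rough illustration of how such a combined metric might be computed, the sketch below counts a frame as erroneous if it has either a voicing decision error or a gross pitch error. The function name `f0_frame_error`, the convention that an F0 of 0 marks an unvoiced frame, and the 20% relative tolerance are assumptions made for this example and are not taken from the abstract.

```python
def f0_frame_error(ref_f0, est_f0, tol=0.2):
    """Return (gpe, vde, ffe) for two equal-length F0 tracks.

    Assumptions for this sketch: an F0 value of 0 marks an unvoiced frame,
    and a pitch error is "gross" when it exceeds tol * reference F0.
    """
    if len(ref_f0) != len(est_f0) or len(ref_f0) == 0:
        raise ValueError("tracks must be non-empty and of equal length")

    n_frames = len(ref_f0)
    n_both_voiced = 0  # frames voiced in both reference and estimate
    n_gross = 0        # both-voiced frames with a gross pitch error
    n_voicing = 0      # frames where the voicing decisions disagree

    for ref, est in zip(ref_f0, est_f0):
        ref_voiced = ref > 0
        est_voiced = est > 0
        if ref_voiced != est_voiced:
            n_voicing += 1
        elif ref_voiced:
            n_both_voiced += 1
            if abs(est - ref) > tol * ref:
                n_gross += 1

    # GPE is normalised by the both-voiced frames; VDE and FFE by all frames.
    gpe = n_gross / n_both_voiced if n_both_voiced else 0.0
    vde = n_voicing / n_frames
    ffe = (n_gross + n_voicing) / n_frames
    return gpe, vde, ffe


if __name__ == "__main__":
    reference = [0, 100, 100, 200, 0]
    estimate = [0, 100, 150, 0, 120]
    print(f0_frame_error(reference, estimate))
```

In this toy input, one of the two both-voiced frames has a gross pitch error and two of the five frames have a voicing disagreement, so three of five frames are counted as frame errors. Reporting a single frame-level rate avoids having to trade off the two component errors by hand when comparing trackers.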
For bird call classification, the work focuses on signal denoising and discriminative feature extraction.
To enhance noisy bird calls, we propose a Correlation-Maximization denoising filter that uses periodicity information to remove additive noise from Antbird calls. We also developed a statistical, noise-robust bird call classification system that uses the denoising filter as a frontend. To obtain discriminative features, we extend the expectation-maximization (EM) algorithm to estimate not only the optimal acoustic model parameters but also the optimal center frequencies and bandwidths of the filter bank used in cepstral feature extraction.
For bird song detection, syllables in Robin songs are clustered using a distance measure defined as the average of aligned, Linear Predictive Coding-based frame-level differences. The syllable patterns inferred from the clustering results are used to improve the acoustic modeling of a hidden Markov model-based song detector.
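One possible reading of "average of aligned frame-level differences" is sketched below. It assumes the alignment is done with dynamic time warping (DTW) and that each frame is a plain feature vector compared by Euclidean distance. Both choices, and the names `frame_distance` and `aligned_average_distance`, are assumptions for illustration; the abstract does not specify the alignment method or the exact LPC-based difference.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frame feature vectors (an assumed stand-in
    for the LPC-based frame-level difference)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def aligned_average_distance(seq_a, seq_b):
    """Align two syllables with DTW and return the mean frame distance
    along the lowest-cost warping path."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    inf = float("inf")
    # Each cell holds (accumulated cost, number of path steps) up to that cell.
    table = [[(inf, 0)] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0.0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Pick the cheapest predecessor; ties go to the shorter path.
            prev_cost, prev_steps = min(
                table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
            )
            table[i][j] = (prev_cost + d, prev_steps + 1)

    total_cost, steps = table[n][m]
    return total_cost / steps


if __name__ == "__main__":
    syllable_a = [[0.0], [1.0], [2.0]]
    syllable_b = [[0.0], [1.0], [1.0], [2.0]]  # same shape, one frame held longer
    print(aligned_average_distance(syllable_a, syllable_b))
```

Dividing by the path length keeps the measure comparable across syllables of different durations. A pairwise matrix of these distances could then be passed to any standard clustering routine.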
Wei Chu (email@example.com) is a Ph.D. candidate in Electrical Engineering at the University of California, Los Angeles. His thesis work is on noise robust signal processing for human pitch tracking (http://www.ee.ucla.edu/%7Eweichu/safe) and bird song recognition (http://www.ee.ucla.edu/%7Eweichu/bird). In 2007, he earned his Master's degree in Electronic Engineering from Tsinghua University, where he developed an on-chip real-time speech recognition system with a non-speech input rejection frontend. In 2004, he earned his Bachelor's degree in Electronic Engineering from North China University of Technology.
Since 2005, he has collaborated with researchers from Microsoft Research Redmond and Asia, Disney Research, Rosetta Stone, Mitsubishi Electronic Research Laboratory, and Intel on audio-visual and noise robust speech recognition, children's speech analysis and recognition, pronunciation modeling, acoustic modeling, and speaker detection and clustering. Since 2007, he has served as a reviewer for IEEE Transactions on Audio, Speech, and Language Processing; Computer Speech and Language; ICASSP; Interspeech; and the Automatic Speech Recognition and Understanding workshop. His current research interests include speech recognition, speech perception, statistical signal processing, and machine learning.