Inference of Missing or Degraded Data for Noise Robust Speech Processing
| When | May 20, 2010 from 12:00 PM to 01:00 PM |
|---|---|
| Where | Engr. IV Maxwell Room 57-124 |
Bengt Jonas Borgstrom
Advisor: Abeer Alwan
Thursday, May 20, 2010 at 12:00pm
Engr. IV Maxwell Room 57-124
Abstract:
In real-world speech processing systems, speech signals generally suffer
degradation due to background acoustic noise or reverberation. This
dissertation addresses two general frameworks for which compensation of
corruptive acoustic noise can benefit system performance, namely
automatic speech recognition (ASR) and single-channel speech
enhancement. At the heart of each task lies the general problem of
inferring missing or degraded speech data, where signal ambiguity is due
to acoustic noise.
In the case of ASR, front-end missing feature (MF) spectral reconstruction is explored. Two solutions are offered. The first uses HMM-based processing and accounts for temporal and/or frequency correlation. The second exploits the sparsity of spectrographic speech data to formulate the reconstruction problem as a linear program. Each approach is successfully applied in both the Mel-filtered and log Mel-filtered domains. Finally, a statistical approach to Mel-domain mask estimation is proposed, which is used to differentiate between reliable and unreliable time-frequency components.
In the case of single-channel speech enhancement, statistical model-based methods are studied. A unified framework is presented for deriving short-time spectral amplitude (STSA) estimators which assume generalized Gamma-distributed speech priors. Additionally, a unified framework is proposed for developing STSA estimators which assume phase equivalence of speech and noise components. Finally, the role of temporal correlation in statistical speech enhancement is explored, resulting in a novel correlation-based STSA estimator.
Biography:
Bengt Jonas Borgstrom received his B.S. and M.S. from the University of
California, Los Angeles. He is currently pursuing a Ph.D. under the
supervision of Dr. Abeer Alwan. His research interests lie in noise
robust automatic speech recognition, audiovisual speech processing,
speech enhancement, and speech coding. In 2006, he received the
Outstanding Masters Student recognition for his work in audiovisual
speech synthesis. He has held consulting positions at Broadcom and HRL
Laboratories, assisting with various speech and image processing projects.
