Robust Automatic Speech Recognition Algorithms for Dealing with Noise and Accent
| When | Aug 05, 2009 from 01:00 PM to 03:00 PM |
|---|---|
| Where | Engr IV Room 67-124 |
Hong You
Advisor: Abeer Alwan
Wednesday, August 5, 2009 at 1:00pm-3:00pm
Engr IV Room 67-124
Abstract:
Although there has been significant progress in automatic speech
recognition (ASR) systems over the past five decades, many challenging
problems still remain. In addition to the intrinsic confusability
between speech units, the environment, speaker, and speaking styles all
contribute to variations in speech signals, which pose one of the most
challenging issues facing ASR research. Variability in speech requires
both the signal processing and pattern modeling components of an ASR
system to adapt. The focus of this dissertation is on developing
algorithms that improve the performance of speech recognition systems
when dealing with variability in speech signals. Specifically, the focus
is on variability due to environmental noise and to accent. For
example, environmental noise contributes to significant speech
variability that depends on the type of noise and signal-to-noise ratio
(SNR). A noise-robust feature extraction technique is therefore necessary
for ASR to deal with noisy speech. Variations in speech signals due to
certain pronunciations can also degrade ASR performance, so an
ASR system needs to compensate for these pronunciation variations.
In terms of noise robustness, we explore feature extraction and frame selection algorithms that enhance the ability of the signal processing component to handle variability caused by noise. The algorithms are tested on several benchmark databases and compared with state-of-the-art noise-robust ASR systems, and improved recognition accuracy in noise is observed.
In terms of speaker accent robustness, we focus on pronunciation modeling. We propose algorithms to analyze pronunciation variations in Spanish-accented speech at the pronunciation-lexicon level.
Since speech recognition systems rely heavily on hidden Markov models (HMMs), a confusability measure for HMMs is important. We propose a distance measure between HMMs that improves upon existing HMM confusability metrics in terms of ASR performance prediction, confusion pattern prediction, and pronunciation modeling.
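For readers unfamiliar with the idea of a distance between HMMs, the sketch below shows one classical likelihood-based dissimilarity: sample a long observation sequence from one model and compare its per-frame log-likelihood under both models. This is not the measure proposed in the dissertation, which the abstract does not specify. It assumes discrete-emission HMMs for brevity, and every name in it (`sample`, `log_likelihood`, `hmm_distance`, the toy models) is hypothetical.

```python
import numpy as np

def sample(pi, A, B, T, rng):
    """Draw an observation sequence of length T from a discrete HMM."""
    obs = np.empty(T, dtype=int)
    state = rng.choice(len(pi), p=pi)
    for t in range(T):
        obs[t] = rng.choice(B.shape[1], p=B[state])  # emit a symbol
        state = rng.choice(len(pi), p=A[state])      # move to the next state
    return obs

def log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: returns log P(obs | model)."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()
        log_p += np.log(scale)                       # accumulate scaling factors
        alpha = (alpha / scale) @ A * B[:, obs[t]]
    return log_p + np.log(alpha.sum())

def hmm_distance(m1, m2, T=2000, seed=0):
    """Monte Carlo estimate of (1/T) [log P(O|m1) - log P(O|m2)], with O drawn from m1."""
    rng = np.random.default_rng(seed)
    obs = sample(*m1, T, rng)
    return (log_likelihood(*m1, obs) - log_likelihood(*m2, obs)) / T

def symmetric_hmm_distance(m1, m2, T=2000, seed=0):
    """Average of the two directed estimates, so the result does not depend on argument order."""
    return 0.5 * (hmm_distance(m1, m2, T, seed) + hmm_distance(m2, m1, T, seed))

# Toy models (assumed for illustration only): (initial probs, transitions, emissions).
peaked = (np.array([0.5, 0.5]),
          np.array([[0.95, 0.05], [0.05, 0.95]]),
          np.array([[0.95, 0.05], [0.05, 0.95]]))
uniform = (np.array([0.5, 0.5]),
           np.array([[0.5, 0.5], [0.5, 0.5]]),
           np.array([[0.5, 0.5], [0.5, 0.5]]))

print("peaked vs. uniform:", symmetric_hmm_distance(peaked, uniform))
print("peaked vs. itself: ", symmetric_hmm_distance(peaked, peaked))
```

A larger value suggests the two models are easier to tell apart, and a value near zero suggests they are likely to be confused. The estimate is stochastic, so in practice `T` would be chosen large enough for the value to stabilize.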
