Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition

Speaker: Jinxi Guo
Affiliation: UCLA Ph.D. Candidate

Abstract: Research on deep learning and artificial neural networks has grown significantly over the past decade, especially in the fields of automatic speech recognition (ASR) and speaker recognition. Compared to traditional methods, deep learning-based approaches are more powerful at learning representations from data and building complex models. This talk will therefore focus on representation learning and modeling using neural network-based approaches for speech and speaker recognition.

In the first part of the talk, I will present two novel neural network-based methods for learning speaker-specific and phoneme-invariant representations for short-utterance speaker verification. Both approaches improve speaker verification performance significantly. In the second part of the talk, I will propose several neural network architectures that take raw speech features (either complex Discrete Fourier Transform (DFT) features or raw waveforms) as input and perform feature extraction and phone classification jointly. The unified neural network models provide significantly lower ASR error rates than traditional models. In the third part of the talk, I will discuss novel neural networks for sequence modeling. I will first talk about attention mechanisms for acoustic sequence modeling. Then, I will present a sequence-to-sequence spelling correction model for end-to-end ASR.

Biography: Jinxi Guo is a Ph.D. candidate in the Electrical and Computer Engineering Department at UCLA. His research interests include automatic speech and speaker recognition, machine learning, and deep learning. For his doctoral research, he is developing novel neural network-based architectures and approaches for representation learning and modeling, with applications to automatic speech and speaker recognition. He has published 19 papers in top conferences and journals (including 12 as first author). He has held research internships at Google Research, Amazon Alexa Machine Learning, Snapchat AI Research, and Qualcomm Research. He received the Borgstrom Fellowship from UCLA for 2014-2015 and the Dissertation Year Fellowship from UCLA for 2018-2019. He received the B.E. degree in Electrical and Information Engineering from Xi’an Jiaotong University, China, in 2013, and the M.S. degree in Electrical Engineering from UCLA in 2015.

For more information, contact Prof. Abeer Alwan.

Date(s) - May 24, 2019
12:00 pm - 2:00 pm

E-IV Tesla Room #53-125
420 Westwood Plaza - 5th Flr., Los Angeles CA 90095