Robust Automatic Recognition of Birdsongs and Human Speech: A Template-Based Approach

Speaker: Kantapon Kaewtip
Affiliation: Ph.D. Candidate - UCLA

Abstract: The first part of this talk focuses on an automatic birdsong-phrase detection and transcription system that is robust to limited training data, class variability, and noise. The algorithm comprises a noise-robust segmentation stage based on Dynamic Time Warping (DTW) and a discriminative classifier for outlier rejection. It uses DTW and the prominent (high-energy) time-frequency regions of the training spectrograms to derive a reliable, noise-robust template for each phrase class. Each template is then used to segment continuous recordings into candidate segments, and the spectrogram amplitudes of each candidate within the prominent regions serve as features for a Support Vector Machine (SVM). In addition, we modified the Hidden Markov Model (HMM) framework to make it more robust to limited training data and noise. The proposed algorithm uses training data efficiently by sharing training examples.
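For readers unfamiliar with DTW-based template matching, here is a minimal Python sketch of the general idea. It is not the speaker's implementation. It assumes spectrograms are given as lists of frames, each a list of bin magnitudes. The helper names (`prominent_mask`, `dtw_align`, `aligned_features`) and the relative-energy threshold used to pick "prominent" bins are hypothetical. The template's mask restricts the frame distance to high-energy bins, and the aligned amplitudes in those bins form a fixed-length feature vector of the kind that could be passed to an SVM.

```python
import math


def prominent_mask(spec, rel_threshold=0.5):
    """Hypothetical rule: mark bins whose magnitude is at least rel_threshold of the peak."""
    peak = max(max(frame) for frame in spec)
    return [[1 if v >= rel_threshold * peak else 0 for v in frame] for frame in spec]


def masked_distance(t_frame, q_frame, m_frame):
    """Euclidean distance between two frames, counting only bins where the mask is 1."""
    return math.sqrt(sum(m * (t - q) ** 2 for t, q, m in zip(t_frame, q_frame, m_frame)))


def dtw_align(template, query, mask):
    """Standard DTW with steps (1,0), (0,1), (1,1); returns (total cost, warping path)."""
    n, m = len(template), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = masked_distance(template[i - 1], query[j - 1], mask[i - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end of both sequences, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


def aligned_features(query, path, mask):
    """For each template frame, average the aligned query frames and keep only masked bins."""
    features = []
    for i, m_frame in enumerate(mask):
        frames = [query[j] for (ti, j) in path if ti == i]
        avg = [sum(col) / len(frames) for col in zip(*frames)]
        features.extend(v for v, m in zip(avg, m_frame) if m)
    return features


template = [[1.0, 0.0], [0.0, 1.0]]
query = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
mask = prominent_mask(template)
cost, path = dtw_align(template, query, mask)
print(cost, path, aligned_features(query, path, mask))
```

The feature vector has one entry per prominent template bin, so its length is the same for every candidate segment regardless of duration. That fixed length is what a standard SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) expects.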

The second part deals with noise-robust processing for automatic speech recognition. The proposed algorithm simultaneously determines a time-warping function (TWF) and the speaker’s pitch with high precision. This technique reduces the smearing between harmonics that occurs when the fundamental frequency is not constant within the analysis window. We show how this approach can be used for automatic speech recognition by proposing a robust spectral representation derived from harmonic amplitude interpolation.
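As a rough illustration of the final step only, the sketch below assumes the fundamental frequency f0 and the amplitudes of harmonics 1..K are already known for one frame. Estimating them, together with the time-warping function, is the hard part addressed in the talk and is not shown. The sketch linearly interpolates the harmonic amplitudes onto a fixed frequency grid, which gives a smooth spectral envelope that no longer depends on where the harmonics happen to fall. The function name `harmonic_envelope` and the choice of linear interpolation with flat extrapolation are assumptions, not the speaker's method.

```python
from bisect import bisect_right


def harmonic_envelope(f0, harmonic_amps, grid_hz):
    """Interpolate harmonic amplitudes (harmonic k at k * f0 Hz, k = 1..K, K >= 1) onto grid_hz."""
    freqs = [k * f0 for k in range(1, len(harmonic_amps) + 1)]
    envelope = []
    for f in grid_hz:
        if f <= freqs[0]:
            envelope.append(harmonic_amps[0])  # hold the first harmonic's amplitude at and below f0
        elif f >= freqs[-1]:
            envelope.append(harmonic_amps[-1])  # hold the last harmonic's amplitude at and above K * f0
        else:
            k = bisect_right(freqs, f) - 1  # index such that freqs[k] <= f < freqs[k + 1]
            t = (f - freqs[k]) / (freqs[k + 1] - freqs[k])
            envelope.append((1 - t) * harmonic_amps[k] + t * harmonic_amps[k + 1])
    return envelope


print(harmonic_envelope(100, [1.0, 3.0, 2.0], [50, 100, 150, 250, 400]))
```

Sampling every frame's envelope on the same grid yields feature vectors of equal length across frames with different pitch.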

Biography: Kantapon Kaewtip received his bachelor’s degree in Electrical Engineering from Brown University in 2010. He joined SPAPL (Speech Processing and Auditory Perception Laboratory) in 2010 and received his master’s degree in Electrical Engineering, on the Signals and Systems track, from UCLA in 2012. He has worked with Nokia, Broadcom, Qualcomm, and Oben. At Qualcomm, he advanced to the final round of the IdeaQuest competition as one of the top three finalists (out of 62 teams); he led his team, designed the system, implemented prototypes, and gave a Shark Tank-style pitch to executive VPs. He also volunteered as a sound editor for the event “Shannon Centennial: 1100100 years of bits” at UCLA. He is currently pursuing his Ph.D. degree under the supervision of Professor Abeer Alwan. His research topics include automatic recognition of birdsongs and human speech.

For more information, contact Prof. Abeer Alwan.

Date: Dec 14, 2016
1:00 pm - 3:00 pm

E-IV Faraday Room #67-124
420 Westwood Plaza, 6th Floor, Los Angeles, CA 90095