Thanks to Prof. Abeer Alwan's
guidance and support, I received my PhD
degree in Electrical Engineering
at the end of 2011.
My gratitude also goes to Profs. Daniel Blumstein, Mihaela van
der Schaar, and Kung Yao for serving on my doctoral committee. My dissertation "Noise
Robust Signal Processing for
Human Pitch Tracking and Bird
Song Classification and
Detection" and
defense
slides are available for
download (*) (**).
I
am working on speech processing,
speech recognition,
and language identification as a
speech scientist at
Voci Technologies. If you would like to
start a discussion,
please feel free to drop me a line
at weichu@ucla.edu.
(*) The statistical algorithm
for F0 estimation (SAFE) toolkit
is available for download
here. You are welcome to try it!
(**) The Rocky Mountain
Biological Laboratory Robin song
(RMBL-Robin) database is
available for download from
here. Feedback is
welcome!
• W. Chu and A. Alwan, “SAFE:
a statistical approach to F0
estimation under clean and noisy
conditions,” IEEE Trans. on
Audio, Speech, and Language Processing, Vol. 20, No. 3, pp. 933-967, 2012.
[slides]
[toolkit]
[toolkit's
tutorial]
• W. Chu and A. Alwan, “fbEM:
a filter bank EM algorithm for
the joint optimization of
features and acoustic model
parameters in bird call
classification,” Interspeech 2012,
pp. 1993-1996. [poster]
• W. Chu and D.T.
Blumstein, “Noise
robust bird song detection using
syllable pattern-based hidden
Markov models,” ICASSP 2011,
pp. 345-348. [poster]
[database]
• W. Chu and A. Alwan,
"SAFE:
a statistical algorithm for F0
estimation for both clean and
noisy speech," Interspeech
2010, pp. 2590-2593. [slides]
[toolkit]
[toolkit's
tutorial]
• W. Chu and A. Alwan, “A
correlation-maximization
denoising filter used as an
enhancement frontend for noise
robust bird call classification,”
Interspeech 2009, pp. 2831-2834.
[slides]
[database]
• W. Chu and A. Alwan, "Reducing
F0 frame error of F0 tracking
algorithms under noisy
conditions with an
unvoiced/voiced classification
frontend," ICASSP 2009,
pp. 3969-3972. [slides]
• W. Chu and J. Liu, "Using
Confidence Measures to Evaluate
the Speaker Turns in Speaker
Segmentation," Proc of Intl
Conf on Information Sciences,
Signal Processing and its
Application (ISSPA07).
• W. Chu and J. Liu, "Subband
Energy Distance Measure Applied
in Multi-Pass Speech/Non-Speech
Discrimination," Proc of
Intl Conf on Information
Sciences, Signal Processing and
its Application (ISSPA07).
• W. Chu, X. Xiao, J.
Liu, "Confidence
Score Based Unsupervised
Incremental Adaptation for OOV
Words Detection," Proc of
Intl Workshops on Statistical
Techniques in Pattern
Recognition (SSSPR06),
pp. 723-731.
• Voci Technologies 01/2012 - present
Speech Scientist
– Speech processing,
speech recognition, and language
recognition.
• Speech Processing and
Audio Perception Lab, UCLA
09/2007 - 12/2011
Research Assistant, Advisor: Prof. Abeer Alwan
– Noise robust F0 estimation and tracking
* Proposed SAFE, a Statistical Algorithm for F0 Estimation
under both clean and noisy
conditions. The statistical
framework shows promise in
modeling the effect of noise
on prominent SNR peaks in the
spectrum given F0. Currently
incorporating statistical
modeling of F0 transitions into
SAFE to deliver an F0 tracker.
* Proposed an error metric called F0 Frame Error, which
combines Gross Pitch Error
and Voicing Decision Error to
compare the performance of F0
tracking algorithms in a unified
framework. Used a
statistical
voiced/unvoiced classification
frontend to reduce Voicing
Decision Errors under noisy
conditions.
– Bird song classification,
recognition, and detection
* Extended the EM algorithm to jointly estimate the optimal
center frequencies and
bandwidths of the filter bank in
cepstral feature extraction and
the model parameters in bird call
classification. Proposed an
extended auxiliary function in
which the feature extraction and
model parameters are updated
iteratively and alternately.
* Used hierarchical clustering analysis to infer bird
syllable patterns for finer
acoustic modeling. Compared to
using a single general pattern
for all syllables, the syllable
pattern-based HMM bird song
detector achieves higher
precision and recall. The algorithm is
being ported to a
hand-held device.
* Proposed a correlation-maximization denoising filter for
reducing non-periodic noise
in bird calls, which have a
periodic structure. Compared to
the Wiener filter, features
extracted from the output of the
proposed filter resulted in a
lower bird call classification
error rate.
• Speech Group, Disney Research, Pittsburgh 06/2010 - 09/2010
Summer Intern, Mentors: Dr. John McDonough and Prof. Bhiksha Raj
– Used microphone array processing and
speech recognition technologies
to build an interactive
storytelling demo for children.
Gained an understanding of acoustic echo
cancellation and weighted finite-state
transducer-based speech
recognition. Learned how to
collect, annotate, and maintain
an audio-visual children's speech
database.
• Speech Lab, Rosetta Stone 06/2009 - 08/2009
Summer Intern, Mentors: Dr. Bryan Pellom and Dr. Kadri Hacioglu
– Developed statistical methods for
determining the pronunciation of a
word. Gained an understanding of the rule-based
and maximum entropy
criterion-based modeling
techniques used in machine
translation and applied them to
letter-to-sound conversion.
Wrote an A* search routine in
C++.
• Speech Group, Mitsubishi Electric Research Lab 06/2008 - 09/2008
Summer Intern, Mentor: Prof. Bhiksha Raj
– Developed a
discriminative training module
(lattice-based MMI) for the Sphinx
speech recognizer. Also explored
how initial model parameters
affect the final model
parameters in an iterative
learning process. Gained an understanding of
maximum likelihood estimation,
the Baum-Welch algorithm, and
the extended Baum-Welch
algorithm.
• Speech Group, Microsoft Research Asia, Beijing 04/2007 - 08/2007
Summer Intern, Mentor: Dr. Chao Huang
– Built a demo
for detecting acoustic events
(speech, music, ring tone,
background noise) in an office
environment. Compared the
effectiveness of noise-robust
features, Gaussian mixture and
hidden Markov models, and MAP and
MLLR unsupervised adaptation.
Learned how to manage job queues
on computing clusters.
• Microprocessor Tech Lab, Intel China Research Center, Beijing 07/2006 - 10/2006
Research Intern, Mentor: Dr. Wei Hu
– Built a demo for
locating and tracking the voices
of actors and actresses in TV
series and movies. Used the Bayesian
Information Criterion for
unsupervised segmentation and
clustering of speakers in the audio
stream.
• Tsinghua University, Beijing 09/2004 - 07/2007
Research Assistant, Advisor: Prof. Jia Liu
– Master's thesis work:
implemented a real-time
speech-to-text system with a
non-speech input rejection
frontend on chip. Developed a
non-speech removal frontend for the
national '863' and '242' keyword
spotting evaluations.
• UFIDA Software Corp., Beijing 02/2004 - 06/2004
Software Intern, Supervisor: Mr. Yu Zhu
– Bachelor's thesis work: created
the index of the digital map for
an on-vehicle GPS software
system.
Last updated: June 21st, 2012.
Xin Chen (now with University of Missouri,
Columbia)
Dian Gong (now with University of Southern
California)
Yiting Liao (now with
Intel)
Chanwoo Kim (now
with Microsoft)
Chao Qin (now with University of California,
Merced)
Long Qin (now with Carnegie Mellon
University)
Marek Vondrak (now with Brown University)
Xin Yan (now with Pennsylvania State
University)
Qiao Yu (now with Chinese Academy of Sciences,
Shenzhen, China)
Yue Zhao (now with University of California,
Los Angeles)
Xiaodan Zhuang (now with
BBN)
• Speech Processing Tutorial:
- Dan Ellis's ICSI Speech FAQ
• Hidden Markov Model
- Tutorial:
* L. Rabiner, "A
tutorial on hidden Markov models and selected
applications in speech recognition,"
Proceedings of the IEEE, 77 (2), pp.
257–286, February 1989.
- Toolkit:
* HTK BOOK 3.4 (for my own use)
* Mark Hasegawa-Johnson's Speech Mini Course and HTK lecture video
• Gaussian Mixture Model
- Tutorial: Reynolds, Douglas A.,
Quatieri, Thomas F., and Dunn, Robert B., "Speaker
verification using adapted Gaussian mixture
models," Digital Signal Processing,
Vol. 10, No. 1-3, pp. 19-41, January 2000.
- Toolkit: My GMM classifier written in C
(available for download soon)
• Support Vector Machine
- Tutorial:
SVM on wiki
- Toolkit:
LIBSVM
• Large-Margin Training
- Fei Sha, "Large
margin training of acoustic models for speech
recognition," PhD Thesis, 2007.
- Hui Jiang, "Large
margin hidden Markov models for speech
recognition," IEEE Trans. on
Audio, Speech and Language Processing,
Vol. 14, No. 5, pp. 1584-1595, September 2006.
• F0 Tracking or Pitch Detection Algorithm
- Study:
* L. Rabiner, M. Cheng, A. Rosenberg, and C. McGonegal, "A
comparative performance study of several pitch
detection algorithms," IEEE Trans. on
Acoustics, Speech, and Signal Processing, vol.
24, no. 5, pp. 399–418, 1976.
- Toolkits:
* SAFE (a statistical algorithm for F0 estimation)
* Praat (has visualization function)
* ESPS get_f0: D. Talkin, "Robust algorithm
for pitch tracking," Speech Coding and
Synthesis, pp. 497–518, 1995.
* Wavesurfer (uses ESPS get_f0, with visualization function)
* TEMPO (a part of the STRAIGHT toolkit)
* YIN (for the pitch of music)
* WWB (a multi-pitch tracker)