|
|
|
| Glottaltopograph
(GTG) analysis tool: a toolkit for analyzing high-speed laryngeal
videos. |
Glottaltopography
is a method to analyze high-speed laryngeal videos. The
method is described in this paper: Gang
Chen, Jody Kreiman, and Abeer Alwan, "The
Glottaltopograph: A Method of Analyzing High-Speed Images of the Vocal
Folds", ICASSP 2012,
pp. 3985-3988. Briefly, the
"glottaltopogram" is based on principal component analysis of
pixels' light-intensity time sequences from consecutive video images.
This
method reveals the overall synchronization of the vibrational patterns
of the
vocal folds over the entire laryngeal area. This method is effective in
visualizing pathological and
normal vocal fold vibratory patterns. The GTG toolkit is available for
download here.
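
The idea of applying principal component analysis to per-pixel intensity sequences can be illustrated with a minimal Python sketch. This is not the GTG toolkit's code: the function name `pixel_pca`, the two centering steps, and the use of an SVD are assumptions made for illustration only.

```python
import numpy as np

def pixel_pca(frames, n_components=3):
    """PCA of per-pixel light-intensity time sequences (illustrative sketch).

    frames: array of shape (T, H, W) holding T consecutive grayscale images.
    Returns (maps, explained): maps has shape (n_components, H, W) and holds
    each pixel's score on the leading components; explained holds the
    fraction of variance captured by each of those components.
    """
    frames = np.asarray(frames, dtype=float)
    T, H, W = frames.shape
    # One row per pixel, one column per time sample.
    X = frames.reshape(T, H * W).T
    # Assumption: remove each pixel's mean brightness so only its variation
    # over time is analyzed.
    X = X - X.mean(axis=1, keepdims=True)
    # Standard PCA centering: remove the mean over pixels at each time sample.
    X = X - X.mean(axis=0, keepdims=True)
    # SVD of the centered data; rows of Vt are the temporal components.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    maps = scores.T.reshape(n_components, H, W)
    return maps, explained[:n_components]
```

Displaying `maps[0]` as an image gives one value per pixel. In this sketch, pixels whose intensities vary in phase receive scores of the same sign, and pixels that vary in opposite phase receive scores of opposite sign.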
|
| Harmfreq_MOLRT: a
statistical model, likelihood ratio test (LRT)-based speech/non-speech
detection algorithm |
Harmfreq_MOLRT
is a statistical model, likelihood ratio test (LRT)-based
speech/non-speech detection algorithm. The likelihood ratios (LRs) for
voiced and unvoiced frames are computed differently: LR for voiced
frames is calculated using only the harmonic DFTs; for unvoiced frames,
LR is calculated using all DFTs. It is an improved version of the
multiple observation (MO) LRT voice activity detector (VAD) proposed by Ramirez et al. [Matlab
code of Harmfreq_MOLRT VAD]
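
The difference between the voiced and unvoiced cases can be sketched in Python as follows. This is not the distributed Matlab code. It assumes the per-bin log likelihood ratio commonly used in Gaussian-model LRT detectors, a crude a priori SNR estimate, and a known F0 for voiced frames. The function names are hypothetical, and the multiple-observation step (combining ratios over neighboring frames) is left out.

```python
import numpy as np

def frame_log_lr(power_spec, noise_psd, harmonic_bins=None):
    """Mean per-bin log likelihood ratio for one frame (illustrative sketch).

    power_spec, noise_psd: 1-D arrays over DFT bins (|X_k|^2 and noise variance).
    harmonic_bins: bin indices to use for a voiced frame; None uses all bins,
    as for an unvoiced frame.
    """
    power_spec = np.asarray(power_spec, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    gamma = power_spec / noise_psd        # a posteriori SNR
    xi = np.maximum(gamma - 1.0, 0.0)     # assumption: crude a priori SNR estimate
    # Assumed Gaussian-model log LR per bin: gamma*xi/(1+xi) - log(1+xi).
    log_lr = gamma * xi / (1.0 + xi) - np.log1p(xi)
    if harmonic_bins is not None:
        log_lr = log_lr[np.asarray(harmonic_bins, dtype=int)]
    return float(np.mean(log_lr))

def harmonic_bin_indices(f0_hz, sample_rate, n_fft):
    """DFT bins nearest to the multiples of f0 up to the Nyquist frequency."""
    n_harm = int((sample_rate / 2) // f0_hz)
    freqs = f0_hz * np.arange(1, n_harm + 1)
    bins = np.rint(freqs * n_fft / sample_rate).astype(int)
    return np.unique(bins[bins <= n_fft // 2])
```

For a voiced frame, the energy sits mostly at the harmonics, so averaging over the harmonic bins alone keeps the low-SNR bins between harmonics from diluting the ratio. A frame would then be labeled speech when the ratio exceeds a threshold chosen by the user.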
|
| MBSC: a
Multi-Band Summary Correlogram (MBSC)-based pitch detection algorithm
for noisy speech |
MBSC
is a Multi-Band Summary Correlogram (MBSC)-based pitch detection
algorithm for noisy speech. The package contains the Matlab code
that was used to generate the pitch detection results reported
in L. N. Tan and A. Alwan, "Multi-Band Summary Correlogram-based
Pitch Detection for Noisy Speech", Speech Communication, in press.
A fast version of the code is also provided in the package. [Matlab
code of MBSC pitch detector]
|
| SAFE: a Statistical Algorithm for F0 Estimation |
|
|
|
| VFR:
Variable Frame Rate |
Variable Frame Rate
(VFR) analysis is a method of feature extraction for noise robust
automatic speech recognition (ASR) which builds on speech perception
research that shows that dynamic spectro-temporal information is
important, and, hence, not all equi-duration speech segments are
equally important perceptually. For example, formant transitions at the
onset of a vowel can carry more discriminative information than the
steady-state part of the vowel ... (details)
(download)
|
VoiceSauce is an
application, implemented in Matlab, which provides automated voice
measurements over time from audio recordings. Inputs are standard wave
(*.wav) files and the measures currently computed are: F0, Formants
F1-F4, H1(*), H2(*), H4(*), A1(*), A2(*), A3(*), H1(*)-H2(*),
H2(*)-H4(*), H1(*)-A1(*), H1(*)-A2(*), H1(*)-A3(*), Energy, and
Cepstral Peak Prominence ... (details)
|
XVocal is the UNIX version of Dr. Shinji Maeda's Vocal Tract
Articulatory Synthesizer, VTCALCS (originally
developed for the PC platform). In 1995,
Edmond Chi Hin Chui of our
laboratory ported the PC version to UNIX. With Dr. Maeda's
permission, XVocal is now freely available for
research purposes only. Please see the
user manual for detailed instructions on how to use the
program... (details)
|
|
|
|
Speechdemo is a Matlab-based graphical tool for speech analysis by Qifeng Zhu.
It supports simultaneous analysis of signals in two channels. The user
can view the signal in time and frequency using a variety of analysis
tools such as the Discrete Fourier Transform (DFT); Linear Predictive
Coding (LPC); Mel-Frequency Cepstral Coefficients (MFCC); and others...
(details)
|
|
|
|
An extensive database
of 1,728 isolated Consonants and Vowels (CV) is available through this
website. (details)
|
|
|
|
The speech group at Microsoft Research (Redmond, Washington, US) and
IPAM and Electrical Engineering at UCLA (Los Angeles, CA, US) have
recently jointly developed a database of manually labeled
vocal-tract-resonance (or formant) trajectories, for research in speech
processing including analysis, synthesis, and recognition. (details)
|
|
|
|
A narrated videotape showing 3D tongue and vocal tract reconstructions
from MRI data for consonants and vowels as produced by two talkers.
Sample 3D models can be seen at:
http://www.ee.ucla.edu/~spapl/projects/mri.html. This
videotape is an effective teaching aid, and was produced by Shrikanth
Narayanan and Abeer Alwan. ... (details)
For a free copy of the
videotape, please email Prof. Alwan at: alwan@icsl.ucla.edu
|
The label files in this package contain the time-stamps of silence
(sil) and short pause (sp) segments in the Aurora-2 test sets. These
time-stamps were obtained by manual visual inspection of the
spectrograms of the clean test files.
|
|
|
|
|
UCSC Speech
Links;
Alexander
Graham Bell's Path to the Telephone;
F0
Estimation Resources
(from the PhD dissertation of Arturo Camacho, SWIPE: A Sawtooth
Waveform Inspired Pitch Estimator for Speech and Music, 2007) ;
• AC-P:
This algorithm (Boersma, 1993) computes the autocorrelation of the
signal and divides it by the autocorrelation of the window used to
analyze the signal. It uses postprocessing to reduce discontinuities in
the pitch trace. It is available with the Praat System
at <http://www.fon.hum.uva.nl/praat>. The name of the
function is ac.
• AC-S:
This algorithm uses the autocorrelation of the cubed signal. It is
available with the Speech Filing System
at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of
the function is fxac.
• ANAL:
This algorithm (Secrest and Doddington, 1983) uses
autocorrelation to estimate the pitch, and dynamic programming to
remove discontinuities in the pitch trace. It is available with
the Speech Filing System
at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the
function is fxanal.
• CATE: This algorithm uses a quasi-autocorrelation
function of the speech excitation signal to estimate
the pitch. We implemented it based on its original description
(Di Martino, 1999). The dynamic programming component used to remove
discontinuities in the pitch trace was not implemented.
• CC:
This algorithm uses cross-correlation to estimate the pitch and
post-processing to remove discontinuities in the pitch trace. It is
available with the Praat System at
<http://www.fon.hum.uva.nl/praat>. The name of the function is
cc.
• CEP:
This algorithm (Noll, 1967) uses the cepstrum of the signal and is
available with the Speech Filing System
at <http://www.phon.ucl.ac.uk/resource/sfs>. The name
of the function is fxcep.
• ESRPD:
This algorithm (Bagshaw, 1993; Medan, 1991) uses a normalized
cross-correlation to estimate the pitch, and post-processing to remove
discontinuities in the pitch trace. It is available with
the Festival Speech Synthesis System
at <http://www.cstr.ed.ac.uk/projects/festival>. The name of
the function is pda.
• RAPT:
This algorithm (Talkin, 1995) uses a normalized
cross-correlation to estimate the pitch, and dynamic programming to remove
discontinuities in the pitch trace. It is available with the Speech
Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>.
The name of the function is fxrapt.
• SHS:
This algorithm (Hermes, 1988) uses subharmonic summation. It is
available with the Praat System
at <http://www.fon.hum.uva.nl/praat>. The name of the
function is shs.
• SHR:
This algorithm (Sun, 2000) uses the subharmonic-to-harmonic ratio. It
is available at Matlab
Central <http://www.mathworks.com/matlabcentral>, under
the title “Pitch Determination Algorithm”. The name of
the function is shrp.
• TEMPO:
This algorithm (Kawahara et al., 1999) uses the instantaneous frequency
of the outputs of a filterbank. It is available with the STRAIGHT
System on its author's web page
<http://www.wakayama-u.ac.jp/~kawahara>. The name of the function
is exstraightsource.
• YIN:
This algorithm (de Cheveigné and Kawahara, 2002) uses a modified
version of the average squared difference function. It is
available from its author's web page at
<http://www.ircam.fr/pcm/cheveign/sw/yin.zip>. The name of
the function is yin.
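
Several entries in the list above rest on autocorrelation. As a rough illustration of the idea described under AC-P (dividing the autocorrelation of the windowed signal by the autocorrelation of the window), here is a minimal Python sketch. It is not Praat's implementation: the function names, the Hann window, and the default search range are assumptions, and the postprocessing that reduces discontinuities in the pitch trace is left out.

```python
import numpy as np

def autocorr(x):
    """Autocorrelation for lags 0..len(x)-1, normalized to 1 at lag 0."""
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)          # zero-pad to avoid circular wrap-around
    r = np.fft.irfft(np.abs(spec) ** 2)[:n]
    return r / r[0]

def f0_autocorr(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 (Hz) of one frame from its window-corrected autocorrelation."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    window = np.hanning(len(frame))
    r_sig = autocorr(frame * window)
    r_win = autocorr(window)
    # Candidate lags for the allowed F0 range, limited to half the frame so
    # the window autocorrelation stays well away from zero.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) // 2)
    lags = np.arange(lag_min, lag_max + 1)
    r = r_sig[lags] / r_win[lags]         # undo the taper introduced by the window
    return sample_rate / lags[np.argmax(r)]
```

Because a periodic signal produces nearly equal peaks at every multiple of its period, this bare version can report an octave too low when the search range spans more than one period. Narrowing `f0_min` and `f0_max`, or adding the kind of postprocessing the listed tools use, reduces that risk.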
|
|
|
|
|
|
|