|
Variable Frame Rate
(VFR) analysis is a feature extraction method for noise-robust
automatic speech recognition (ASR). It builds on speech perception
research showing that dynamic spectro-temporal information is
important and, hence, that not all equal-duration speech segments are
equally important perceptually. For example, formant transitions at the
onset of a vowel can carry more discriminative information than the
steady-state part of the vowel ... (details)
(download)
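One common way to realize a variable frame rate is to compute features at a high fixed rate and keep a frame only when the accumulated change since the last kept frame reaches a threshold. The sketch below illustrates that general idea only. The Euclidean distance, the `threshold` value, and the function names are assumptions made for illustration and are not taken from the method described above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_frames(features, threshold):
    """Return indices of frames kept by an accumulated-distance rule.

    features  : list of feature vectors computed at a fixed, high frame rate
    threshold : hypothetical tuning parameter; larger values keep fewer frames
    """
    if not features:
        return []
    kept = [0]          # always keep the first frame
    accumulated = 0.0   # change accumulated since the last kept frame
    for i in range(1, len(features)):
        accumulated += euclidean(features[i], features[i - 1])
        if accumulated >= threshold:
            kept.append(i)
            accumulated = 0.0
    return kept


# Toy 1-D "features": a steady segment, a transition, then another steady segment.
toy = [[0.0], [0.0], [0.0], [0.0], [0.5], [1.0], [1.5], [2.0], [2.0], [2.0], [2.0]]
kept = select_frames(toy, threshold=0.5)
print(kept)  # [0, 4, 5, 6, 7]
```

In this toy input only the transition frames are retained after the first frame, which mirrors the observation that rapidly changing regions can carry more information than steady-state ones.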
|
VoiceSauce is an
application, implemented in Matlab, which provides automated voice
measurements over time from audio recordings. Inputs are standard wave
(*.wav) files and the measures currently computed are: F0, Formants
F1-F4, H1(*), H2(*), H4(*), A1(*), A2(*), A3(*), H1(*)-H2(*),
H2(*)-H4(*), H1(*)-A1(*), H1(*)-A2(*), H1(*)-A3(*), Energy, and
Cepstral Peak Prominence ... (details)
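To make a measure such as H1-H2 concrete: H1 and H2 conventionally denote the amplitudes of the first and second harmonics of the voice source, so H1-H2 is their difference in dB. The following is a minimal sketch under strong assumptions (F0 is already known, the frame is stationary, and no formant correction is applied, so the starred variants are not attempted). It is not VoiceSauce's implementation, and the function names are hypothetical.

```python
import cmath
import math


def harmonic_db(frame, fs, freq):
    """Amplitude in dB of the sinusoidal component of `frame` at `freq` Hz.

    Evaluates a single DFT-like sum at exactly `freq`
    (rectangular window, no interpolation between bins).
    """
    n = len(frame)
    x = sum(s * cmath.exp(-2j * math.pi * freq * i / fs)
            for i, s in enumerate(frame))
    amplitude = 2.0 * abs(x) / n
    return 20.0 * math.log10(max(amplitude, 1e-12))  # guard against log(0)


def h1_minus_h2(frame, fs, f0):
    """Uncorrected H1-H2: level at f0 minus level at 2*f0, in dB."""
    return harmonic_db(frame, fs, f0) - harmonic_db(frame, fs, 2.0 * f0)


# Synthetic frame: 200 Hz fundamental plus a second harmonic at half amplitude.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * i / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * i / fs)
         for i in range(400)]
example_h1h2 = h1_minus_h2(frame, fs, 200.0)
print(round(example_h1h2, 2))  # 6.02
```

Halving the amplitude of the second harmonic corresponds to a difference of about 6 dB, which is what the example prints.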
|
XVocal is the UNIX version of Dr. Shinji Maeda's Vocal Tract
Articulatory Synthesizer, VTCALCS (originally
developed for the PC platform). In 1995,
Edmond Chi Hin Chui of our
laboratory ported the PC version to UNIX. With Dr. Maeda's
permission, XVocal is now freely available for
research purposes only. Please consult the
user manual for detailed instructions on how to use the
program... (details)
|
Speechdemo is a Matlab-based graphical tool for speech analysis by Qifeng Zhu.
It supports simultaneous analysis of signals in two channels. The user
can view the signal in time and frequency using a variety of analysis
tools such as the Discrete Fourier Transform (DFT); Linear Predictive
Coding (LPC); Mel-Frequency Cepstral Coefficients (MFCC); and others...
(details)
|
An extensive database
of 1,728 isolated Consonants and Vowels (CV) is available through this
website. (details)
|
The speech group at Microsoft Research (Redmond, Washington, US) and
IPAM and Electrical Engineering at UCLA (Los Angeles, CA, US) have
recently jointly developed a database of manually labeled
vocal-tract-resonance (or formant) trajectories for research in speech
processing, including analysis, synthesis, and recognition. (details)
|
A narrated videotape showing 3D tongue and vocal tract reconstructions
from MRI data for consonants and vowels as produced by two talkers.
Sample 3D models can be seen at:
http://www.ee.ucla.edu/~spapl/projects/mri.html. This
videotape is an effective teaching aid and was produced by Shrikanth
Narayanan and Abeer Alwan. ... (details)
For a free copy of the
videotape, please email Prof. Alwan at: alwan@icsl.ucla.edu
|
The label files in this package contain the time-stamps of silence
(sil) and short pause (sp) segments in the Aurora-2 test sets. These
time-stamps were obtained by manual visual inspection of the
spectrograms of the clean test files.
|
UCSC Speech
Links;
Alexander Graham Bell's Path to the Telephone;
F0 Estimation Resources
(from the PhD dissertation of Arturo Camacho, SWIPE: A Sawtooth
Waveform Inspired Pitch Estimator for Speech and Music, 2007):
• AC-P:
This algorithm (Boersma, 1993) computes the autocorrelation of the
signal and divides it by the autocorrelation of the window used to
analyze the signal. It uses postprocessing to reduce discontinuities in
the pitch trace. It is available with the Praat System
at <http://www.fon.hum.uva.nl/praat>. The name of the
function is ac.
• AC-S:
This algorithm uses the autocorrelation of the cubed signal. It is
available with the Speech Filing System
at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of
the function is fxac.
• ANAL:
This algorithm (Secrest and Doddington, 1983) uses
autocorrelation to estimate the pitch, and dynamic programming to
remove discontinuities in the pitch trace. It is available with
the Speech Filing System
at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the
function is fxanal.
• CATE: This algorithm uses a quasi-autocorrelation
function of the speech excitation signal to estimate
the pitch. We implemented it based on its original description
(Di Martino, 1999). The dynamic programming component used to remove
discontinuities in the pitch trace was not implemented.
• CC:
This algorithm uses cross-correlation to estimate the pitch and
post-processing to remove discontinuities in the pitch trace. It is
available with the Praat System at <http://www.fon.hum.uva.nl/praat>. The name of the function is cc.
• CEP:
This algorithm (Noll, 1967) uses the cepstrum of the signal and is
available with the Speech Filing System
at <http://www.phon.ucl.ac.uk/resource/sfs>. The name
of the function is fxcep.
• ESRPD:
This algorithm (Bagshaw, 1993; Medan, 1991) uses a normalized
cross-correlation to estimate the pitch, and post-processing to remove
discontinuities in the pitch trace. It is available with
the Festival Speech Synthesis System
at <http://www.cstr.ed.ac.uk/projects/festival>. The name of
the function is pda.
• RAPT:
This algorithm (Talkin, 1995) uses a normalized cross-correlation
to estimate the pitch, and dynamic programming to remove
discontinuities in the pitch trace. It is available with the Speech
Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>.
The name of the function is fxrapt.
• SHS:
This algorithm (Hermes, 1988) uses subharmonic summation. It is
available with the Praat System
at <http://www.fon.hum.uva.nl/praat>. The name of the
function is shs.
• SHR:
This algorithm (Sun, 2000) uses the subharmonic-to-harmonic ratio. It
is available at Matlab
Central <http://www.mathworks.com/matlabcentral>, under
the title “Pitch Determination Algorithm”. The name of the function
is shrp.
• TEMPO:
This algorithm (Kawahara et al., 1999) uses the instantaneous frequency
of the outputs of a filterbank. It is available with the STRAIGHT
System at its author's web page
<http://www.wakayama-u.ac.jp/~kawahara>. The name of the function
is exstraightsource.
• YIN:
This algorithm (de Cheveigné and Kawahara, 2002) uses a modified
version of the average squared difference function. It is
available from its author's web page at
<http://www.ircam.fr/pcm/cheveign/sw/yin.zip>. The name of
the function is yin.
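Several entries in the list above rest on the same core idea: the lag at which a frame best matches a shifted copy of itself approximates the pitch period. The sketch below illustrates the step described for AC-P, dividing the autocorrelation of the windowed signal by the autocorrelation of the window, on a single frame. It is an illustrative approximation, not Praat's code. The function names, the search range, and the 0.9 relative threshold (a crude stand-in for the postprocessing that guards against octave errors) are all assumptions.

```python
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorr(x, max_lag):
    """Raw autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def f0_autocorr(frame, fs, fmin=75.0, fmax=500.0):
    """Estimate F0 (Hz) of one frame; returns 0.0 if no candidate is found."""
    n = len(frame)
    w = hann(n)
    mean = sum(frame) / n
    xw = [(s - mean) * wi for s, wi in zip(frame, w)]  # remove DC, apply window
    lag_min = max(int(fs / fmax), 1)
    lag_max = min(int(fs / fmin), n // 2)
    rx = autocorr(xw, lag_max)
    rw = autocorr(w, lag_max)
    if rx[0] <= 0.0:
        return 0.0  # silent frame
    # Normalized signal autocorrelation divided by normalized window autocorrelation.
    norm = [(rx[k] / rx[0]) / (rw[k] / rw[0]) for k in range(lag_max + 1)]
    peak = max(norm[lag_min:lag_max + 1])
    # Take the shortest lag that is a local maximum close to the global peak
    # (assumed heuristic to avoid picking a multiple of the true period).
    for k in range(lag_min, lag_max):
        if norm[k] > norm[k - 1] and norm[k] >= norm[k + 1] and norm[k] >= 0.9 * peak:
            return fs / k
    return 0.0


# 50 ms of a 200 Hz sine sampled at 8 kHz (period = 40 samples).
fs = 8000
frame = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
estimate = f0_autocorr(frame, fs)
print(estimate)  # 200.0
```

Because the estimate is restricted to integer lags, its resolution is limited by the sampling rate. The sketch also makes no voicing decision and does no tracking across frames.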
|