Variable Frame Rate (VFR) analysis is a feature extraction method for noise-robust automatic speech recognition (ASR).
It builds on speech perception research showing that dynamic spectro-temporal information is important and, hence,
that not all equal-duration speech segments are equally important perceptually. For example, formant transitions at the onset of a vowel
can carry more discriminative information than the steady-state part of the vowel
...
(details)
(download)
|
VoiceSauce is an application, implemented in Matlab,
which provides automated voice measurements
over time from audio recordings. Inputs are standard wave (*.wav) files, and the measures currently computed are F0, formants F1-F4, H1(*), H2(*), H4(*), A1(*), A2(*),
A3(*), H1(*)-H2(*), H2(*)-H4(*), H1(*)-A1(*), H1(*)-A2(*), H1(*)-A3(*), Energy, and Cepstral Peak Prominence ...
(details)
|
XVocal is the UNIX version of Dr.
Shinji Maeda's Vocal Tract Articulatory
Synthesizer, VTCALCS (originally
developed for the PC platform). In 1995,
Edmond Chi Hin Chui of our
laboratory ported the PC version to
UNIX. With Dr. Maeda's permission,
XVocal is now freely available for
research purposes only. Please see the
user manual for detailed instructions
on how to use the program... (details)
| |
| |
Speechdemo is a Matlab-based graphical tool
for speech analysis by
Qifeng Zhu. It supports simultaneous
analysis of signals in two channels. The
user can view the signal in time and
frequency using a variety of analysis tools
such as the Discrete Fourier Transform (DFT);
Linear Predictive Coding (LPC);
Mel-Frequency Cepstral Coefficients (MFCC);
and others... (details)
| |
| |
An extensive database of 1,728 isolated
Consonants and Vowels (CV) is available
through this website. (details)
| |
| |
The speech group at Microsoft Research
(Redmond, Washington, US) and IPAM and
Electrical Engineering at UCLA (Los Angeles,
CA, US) have recently jointly developed a
database of manually labeled
vocal-tract-resonance (or formant)
trajectories for research in speech
processing, including analysis, synthesis,
and recognition.
(details)
| |
| |
This narrated videotape shows 3D tongue and
vocal tract reconstructions from MRI data
for consonants and vowels as produced by two
talkers. Sample 3D models can be seen at:
http://www.ee.ucla.edu/~spapl/projects/mri.html.
The videotape is an effective teaching aid
and was produced by Shrikanth Narayanan and
Abeer Alwan. ... (details)
For a free copy of the videotape, please
email Prof. Alwan at:
alwan@icsl.ucla.edu