22 February 2006
The Tone-Group F0 Selection Model
This page contains the source code and samples of an F0 generation model based on tonal unit selection. The proposed model is described in an article submitted for publication. More details will be posted here on a regular basis.
Below are some re-synthesis samples produced with the different sampling strategies described in the paper. The effect of scalable vs. linear interpolation of null syllables is also demonstrated. To fit on this page, the examples are re-syntheses of the 12 test utterances rather than of the 484-utterance M-PIRO corpus; the objective results described in the paper, however, were extracted from the M-PIRO corpus.
1: The sme (start-mid-end) strategy applied to both haAb and null syllables. This simulates the optimal performance of the start-mid-end sampling used by common linear regression (LR) models in Festival.
2: The 0 Hz threshold strategy over haAb syllables, with linear interpolation between null-syllable boundaries. This shows the perceptual errors that arise when the proposed haAb pattern encoding is followed with no care for the null elements.
3: Same as (2), but with the simplistic scalable interpolation on the null syllables, to correct perceptual performance over non-perceptual elements (i.e. null syllables).
4 and 5: Because the 0 Hz threshold generates a long series of samples over the haAb syllables, the 2 Hz threshold is used instead; it generates approx. 5 samples over each syllable and better fits small-footprint specifications.
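The strategies listed above can be illustrated with a minimal sketch. It is not taken from gxflite_tone_group.c; all function names are hypothetical, and the threshold rule in particular is an assumption (a point is kept when it differs from the last kept point by more than the threshold), since this page does not define it. The scalable interpolation is not sketched because it is described only in the paper.

```python
def sample_start_mid_end(contour):
    """Return the first, middle and last F0 value (Hz) of one syllable's contour."""
    if not contour:
        raise ValueError("contour must not be empty")
    return [contour[0], contour[len(contour) // 2], contour[-1]]


def sample_by_threshold(contour, threshold_hz):
    """Return (frame_index, f0) pairs selected by a difference threshold.

    ASSUMPTION: a frame is kept when its F0 differs from the last kept
    frame by more than threshold_hz. The paper may define the rule
    differently; this only shows why 0 Hz keeps many more points than 2 Hz.
    """
    if not contour:
        raise ValueError("contour must not be empty")
    kept = [(0, contour[0])]
    for i, f0 in enumerate(contour[1:], start=1):
        if abs(f0 - kept[-1][1]) > threshold_hz:
            kept.append((i, f0))
    # Always keep the final frame so the syllable end is represented.
    last = len(contour) - 1
    if kept[-1][0] != last:
        kept.append((last, contour[last]))
    return kept


def interpolate_linear(f0_left, f0_right, n_frames):
    """Fill n_frames frames of a null syllable on a straight line
    between the F0 values at its left and right boundaries."""
    step = (f0_right - f0_left) / (n_frames + 1)
    return [f0_left + step * (k + 1) for k in range(n_frames)]


# Illustrative contour (made-up values, one F0 value per frame, in Hz).
contour = [100.0, 101.0, 103.0, 106.0, 110.0, 109.0, 107.0]
sme_points = sample_start_mid_end(contour)
dense_points = sample_by_threshold(contour, 0.0)
sparse_points = sample_by_threshold(contour, 2.0)
null_fill = interpolate_linear(120.0, 100.0, 3)
```

Under this assumed rule, a 0 Hz threshold keeps every frame whose value changes, while a 2 Hz threshold keeps only the larger movements, which matches the trade-off between sample count and footprint noted in items 4 and 5.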
Natural Voice Carrier
Diphone Voice (mbrola)
A. Comparing to Linear Regression
B. Rendering focus prominence
Download the source code (for FLITE)
gxflite_tone_group.c (revision 0.6)
Contact the author
Last update: 17 January 2006