Gerasimos Xydas
gxydas@di.uoa.gr
Last update:
22 February 2006

The Tone-Group F0 Selection Model

Introduction

This page contains the source code and samples of an F0 generation model based on tonal unit selection. The proposed model is described in an article submitted for publication. More details will be posted here on a regular basis.


Re-synthesis

Below are re-synthesis samples produced with the different sampling strategies described in the paper. They also demonstrate the effect of scalable vs. linear interpolation of null syllables. To fit on this page, the examples are re-syntheses of the 12 test utterances rather than of the 484-utterance M-PIRO corpus; the objective results reported in the paper, however, were extracted from the M-PIRO corpus.

       Original     1       2           3           4           5
haAb:               sme     0Hz THD     0Hz THD     2Hz THD     2Hz THD
null:               sme     linear      scalable    linear      scalable
       original_01  sme_01  0hz_lnr_01  0hz_scl_01  2hz_lnr_01  2hz_scl_01
       original_02  sme_02  0hz_lnr_02  0hz_scl_02  2hz_lnr_02  2hz_scl_02
       original_03  sme_03  0hz_lnr_03  0hz_scl_03  2hz_lnr_03  2hz_scl_03
       original_04  sme_04  0hz_lnr_04  0hz_scl_04  2hz_lnr_04  2hz_scl_04
       original_05  sme_05  0hz_lnr_05  0hz_scl_05  2hz_lnr_05  2hz_scl_05
       original_06  sme_06  0hz_lnr_06  0hz_scl_06  2hz_lnr_06  2hz_scl_06
       original_07  sme_07  0hz_lnr_07  0hz_scl_07  2hz_lnr_07  2hz_scl_07
       original_08  sme_08  0hz_lnr_08  0hz_scl_08  2hz_lnr_08  2hz_scl_08
       original_09  sme_09  0hz_lnr_09  0hz_scl_09  2hz_lnr_09  2hz_scl_09
       original_10  sme_10  0hz_lnr_10  0hz_scl_10  2hz_lnr_10  2hz_scl_10
       original_11  sme_11  0hz_lnr_11  0hz_scl_11  2hz_lnr_11  2hz_scl_11
       original_12  sme_12  0hz_lnr_12  0hz_scl_12  2hz_lnr_12  2hz_scl_12
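
As a rough illustration of the "null: linear" setting in the table above, linear interpolation across a run of null syllables can be sketched as follows. This is a minimal Python sketch, not the code from gxflite_tone_group.c; the function name `interpolate_null_span`, its parameters, and the frame-by-frame representation of the contour are assumptions made for illustration. The scalable variant is not shown, since this page does not define it.

```python
def interpolate_null_span(f0_left, f0_right, n_frames):
    """Fill a null-syllable span with a straight line between its boundaries.

    f0_left  -- F0 (Hz) at the boundary before the span (assumed known)
    f0_right -- F0 (Hz) at the boundary after the span (assumed known)
    n_frames -- number of frames to generate strictly inside the span
    """
    if n_frames <= 0:
        return []
    # Equal steps, so that the value after the last frame would be f0_right.
    step = (f0_right - f0_left) / (n_frames + 1)
    return [f0_left + step * (i + 1) for i in range(n_frames)]


if __name__ == "__main__":
    # Three frames between a 100 Hz and a 120 Hz boundary.
    print(interpolate_null_span(100.0, 120.0, 3))
```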

Details:

1: The sme (start-mid-end) strategy applied to both haAb and null syllables. This simulates the best performance achievable with the start-mid-end sampling used by common LR models in Festival.

2: The 0Hz threshold strategy over haAb syllables, with linear interpolation between the boundaries of null syllables. This exposes the perceptual errors that arise when following the proposed haAb pattern encoding with no care for the null elements.

3: Same as (2), but with the simplistic scalable interpolation applied to the null syllables to correct perceptual performance over non-perceptual elements (i.e. null syllables).

4 and 5: Same as (2) and (3) respectively, but with a 2Hz threshold. Whereas the 0Hz threshold generates a long series of samples over the haAb syllables, the 2Hz threshold generates approx. 5 samples per syllable and better fits small-footprint specifications.
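
The difference between the sme and the threshold strategies can be sketched as follows. This is a minimal Python illustration, not the code from gxflite_tone_group.c. The function names, the representation of a syllable contour as a list of F0 values in Hz, and in particular the way the threshold is applied (a sample is kept when dropping it would make linear interpolation between kept samples deviate by more than the threshold) are assumptions; this page does not spell out the exact rule.

```python
def sample_sme(contour):
    """Start-mid-end sampling: keep the first, middle and last F0 value."""
    if not contour:
        return []
    last = len(contour) - 1
    indices = sorted({0, last // 2, last})  # the set drops duplicates on short contours
    return [(i, contour[i]) for i in indices]


def _max_deviation(contour, a, b):
    """Largest gap (Hz) between the contour and the straight line from a to b."""
    slope = (contour[b] - contour[a]) / (b - a)
    return max(abs(contour[k] - (contour[a] + slope * (k - a)))
               for k in range(a + 1, b))


def sample_threshold(contour, threshold_hz):
    """Greedy threshold sampling (ASSUMED rule, see the paragraph above).

    Extend a straight segment from the last kept sample for as long as every
    skipped sample stays within threshold_hz of it; otherwise keep the
    previous sample and start a new segment there.
    """
    n = len(contour)
    if n <= 2:
        return list(enumerate(contour))
    kept = [0]
    anchor, j = 0, 2
    while j < n:
        if _max_deviation(contour, anchor, j) > threshold_hz:
            anchor = j - 1
            kept.append(anchor)
            j = anchor + 2
        else:
            j += 1
    kept.append(n - 1)
    return [(i, contour[i]) for i in kept]


if __name__ == "__main__":
    toy = [100.0, 101.0, 100.0, 101.0, 100.0]  # made-up contour, not corpus data
    print(sample_sme(toy))
    print(sample_threshold(toy, 0.0))
    print(sample_threshold(toy, 2.0))
```

On this toy contour, the 0 Hz threshold keeps all five samples, while the 2 Hz threshold keeps only the two endpoints, which mirrors the footprint trade-off described above.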

Configurations

Natural Voice Carrier

Diphone Voice (mbrola)


Listening tests

A. Comparing to Linear Regression

#   Natural voice carrier      Diphone voice (mbrola)
    LR           TGS           LR            TGS
1   fratz_lr_01  fratz_tgs_01  mbrgr2_lr_01  mbrgr2_tgs_01
2   fratz_lr_02  fratz_tgs_02  mbrgr2_lr_02  mbrgr2_tgs_02
3   fratz_lr_03  fratz_tgs_03  mbrgr2_lr_03  mbrgr2_tgs_03
4   fratz_lr_04  fratz_tgs_04  mbrgr2_lr_04  mbrgr2_tgs_04
5   fratz_lr_05  fratz_tgs_05  mbrgr2_lr_05  mbrgr2_tgs_05
6   fratz_lr_06  fratz_tgs_06  mbrgr2_lr_06  mbrgr2_tgs_06
7   fratz_lr_07  fratz_tgs_07  mbrgr2_lr_07  mbrgr2_tgs_07
8   fratz_lr_08  fratz_tgs_08  mbrgr2_lr_08  mbrgr2_tgs_08
9   fratz_lr_09  fratz_tgs_09  mbrgr2_lr_09  mbrgr2_tgs_09
10  fratz_lr_10  fratz_tgs_10  mbrgr2_lr_10  mbrgr2_tgs_10
11  fratz_lr_11  fratz_tgs_11  mbrgr2_lr_11  mbrgr2_tgs_11
12  fratz_lr_12  fratz_tgs_12  mbrgr2_lr_12  mbrgr2_tgs_12

B. Rendering focus prominence

#   Natural voice carrier  Diphone voice (mbrola)
1   fratz_foc_01           mbrgr2_foc_01
2   fratz_foc_02           mbrgr2_foc_02
3   fratz_foc_03           mbrgr2_foc_03
4   fratz_foc_04           mbrgr2_foc_04
5   fratz_foc_05           mbrgr2_foc_05
6   fratz_foc_06           mbrgr2_foc_06
7   fratz_foc_07           mbrgr2_foc_07
8   fratz_foc_08           mbrgr2_foc_08
9   fratz_foc_09           mbrgr2_foc_09
10  fratz_foc_10           mbrgr2_foc_10
11  fratz_foc_11           mbrgr2_foc_11
12  fratz_foc_12           mbrgr2_foc_12

Download the source code (for FLITE)

gxflite_tone_group.c (revision 0.6)


Contact the author

Gerasimos Xydas (gxydas@di.uoa.gr)

Last update: 17 January 2006