speech-dispatcher voice configuration problems with festival

Question

I finally got Festival working with the US HTS voices: cmu_us_awb_cg, cmu_us_jmk_cg, cmu_us_slt_cg, cmu_us_bdl_cg, cmu_us_clb_cg, cmu_us_rms_cg.

I manually configured festival.scm to use the bdl voice:

 (set! voice_default 'voice_cmu_us_bdl_cg)

It now works fine both in interactive Festival and when the server is running (festival --server):

nc localhost 1314 <<< "(tts_text \"Hello big world, this is a test.\" nil)(quit)"

I then configured speech-dispatcher. It failed to configure itself properly via spd-conf, so I fixed the configuration file speechd.conf manually. To sum it up:

LogLevel  3
LogDir  "default"
DefaultRate  5
DefaultVolume 100    
DefaultLanguage "en"
DefaultPunctuationMode "all"
AudioOutputMethod "alsa"
AudioALSADevice "default"
AddModule "festival"     "sd_festival"  "festival.conf"
AddModule "dummy"         "sd_dummy"      ""
DefaultModule festival
LanguageDefaultModule "en"  "festival"
Include "clients/*.conf"

Now the ALSA test works fine (it produces sound). However, when I send text to speech-dispatcher:

spd-say "Hello big world, this is a test."

...the festival server goes crazy, as if it were unsuccessfully trying each and every voice it can think of:

SIOD: unknown voice cmu_us_ahw_cg
SIOD: unknown voice cmu_us_ahw_cg
SIOD: unknown voice cmu_us_aup_cg
SIOD: unknown voice cmu_us_aup_cg
SIOD: unknown voice cmu_us_awb_cg
SIOD: unknown voice cmu_us_awb_cg
SIOD: unknown voice cmu_us_axb_cg
SIOD: unknown voice cmu_us_axb_cg
SIOD: unknown voice cmu_us_bdl_cg
SIOD: unknown voice cmu_us_bdl_cg
SIOD ERROR: could not open file /usr/share/festival/dicts/oald/oaldlex.scm
closing a file left open: /usr/share/festival/voices/english/rab_diphone/festvox/rab_diphone.scm
SIOD: unknown voice rab_diphone
SIOD ERROR: could not open file /usr/share/festival/dicts/oald/oaldlex.scm
closing a file left open: /usr/share/festival/voices/english/rab_diphone/festvox/rab_diphone.scm
SIOD: unknown voice rab_diphone
SIOD: unknown voice cmu_us_kal_com_hts
SIOD: unknown voice cmu_us_kal_com_hts
SIOD: unknown voice cstr_us_ked_timit_hts
SIOD: unknown voice cstr_us_ked_timit_hts
SIOD: unknown voice cmu_us_slt_cg
SIOD: unknown voice cmu_us_slt_cg
SIOD: unknown voice cmu_us_rms_cg
SIOD: unknown voice cmu_us_rms_cg
SIOD: unknown voice cmu_us_awb_cg
SIOD: unknown voice cmu_us_awb_cg
SIOD: unknown voice cmu_us_bdl_cg
SIOD: unknown voice cmu_us_bdl_cg
SIOD ERROR: ran out of storage 
closing a file left open: /usr/share/festival/voices/us/cmu_us_clb_cg//rf_models/trees_08/cmu_us_clb_mcep.tree
SIOD: unknown voice cmu_us_clb_cg
SIOD ERROR: ran out of storage 
closing a file left open: /usr/share/festival/voices/us/cmu_us_clb_cg//festival/trees/cmu_us_clb_mcep.tree
SIOD: unknown voice cmu_us_clb_cg
client(10) Mon Mar 16 22:10:26 2020 : accepted from localhost
SIOD: unknown voice cmu_us_ahw_cg
SIOD: unknown voice cmu_us_aup_cg
SIOD: unknown voice cmu_us_awb_cg
SIOD: unknown voice cmu_us_axb_cg
SIOD: unknown voice cmu_us_bdl_cg
SIOD ERROR: could not open file /usr/share/festival/dicts/oald/oaldlex.scm
closing a file left open: /usr/share/festival/voices/english/rab_diphone/festvox/rab_diphone.scm
SIOD: unknown voice rab_diphone
SIOD: unknown voice cmu_us_kal_com_hts
SIOD: unknown voice cstr_us_ked_timit_hts
SIOD: unknown voice cmu_us_slt_cg
SIOD: unknown voice cmu_us_rms_cg
SIOD: unknown voice cmu_us_awb_cg
SIOD: unknown voice cmu_us_bdl_cg
SIOD ERROR: ran out of storage 
closing a file left open: /usr/share/festival/voices/us/cmu_us_clb_cg//rf_models/trees_08/cmu_us_clb_mcep.tree
SIOD: unknown voice cmu_us_clb_cg
SIOD ERROR: ran out of storage 
closing a file left open: /usr/share/festival/voices/us/cmu_us_jmk_cg//festival/trees/cmu_us_jmk_mcep.tree
SIOD: unknown voice cmu_us_jmk_cg
SIOD: unknown voice cmu_us_ahw_cg
SIOD: unknown voice cmu_us_ahw_cg
SIOD: unknown voice cmu_us_aup_cg
SIOD: unknown voice cmu_us_aup_cg
SIOD: unknown voice cmu_us_awb_cg
SIOD: unknown voice cmu_us_awb_cg

So Festival is working, the connection to ALSA is working, and speech-dispatcher is sending something to Festival, but something is broken, possibly wrong voice settings.
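
A quick way to see which voice names the server rejects, and how often, is to count the "unknown voice" lines in its output. The following is only a minimal shell sketch, not part of the original setup: the function name summarize_unknown_voices is made up for illustration, and it assumes the server output has been saved to a file.

```shell
# Count how often each voice name appears in "SIOD: unknown voice" lines.
# Reads Festival server output on stdin and prints "<count> <voice>" per line,
# sorted by voice name. All other lines (SIOD ERROR, client(...)) are ignored.
summarize_unknown_voices() {
    sed -n 's/^SIOD: unknown voice //p' | sort | uniq -c | awk '{print $1, $2}'
}
```

For example, with the output captured in a hypothetical festival.log: summarize_unknown_voices < festival.log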

There is also a configuration file for the festival module, festival.conf, in the /etc/speech-dispatcher/modules/ folder, but it is virtually empty (mostly commented-out text) and does not mention which voices speech-dispatcher sets when calling Festival. That is where I would expect to set this, especially given this comment in speechd.conf:

The DefaultVoiceType controls which voice type should be used by default. Voice types are symbolic names which map to particular voices provided by the synthesizer according to the output module configuration. Please see the synthesizer-specific configuration in etc/speech-dispatcher/modules/ to see which voices are assigned to different symbolic names. The following symbolic names are currently supported: MALE1, MALE2, MALE3, FEMALE1, FEMALE2, FEMALE3, CHILD_MALE, CHILD_FEMALE

# DefaultVoiceType "MALE1"

I also tried increasing the heap size to 50M (as suggested in other discussions), but it doesn't help:

festival --server --heap 50000000

I get the same strange errors. Any suggestions appreciated.

score 2 · Answer 1 · edited Mar 12 '21 at 08:49

To solve this issue, you need to define the voice with (proclaim_voice ...) in its scm file. Follow the steps below:

Go to the festival/voices folder.

Open the voice's scm file for editing, for example:

vim us/cmu_us_clb_arctic_clunits/festvox/cmu_us_clb_arctic_clunits.scm   #<--voice clunits.scm

Go to the bottom of the file and add the content below on a new line before (provide 'cmu_us_clb_arctic_clunits). Update language, gender, and dialect as needed, then save and quit.

(proclaim_voice
 'cmu_us_clb_arctic_clunits
 '((language english)
   (gender female)
   (dialect american)
   (description
    "This voice provides an American English male voice using a
     residual excited LPC diphone synthesis method.  It uses
     the CMU Lexicon pronunciations.  Prosodic phrasing is provided
     by a statistically trained model using part of speech and local
     distribution of breaks.  Intonation is provided by a CART tree
     predicting ToBI accents and an F0 contour generated from a model
     trained from natural speech.  The duration model is also trained
     from data using a CART tree.")))

After restarting the server with /usr/bin/festival --server, the error should be gone.

You can also run spd-say -L to list the voices that speech-dispatcher sees.
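
To find which installed voices still need this treatment, one option is to look for main voice files that never call proclaim_voice. The sketch below assumes the <language>/<voice>/festvox/<voice>.scm layout seen in the paths in this thread; the function name voices_missing_proclaim is hypothetical.

```shell
# Print the main definition file of every voice under a Festival voices
# directory that contains no proclaim_voice call.
# Assumed layout: <voices_dir>/<language>/<voice>/festvox/<voice>.scm
voices_missing_proclaim() {
    local voices_dir="$1" scm voice
    for scm in "$voices_dir"/*/*/festvox/*.scm; do
        [ -f "$scm" ] || continue
        voice="$(basename "$scm" .scm)"
        # Skip helper files; keep only the file named after its voice directory.
        case "$scm" in
            */"$voice"/festvox/"$voice".scm) ;;
            *) continue ;;
        esac
        grep -q 'proclaim_voice' "$scm" || printf '%s\n' "$scm"
    done
}
```

With the directory from the log in the question, this would be run as: voices_missing_proclaim /usr/share/festival/voices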

If you need to change the default Festival voice:

Edit /etc/festival.scm and add

(set! voice_default 'voice_<your preferred voice name>)

Restart the festival server or reboot Ubuntu. The default voice used by spd-say or Firefox will then change to the new one.
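
As a concrete example of a voice name, the question uses cmu_us_bdl_cg, which gives (set! voice_default 'voice_cmu_us_bdl_cg). A small shell sketch for adding such a line only once follows. The helper name set_default_voice is hypothetical, and the function does not remove an existing line that names a different voice.

```shell
# Append a voice_default setting to a Festival init file unless the exact
# same line is already present.
# $1: path to the scm file, $2: voice name without the "voice_" prefix.
set_default_voice() {
    local scm_file="$1" voice="$2"
    local line="(set! voice_default 'voice_${voice})"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$scm_file" 2>/dev/null || printf '%s\n' "$line" >> "$scm_file"
}
```

Writing to /etc/festival.scm needs root privileges, so in a root shell this would be called as: set_default_voice /etc/festival.scm cmu_us_bdl_cg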

Thank you Heng, this looks like a solution. I'll test it (within a week or so) and report back. I was recently working with a different component that did not have a TTS solution for Linux, so we agreed with the developer that I'd write a little "speech-dispatcher" for MaryTTS. I did that, but the developer seems to have stopped working on the project... Anyhow, I'll try to follow your steps to make the original component work with spd and Festival as you suggested. – Oak_3260548, Mar 12 '21 at 11:25
what is ``? I have no idea what to put there. Can you please give examples for English? – chovy, Nov 12 '21 at 04:38

cronulis · Answer 2 · 2020-11-13T21:32:04.917

-1

The problem could be that speech-dispatcher does not accept Festival's default voice and instead tries to use its own settings.

Try uncommenting DefaultVoiceType in speechd.conf and changing it to something like:

DefaultVoiceType "FEMALE1"
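
The question quotes this setting from speechd.conf, so that is presumably the file to edit. Below is a minimal shell sketch for switching the line on. It assumes the file lives at /etc/speech-dispatcher/speechd.conf (a per-user copy may exist elsewhere), and the helper name set_default_voice_type is hypothetical.

```shell
# Set DefaultVoiceType in a speech-dispatcher config file, whether the
# existing line is commented out or not. A backup is kept as <file>.bak.
# $1: path to speechd.conf, $2: symbolic voice type such as FEMALE1.
set_default_voice_type() {
    local conf_file="$1" voice_type="$2"
    # Only lines that begin with the option name (optionally after "#") match,
    # so explanatory comments that merely mention the option stay untouched.
    sed -E -i.bak "s/^[[:space:]]*#?[[:space:]]*DefaultVoiceType[[:space:]].*/DefaultVoiceType \"${voice_type}\"/" "$conf_file"
}
```

In a root shell this could be called as: set_default_voice_type /etc/speech-dispatcher/speechd.conf FEMALE1. speech-dispatcher has to be restarted afterwards so that it re-reads the file.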

I'd also do some testing with different programs, like Firefox's reader mode (ALT+CTRL+R), to see whether any of the listed voices work.

edited Nov 13 '20 at 21:32

answered Nov 13 '20 at 14:36

Thank you for your suggestion, I'll have a look at it once again and report back. – Oak_3260548 Nov 13 '20 at 15:30
what file do you add this to? – chovy Nov 12 '21 at 04:37
