|
Traditional TTS systems are
designed to read text in a single language.
Multilingual texts can be read correctly
by switching voices at every language change.
However, this approach does not provide the
best results when dealing with truly mixed-language
text, where language changes are frequent and
embedded in sentences and phrases. This is the
case with Web content, e-mail and information services,
where foreign names and phrases (e.g. film titles)
occur frequently.
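The voice-switching strategy described above can be pictured with a
minimal Python sketch. It assumes the text has already been split into
tokens tagged with a language code; the `tokens` list and the
`language_runs` helper are hypothetical and not part of any Loquendo API.
The sketch only shows how often a system would have to change voice
within a single sentence.

```python
from itertools import groupby

# Hypothetical input: tokens already tagged with a language code.
# Language identification itself is outside the scope of this sketch.
tokens = [
    ("Tonight", "en"), ("we", "en"), ("watch", "en"),
    ("La", "it"), ("dolce", "it"), ("vita", "it"),
    ("on", "en"), ("TV", "en"),
]

def language_runs(tagged_tokens):
    """Group consecutive tokens sharing a language into (lang, text) runs."""
    return [
        (lang, " ".join(word for word, _ in group))
        for lang, group in groupby(tagged_tokens, key=lambda t: t[1])
    ]

runs = language_runs(tokens)
# One voice change is needed at every boundary between two runs.
voice_switches = max(len(runs) - 1, 0)
```

In this example, a single sentence with one embedded film title already
requires two voice changes.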
The optimal solution is to have a single TTS
voice capable of correctly reading an entire
mixed-language text.
This can be achieved through
two different approaches:
- Producing multilingual vocal databases recorded
by bilingual or multilingual speakers capable
of reading several languages with mother-tongue
quality. For example, Loquendo's
Jorge (Castilian) and Jordi (Catalan) voices in the
Interactive TTS Demo section.
- Applying the phonetic transcriber
to the foreign-language text and then mapping
the transcribed phonemes onto those of
the voice's native language in order to access
its acoustic units. In order to do this, the
voice's acoustic database with the phonetic
sounds belonging to its foreign language
While the first approach makes
it possible to obtain "perfect pronunciation",
it is restricted to a small number of
languages for a single voice, whereas the second
provides an approximate pronunciation
that is applicable to any foreign language.
This doesn't mean that the first method is better
than the second; on the contrary, in many real-life
situations, the approximate approach is the
most realistic. For instance, it is easier to
understand an Italian film title pronounced
by an English voice capable of correctly pronouncing
Italian than by a mother-tongue Italian
voice.
Loquendo uses both approaches
to deal with mixed-language text.
Pronouncing foreign words
using the same voice with the Foreign Language
approach: phonetic mapping enables
a single synthetic voice to read text in foreign
languages while keeping its mother-tongue pronunciation,
just as humans do. In fact, a speaker who has
to pronounce foreign words in a text written
predominantly in his or her own language will
generally be inclined to pronounce these words
in a manner that may differ, even significantly,
from the correct pronunciation of the same
words when included in a complete text in the
corresponding foreign language. The approximation
in this kind of pronunciation is mainly
due to the speaker's choice to maintain his or her
native-tongue phonological system. This choice
is due to co-articulation, economy of effort
and also psychosocial factors, as adopting
the correct pronunciation may be regarded as
an undue sophistication and, as such, rejected
in common usage.
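The phonetic mapping idea can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not Loquendo's
implementation: the `IT_TO_EN` table, the `map_to_native` helper and the
simplified transcription of "signora" are all hypothetical, and the
phoneme choices are rough approximations picked for the example.

```python
# Hypothetical mapping from Italian phonemes to the closest phonemes of an
# English voice. Symbols and choices are illustrative only.
IT_TO_EN = {
    "r": ["ɹ"],         # trilled r -> English approximant
    "ʎ": ["l", "j"],    # as in "gli" -> l + y glide
    "ɲ": ["n", "j"],    # as in "gn"  -> n + y glide
    "ts": ["t", "s"],   # affricate   -> stop + fricative
}

def map_to_native(foreign_phonemes, mapping):
    """Replace each foreign phoneme with native ones.

    Phonemes missing from the mapping pass through unchanged, which assumes
    they already exist in the native inventory; a real system would need a
    complete table for every language pair.
    """
    native = []
    for phoneme in foreign_phonemes:
        native.extend(mapping.get(phoneme, [phoneme]))
    return native

# Simplified, hypothetical transcription of the Italian word "signora".
signora_it = ["s", "i", "ɲ", "o", "r", "a"]
signora_en = map_to_native(signora_it, IT_TO_EN)
```

The mapped sequence uses only units assumed to be available to the English
voice, so the word is read with an approximate, native-accented
pronunciation rather than a switch to a different voice.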
|