Ever since the first product launches by Honda and Pioneer in the early 90s, automotive navigation systems have come a long way and are now a must-have accessory of undoubted utility. Whether fully integrated into the car's dashboard or sold as aftermarket devices, driver assistance and navigation systems have seen steadily growing sales: in 2005, Canalys estimated the Personal Navigation Device (PND) market at about 6 million units in Europe alone.
The appeal of these systems to the general public has resulted in the launch of a large number of new products for every budget, especially in the aftermarket portable device segment, ranging from all-in-one PNDs to PDAs and smartphones with a GPS receiver or an external Bluetooth GPS device.
A common function of all these systems is the ability to give spoken instructions to the driver, whose primary task is obviously to watch the road and drive safely without putting himself or others at risk. This speech interface has traditionally consisted solely of a defined set of standard messages, such as "in 200 meters turn right" or "take the second exit". These messages had to be prepared, translated into different languages, recorded in studios and then inserted as audio files in the navigation application. This task must be carefully planned, as recordings are expensive and speakers are not always available, which makes it difficult to add new prompts or a new language to existing applications.
The solution to this problem is to use a text-to-speech (TTS) system, which can generate spoken output from any written text, instead of voice recordings. Navigation software providers had rejected this idea in the past, as the quality and naturalness of the TTS systems then on the market did not meet their needs. Thus, despite the limited possibilities and high costs of recorded voice prompts, navigation software providers continued to use this method, fearing that end users would perceive the difference in quality.
Today, research in speech technology has produced high-quality, lifelike synthetic voices, and these problems have been overcome. In addition, Loquendo has created a specific solution that integrates both pre-recorded voice prompts and its unit selection concatenative TTS engine, in order to deliver a dedicated, high-quality product to the automotive navigation market.
In fact, Loquendo has designed a set of unique features that make its TTS-based automotive solution more reliable, more effective, and easier to integrate and use in navigation systems. This has enabled Loquendo Embedded TTS to become a reference solution on the market today, and customers can appreciate the difference in the millions of OEM and aftermarket solutions already installed worldwide.
TTS flexibility vs. pre-recorded quality: why choose when you can have both?
The advantages
offered by the integration of a text-to-speech engine in such systems
are manifold, both in terms of flexibility and differentiation of
the product, as well as in economic terms.
First of all, using a TTS system allows for the expansion of
the number and type of vocalized prompts, including dynamic
and variable content, which could not be pre-recorded,
such as:
- reading of signposts and addresses (e.g. "Please take the next exit towards Hamburg." or "In 200 meters, turn left onto Madison Avenue.")
- reading of TMC or other incoming traffic-related messages
- personalization of messages with different prosodic styles based on the user (see Expressivity and personalization)
- reading of text messages, e-mails, e-books or any other text stored in the navigation device or in any Bluetooth-connected device (see Differentiation is key)
Regarding the
economic aspect, the use of a TTS system can drastically reduce
or completely eliminate the costs incurred for:
- recording the vocal prompts read by professional speakers
- porting the application to new languages
- software maintenance and upgrades
Thus, for the minimal expense of a license fee or per-device royalty, navigation software providers can drastically reduce their NRE costs and achieve a high ROI.
Advantages of a TTS system tailored to the sector's needs
As mentioned above, in response to navigation software providers' concerns that using TTS alone for general navigation prompts would lead to a loss of quality, Loquendo has created a specific additional database containing several hundred phrases commonly used by navigation systems, which are therefore rendered with pre-recorded quality.
This is possible thanks to the unit selection technique used by Loquendo TTS. This state-of-the-art technique selects and concatenates speech fragments taken from a large acoustic database to match the text to be spoken. It represents a leap forward in the naturalness of synthetic speech, especially compared with other techniques such as diphone concatenation or parametric TTS. Loquendo was one of the first companies to adopt unit selection and to exploit it in the embedded field.
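The general principle of unit selection can be illustrated with a toy search. This sketch is not Loquendo's implementation: it assumes a hypothetical database in which each unit is described only by a phonetic label, mean pitch, duration and boundary pitch values, with made-up target and join costs. A Viterbi-style dynamic programming pass then picks the unit sequence with the lowest total cost.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    """A candidate speech fragment (hypothetical feature set)."""
    unit_id: str
    label: str          # phonetic label the fragment realizes
    pitch: float        # mean F0 in Hz
    duration: float     # duration in ms
    start_pitch: float  # F0 at the left boundary
    end_pitch: float    # F0 at the right boundary


@dataclass(frozen=True)
class Target:
    """What the front end wants at one position of the utterance."""
    label: str
    pitch: float
    duration: float


def target_cost(t: Target, u: Unit) -> float:
    # Distance between desired and candidate prosody (arbitrary weights).
    return abs(u.pitch - t.pitch) / 10.0 + abs(u.duration - t.duration) / 20.0


def join_cost(a: Unit, b: Unit) -> float:
    # Penalize pitch jumps at the concatenation point.
    return abs(a.end_pitch - b.start_pitch) / 5.0


def select_units(targets: List[Target], database: List[Unit]) -> List[Unit]:
    """Return the lowest-cost unit sequence via dynamic programming (Viterbi)."""
    candidates = [[u for u in database if u.label == t.label] for t in targets]
    if not targets or any(not c for c in candidates):
        raise ValueError("every target needs at least one candidate unit")
    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Cheapest way to reach u from any candidate at the previous position.
            scores = [cost[i - 1][j] + join_cost(prev, u)
                      for j, prev in enumerate(candidates[i - 1])]
            best_j = min(range(len(scores)), key=scores.__getitem__)
            row_cost.append(scores[best_j] + target_cost(targets[i], u))
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]


db = [
    Unit("a1", "a", 100, 50, 100, 100),
    Unit("a2", "a", 120, 50, 120, 140),
    Unit("b1", "b", 120, 50, 140, 120),
    Unit("b2", "b", 100, 50, 100, 100),
]
best = select_units([Target("a", 120, 50), Target("b", 110, 50)], db)
print([u.unit_id for u in best])  # ['a2', 'b1']
```

In this example "b1" is not the closest match to the second target on its own (it ties with "b2"), but it joins "a2" with no pitch jump, so the search prefers the smooth pair. Balancing these two costs is what lets unit selection favor long, natural-sounding stretches of recorded speech.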
All these additional phrases have been recorded with both a suspensive and a conclusive intonation, so that they can be inserted at different points of a spoken navigation instruction.
Some examples are:
| # | Intonation | English |
|---|---|---|
| 1 | Suspensive | in 50 meters, |
| 2 | Suspensive | move into the slip road, |
| ... | Suspensive | at the roundabout, |
| ... | Conclusive | take the third exit. |
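Recording each phrase twice lets the navigation software choose, for each position in an instruction, which variant to play. The following is a minimal sketch of that choice, assuming a hypothetical prompt table keyed by phrase and intonation, and a hypothetical fallback that hands anything not pre-recorded (such as a place name) to the TTS engine, with punctuation used to cue the same intonation.

```python
from typing import List, Tuple

# Hypothetical prompt database: (phrase, intonation) -> recorded audio file name.
RECORDED = ["in 50 meters", "move into the slip road", "at the roundabout", "take the third exit"]
PROMPT_DB = {
    (phrase, intonation): f"{phrase.replace(' ', '_')}_{intonation}.wav"
    for phrase in RECORDED
    for intonation in ("suspensive", "conclusive")
}


def plan_instruction(phrases: List[str]) -> List[Tuple[str, str]]:
    """Map each phrase, in order, to a recorded prompt or to text for the TTS engine."""
    segments = []
    for i, phrase in enumerate(phrases):
        # Only the final phrase of the instruction gets the conclusive variant.
        intonation = "conclusive" if i == len(phrases) - 1 else "suspensive"
        audio = PROMPT_DB.get((phrase, intonation))
        if audio is not None:
            segments.append(("prompt", audio))
        else:
            # Not pre-recorded: synthesize it, using punctuation to cue the intonation.
            segments.append(("tts", phrase + ("." if intonation == "conclusive" else ",")))
    return segments


for kind, value in plan_instruction(
        ["in 50 meters", "at the roundabout", "take the third exit", "towards Hamburg"]):
    print(kind, value)
```

Here the first three phrases are played from their suspensive recordings, while "towards Hamburg", which is not in the table, is sent to the TTS engine as "towards Hamburg." so that it closes the sentence.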
TeleAtlas™ and NAVTEQ™ speech-enabled map compliance
All automotive navigation systems contain a road database, since their purpose is to guide the user along a route to a destination. This road database is a vector map of an area of interest, which can range from a local region to an entire continent or the whole world, depending on the navigation system supplier.
Street names and house numbers, as well as points of interest (POIs) and other specific buildings or environmental features, are encoded on these maps as geographical coordinates, so that users can find their desired destination by street address or POI.
The two leading map vendors worldwide are TeleAtlas and NAVTEQ, which are currently working on developing a standard map format. Both companies have extensively mapped the globe and have catalogued the names of millions of locations, streets and points of interest, including phonetic transcriptions based on the SAMPA phonetic alphabet.
To fully exploit the flexibility of its TTS system, and thus read addresses and POIs aloud correctly, Loquendo has collaborated with both TeleAtlas and NAVTEQ on the correct pronunciation of addresses using their respective phonetic databases. Loquendo Automotive Solution TTS thus supports both vendors' SAMPA phonetic alphabets, improving the quality of the speech rendering when reading addresses, signposts and POIs in any language, and also leveraging Loquendo's mixed language capability.
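One way a navigation application could pass a vendor transcription to an SSML-capable synthesizer is the SSML `phoneme` element. The sketch below only builds the markup string and does not reflect Loquendo's own API. The alphabet value `x-sampa` is an assumption (SSML 1.0 reserves `ipa` and leaves other alphabets to vendor-defined `x-` names), and the transcription in the usage line is illustrative, not taken from TeleAtlas or NAVTEQ data.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text: str, transcription: str, alphabet: str = "x-sampa") -> str:
    """Wrap text in an SSML <phoneme> element; 'x-sampa' is an assumed alphabet name."""
    return (f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(transcription)}>"
            f"{escape(text)}</phoneme>")


def instruction_ssml(prefix: str, name: str, transcription: str, lang: str) -> str:
    """Build an SSML document: a plain-text prefix followed by a phonetically specified name."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"{escape(prefix)} {phoneme(name, transcription)}."
        "</speak>"
    )


# In practice the transcription would come from the map's phonetic field; this one
# is illustrative only. The leading '"' is a SAMPA stress mark, which quoteattr
# handles by switching to single-quoted attribute syntax.
print(instruction_ssml("Please take the next exit towards", "Hamburg", '"hambUrk', "de-DE"))
```

Escaping both the text and the attribute values matters here because street names and SAMPA strings can contain characters such as `&` or `"` that would otherwise produce malformed SSML.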
Mixed language capability
Historically, as cities expanded and new streets were built, they were often named after foreign historical figures or places, e.g. Charles de Gaulle Avenue or Arthur Schopenhauer Street. Such street names are sometimes difficult for non-native speakers to pronounce, so imagine how hard they are for a machine! This is not a problem for Loquendo TTS, since our research in the field has led us to develop a feature called mixed language capability.
Thanks to a phonetic mapping algorithm, this feature enables each Loquendo voice to speak any other language available in our portfolio. This means, for example, that our Italian voice Luca can speak English with an Italian accent, or that a French voice can speak German with a French accent.
This unique, patented feature is extremely useful for reading navigation commands and addresses, not only in the cases cited above but also when you travel to another country and still wish to receive driving instructions in your mother tongue.
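Loquendo's patented phonetic mapping algorithm is not described here, so the following is only a generic illustration of the underlying idea: phonemes of the foreign language that the voice already has are kept, and the others are replaced by the nearest phonemes in the voice's own inventory, which is what produces the "accent". The Italian inventory and substitution table are simplified, hypothetical examples using SAMPA symbols.

```python
from typing import Dict, Iterable, List, Set

# Simplified, hypothetical Italian phoneme inventory (SAMPA symbols).
ITALIAN: Set[str] = {
    "a", "e", "E", "i", "o", "O", "u", "j", "w",
    "p", "b", "t", "d", "k", "g", "f", "v", "s", "z",
    "m", "n", "l", "r", "S", "tS", "dZ",
}

# Hypothetical substitutions for English phonemes that Italian lacks.
# An empty list drops the phoneme (Italian has no /h/).
EN_TO_IT: Dict[str, List[str]] = {
    "T": ["t"], "D": ["d"], "{": ["E"], "@": ["e"], "I": ["i"],
    "U": ["u"], "V": ["a"], "Z": ["dZ"], "h": [], "eI": ["e", "i"], "@U": ["o"],
}


def map_to_voice(phonemes: Iterable[str], inventory: Set[str],
                 table: Dict[str, List[str]]) -> List[str]:
    """Rewrite a foreign phoneme string using only phonemes the voice can produce."""
    out: List[str] = []
    for p in phonemes:
        if p in inventory:
            out.append(p)          # shared phoneme: keep as is
        elif p in table:
            out.extend(table[p])   # nearest native substitute(s)
        else:
            raise KeyError(f"no mapping for phoneme {p!r}")
    return out


# English "Madison" /m { d I s @ n/ spoken by an Italian voice:
print(map_to_voice(["m", "{", "d", "I", "s", "@", "n"], ITALIAN, EN_TO_IT))
# ['m', 'E', 'd', 'i', 's', 'e', 'n']
```

A real system would also have to handle stress, syllable structure and prosody, which this phoneme-by-phoneme substitution ignores.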
Expressivity and personalization: giving character to your user interface
Loquendo is the only speech technology provider to have introduced expressive speech to the market, providing expressive cues that enable users to personalize the style of the TTS rendering. This differentiation is achieved simply by modifying the text, without the need for new recordings.
Feel the difference:
- NORMAL: "For the motorway, take the second exit"
- FORMAL: "(throat clearing) Sir, for the motorway, please take the second exit"
- INFORMAL: "Hey John, for the motorway, take the second exit, ok?"
For entertainment purposes or to convey certain types of messages, a humorous style can also be adopted:
- HUMOUR: "Uh-oh, you've gone the wrong way, hmm, do a U-turn as soon as possible" or "Aagh! Heavy traffic on the motorway"
This feature
is also important for enriching and enlivening the user experience
when listening to longer texts, a hot topic for differentiation
purposes.

Differentiation is key
As competition in the automotive navigation market intensifies and the number of products and solutions multiplies, PND and navigation software providers are looking for new ways to differentiate their products in order to retain their competitive advantage. More and more players in this market are realizing that "content" and interoperability with other devices (e.g. mobile phones, iPods) are key to this differentiation.
However, connecting your navigation device to your phone, or adding more content and richer POI descriptions, would be impractical without a text-to-speech system, since reading such content on screen would pose a safety risk to the user, whose primary task is driving. For this reason, embedding a text-to-speech system in the navigation software becomes essential. Motorists can then have their text messages or e-mails read out loud, or choose to listen to tourist information about the location or POIs they are heading to.
Loquendo's extensive language coverage, continually expanding
With Loquendo TTS you can choose from 20 different languages and over 40 male and female voice personas, according to your needs and preferences.
| Language | Female voices | Male voices |
|---|---|---|
| English US | Susan | Dave, Kenneth |
| English UK | Kate, Elizabeth | Simon |
| Castilian Spanish | Carmen | Jorge, Juan |
| Catalan | Montserrat | Jordi |
| French | Juliette, Sophie | Bernard |
| German | Katrin, Ulrike | Stefan |
| Italian | Giulia, Paola, Silvana, Valentina | Matteo, Luca, Marcello, Roberto |
| Greek | Afroditi, Artemis | |
| Portuguese | Amalia | Eusebio |
| Swedish | Annika | |
| Dutch | Saskia | Willem |
| Brazilian Portuguese | Gabriela | |
| Mandarin Chinese | Linlin | |
| Mexican Spanish | Esperanza | |
| Chilean Spanish | Francisca | |
| Argentinean Spanish | | Diego |
| American Spanish | | Carlos |
| Polish | Zosia | Andrzej |
| Canadian French | (planned, 4Q06) | |
| Turkish | (planned, 4Q06) | |
| Czech | (planned, 1Q07) | |
Loquendo Embedded TTS technical features and advantages
- Footprint flexibility: you choose your size!
Whether you choose an 8, 16, 22 or 44 kHz sampling rate, Loquendo offers different codec bit rates and differently sized databases, giving navigation software providers maximum flexibility in choosing the right quality/space trade-off for their system's constraints.
- Multiplatform solution
Loquendo Embedded TTS is available with the same voices and the same core engine on a wide range of platforms: Wind River VxWorks, Windows Automotive, Windows CE .NET 4.2 and 5, Windows Mobile (Pocket PC and Smartphone), Linux, Windows XP Embedded, Windows XP Tablet PC Edition and Symbian OS 7 Series 60.
All these versions use the same APIs (Loquendo API, JSAPI and Microsoft SAPI) and comply with the SSML, IPA and SAMPA standards.
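The quality/space trade-off mentioned under footprint flexibility can be estimated with simple arithmetic: uncompressed 16-bit mono PCM needs the sample rate times 16 bits per second, while a codec reduces this to its own bit rate. The sketch below computes storage per hour of speech. It assumes that "22" and "44 kHz" mean 22,050 and 44,100 Hz, and the 32 kbit/s codec rate is a hypothetical value, since the actual Loquendo codec bit rates and database durations are not given here.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def storage_bytes(bitrate_bps: float, seconds: float) -> float:
    """Bytes needed to store `seconds` of audio at `bitrate_bps`."""
    return bitrate_bps * seconds / 8


HOUR = 3600
for rate in (8000, 16000, 22050, 44100):
    bps = pcm_bitrate(rate)
    print(f"{rate / 1000:g} kHz PCM: {bps / 1000:g} kbit/s, "
          f"{storage_bytes(bps, HOUR) / 1e6:g} MB per hour")

# A hypothetical codec at 32 kbit/s, for comparison:
print(f"32 kbit/s codec: {storage_bytes(32_000, HOUR) / 1e6:g} MB per hour")
```

For example, one hour of 16 kHz, 16-bit PCM takes 256 kbit/s × 3,600 s / 8 = 115.2 MB, whereas the same hour at 32 kbit/s takes 14.4 MB, which is why the choice of sampling rate and codec weighs so heavily on an embedded device.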
Conclusions
We have seen that in the fast-growing navigation and automotive telematics market, differentiation and content are becoming increasingly important for retaining market share. It is also evident that, as cities become more congested and drivers spend more time in their cars each day, providing information and other content during this drive time is of interest to many businesses. Text-to-speech can be the ideal, cost-effective way to deliver this new and diversified content to drivers, enabling navigation solution providers to increase their return on investment and differentiate their products simply by integrating TTS into their devices.
Some early movers have already taken this step, and other suppliers are likely to follow shortly. A new generation of TTS-enabled devices will have a strong impact on the market by providing a new, lifelike experience, and TomTom's GO 910 will be a reference point for users.
Acronyms
API: Application Programming Interface
IPA: International Phonetic Alphabet
JSAPI: Java Speech API
NRE: Non-Recurring Engineering
OEM: Original Equipment Manufacturer
PDA: Personal Digital Assistant
PND: Personal Navigation Device
POI: Point Of Interest
SAMPA: Speech Assessment Methods Phonetic Alphabet
SAPI: Speech API
TMC: Traffic Message Channel
TTS: Text-To-Speech