
Understanding the importance of speech in in-car navigation solutions:
LOQUENDO AUTOMOTIVE SOLUTION

Authors: Luisa Cordano, Silvio Nasi


Ever since the first product launches by Honda and Pioneer in the early 90s, automotive navigation systems have come a long way and are now a must-have accessory of undoubted utility. Whether fully integrated in the car's dashboard or sold as aftermarket devices, driver assistance and navigation systems have seen constantly growing sales: Canalys estimated the 2005 Personal Navigation Device (PND) market at about 6 million units in Europe alone.
The appeal that these systems have had for the general public has resulted in the launch of a large number of new products for every budget, especially in the aftermarket portable device segment, ranging from "all-in-one" PNDs to PDAs or Smartphones with a built-in GPS receiver or an external Bluetooth-based GPS device.

A common function that all these systems have is the ability to give spoken instructions to the driver, whose primary task is obviously to look at the road and drive safely without putting himself or others at risk. This speech interface with the navigation system has previously consisted solely of a defined set of standard messages, such as "in 200 meters turn right" or "take the second exit". These messages had to be prepared, translated into different languages, recorded in studios and then inserted as audio files in the navigation application. This process has to be carefully planned, as recordings are expensive and speakers are not always immediately available, which makes it difficult to add new prompts or a new language to an existing application.

The solution to this problem is to use a text-to-speech (TTS) system, which generates spoken output from any written text, instead of voice recordings. Navigation software providers had rejected this idea in the past, as the quality and naturalness of the TTS systems available at the time did not meet the needs of the market. Thus, despite the limited possibilities that recorded voice prompts offered, and their high costs, navigation software providers continued to use this method, fearing that the difference in quality would be perceived by end users.

Today, as research in the speech technology sector has resulted in high quality, lifelike synthetic voices, these problems have been overcome. In addition, Loquendo has decided to create a specific solution by integrating both pre-recorded voice prompts and its unit selection concatenative TTS engine, in order to deliver an ad-hoc, high quality product to the automotive navigation market.

In fact, Loquendo has designed a set of unique characteristics that make its automotive TTS-based solution more reliable, effective, and easy to integrate and use in navigation systems. This has enabled Loquendo Embedded TTS to become a reference solution on the market today, and customers can appreciate the difference in the millions of OEM and aftermarket solutions already installed worldwide.

TTS flexibility vs. pre-recorded quality. Why choose when you can have both?
The advantages offered by the integration of a text-to-speech engine in such systems are manifold, both in terms of flexibility and differentiation of the product, as well as in economic terms.
First of all, using a TTS system allows for the expansion of the number and type of vocalized prompts, including dynamic and variable content, which could not be pre-recorded, such as:

  • reading of signposts and addresses (e.g. "Please take the next exit towards Hamburg." or "In 200 meters, turn left onto Madison Avenue.")
  • reading of TMC or other incoming traffic related messages
  • personalization of messages with different prosodic styles based on the user (see expressivity and personalization)
  • reading of text messages, e-mails, e-books or any other text stored in the navigation device or in any Bluetooth connected device (see differentiation is key)

Regarding the economic aspect, the use of a TTS system can drastically reduce or completely eliminate the costs incurred for:

  • recording the vocal prompts read by professional speakers
  • application porting to new languages
  • software maintenance and upgrades

Thus for the minimal expense of a license fee or royalty per device, navigation software providers can eliminate all NRE costs and are rewarded with a high ROI.

Advantages of a TTS system tailored to the sector's needs
As mentioned above, in response to navigation software providers' fears that the use of TTS alone, for general navigation prompts, will lead to a loss of quality, Loquendo has created a specific, additional database with several hundred phrases commonly used by navigation systems, that will thus be rendered at pre-recorded quality.
This is possible thanks to the unit selection technique used by Loquendo TTS. This state-of-the-art TTS technique is based on the selection and concatenation of speech fragments taken from a large acoustic database to match the text to be spoken. It represents a leap forward in the naturalness of synthetic speech output, especially when compared with other techniques such as diphone concatenation or parametric TTS. Loquendo was one of the first companies to adopt unit selection and also to exploit it in the embedded field.

All these additional phrases have been recorded with both a suspensive and a conclusive intonation, allowing them to be inserted at different points in the navigator's vocal instruction.


Some examples are:

#     Pronunciation   English
1     Suspensive      in 50 meters,
2     Suspensive      move into the slip road,
...   Suspensive      at the roundabout,
...   Conclusive      take the third exit.
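
The way suspensive and conclusive fragments might be combined with synthesized content can be sketched as follows. This is a minimal illustration in Python, not Loquendo's API: the PRERECORDED inventory, the Segment type and the plan_prompt and render_text functions are hypothetical names, and the inventory holds only the example phrases listed above. The sketch assumes that every part of an instruction except the last takes the suspensive intonation and that any part not found in the inventory falls back to plain TTS.

```python
from dataclasses import dataclass

# Hypothetical inventory of pre-recorded phrases, keyed by (text, intonation).
PRERECORDED = {
    ("in 50 meters", "suspensive"),
    ("move into the slip road", "suspensive"),
    ("at the roundabout", "suspensive"),
    ("take the third exit", "conclusive"),
}


@dataclass(frozen=True)
class Segment:
    text: str
    intonation: str  # "suspensive" or "conclusive"
    source: str      # "recorded" or "tts"


def plan_prompt(parts):
    """Label each part of an instruction as pre-recorded or synthesized.

    Every part except the last gets the suspensive intonation; the last
    part closes the sentence with the conclusive intonation.
    """
    segments = []
    for i, part in enumerate(parts):
        intonation = "conclusive" if i == len(parts) - 1 else "suspensive"
        recorded = (part.lower(), intonation) in PRERECORDED
        segments.append(Segment(part, intonation, "recorded" if recorded else "tts"))
    return segments


def render_text(segments):
    """Join the segments into the sentence that would be spoken."""
    if not segments:
        return ""
    body = ", ".join(s.text for s in segments)
    return body[0].upper() + body[1:] + "."


plan = plan_prompt(["in 50 meters", "at the roundabout", "take the exit towards Hamburg"])
for segment in plan:
    print(segment)
print(render_text(plan))
```

In this sketch the first two parts are served from the recorded inventory, while the final part, which contains a place name that could not have been recorded in advance, is marked for synthesis.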

TeleAtlas™ and NAVTEQ™ speech-enabled map compliance
All automotive navigation systems contain a road database, since their purpose is to guide the user along a route to a destination. This road database is a vector map of some area of interest, which can vary from a local region to an entire continent or the whole world, according to the navigation system supplier.

Street names and house numbers, as well as points of interest (POIs) or other specific buildings and environmental features, are encoded on these maps as geographical coordinates, so that users can find their desired destination by street address or POI location.

The two leading map vendors worldwide are TeleAtlas and NAVTEQ, which are currently working on developing a standard map format.
Both companies have extensively worked on mapping the globe and have catalogued the names of millions of locations, streets and points of interest, using phonetic transcriptions based on the SAMPA phonetic alphabet.

In order to fully exploit the flexibility of its TTS system, and thus use it to read addresses and POIs aloud correctly, Loquendo has collaborated with both TeleAtlas and NAVTEQ on the correct pronunciation of addresses using their respective phonetic databases. Loquendo Automotive Solution TTS thus supports both vendors' SAMPA phonetic alphabets, increasing the quality of the speech rendering when reading addresses, signposts and POIs in any language, while also leveraging Loquendo's mixed language capability.
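
As an illustration of how a SAMPA transcription from a map database could be handed to an SSML-compliant TTS engine, the sketch below builds a prompt with the standard SSML phoneme element. Several points are assumptions rather than facts from this paper: the address_prompt function is a hypothetical helper, the alphabet identifier "x-sampa" is a placeholder (the identifier an engine accepts for SAMPA is vendor-specific), and the transcription of "Hamburg" is an illustrative string, not an entry taken from the TeleAtlas or NAVTEQ data.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def address_prompt(lead_in, name, transcription, lang="en-US", alphabet="x-sampa"):
    """Build an SSML document that pronounces `name` from a phonetic transcription."""
    # quoteattr chooses a safe quote character for each attribute value.
    # This matters here because SAMPA marks primary stress with a double quote.
    return (
        "<speak version=\"1.0\" xmlns={ns} xml:lang={lang}>"
        "{lead} <phoneme alphabet={alpha} ph={ph}>{name}</phoneme>."
        "</speak>"
    ).format(
        ns=quoteattr(SSML_NS),
        lang=quoteattr(lang),
        lead=escape(lead_in),
        alpha=quoteattr(alphabet),
        ph=quoteattr(transcription),
        name=escape(name),
    )


# Illustrative transcription only; a real one would come from the map vendor's data.
doc = address_prompt("Please take the next exit towards", "Hamburg", '"hambURk')
print(doc)
```

The written form of the name stays inside the element as fallback text, so an engine that ignores the transcription can still read the prompt.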

Mixed language capability™
Historically, as cities expanded and new streets were constructed, the new street name was often taken from a foreign historical figure or location, e.g. Charles De Gaulle Avenue or Arthur Schopenhauer Street, and so on. These street names are sometimes difficult to pronounce for non-native speakers, so just imagine what a machine could do! This is not the case with Loquendo TTS, since research in the field has led us to develop a feature called mixed language capability.

Thanks to a phonetic mapping algorithm, this feature enables each of Loquendo's voices to speak any other language available in our portfolio. This means that, for example, our Italian voice Luca can speak English with an Italian accent, or a French voice can speak German with a French accent.
This unique and patented feature is extremely useful for the reading of navigation commands and addresses, not only for the examples cited above but also for when you travel to another country but still wish to receive the driving instructions in your mother tongue.
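
This paper does not describe the patented mapping algorithm itself. Purely to illustrate the general idea of phonetic mapping, the toy sketch below replaces sounds that a voice's native inventory lacks with a nearby native sound. Everything in it is hypothetical: the map_to_native function, the abbreviated Italian inventory and the English-to-Italian fallback table were invented for this example and use SAMPA-like symbols loosely.

```python
# Abbreviated, illustrative inventory of sounds for an Italian voice.
ITALIAN_INVENTORY = {
    "a", "e", "i", "o", "u",
    "p", "b", "t", "d", "k", "g", "f", "v", "s", "z", "m", "n", "l", "r",
}

# Toy fallback table: English sounds missing from the inventory above,
# mapped to a nearby Italian sound. An empty string drops the sound.
ENGLISH_TO_ITALIAN = {
    "T": "t",   # "th" as in "think"
    "D": "d",   # "th" as in "this"
    "{": "a",   # vowel of "cat"
    "I": "i",   # vowel of "sit"
    "h": "",    # no /h/ in the toy inventory
}


def map_to_native(phonemes, inventory, fallback):
    """Rewrite a foreign phoneme sequence using only native sounds."""
    mapped = []
    for phoneme in phonemes:
        if phoneme in inventory:
            mapped.append(phoneme)
        elif phoneme in fallback:
            if fallback[phoneme]:
                mapped.append(fallback[phoneme])
        else:
            raise KeyError("no native equivalent defined for " + repr(phoneme))
    return mapped


print(map_to_native(["D", "I", "s"], ITALIAN_INVENTORY, ENGLISH_TO_ITALIAN))
print(map_to_native(["h", "{", "t"], ITALIAN_INVENTORY, ENGLISH_TO_ITALIAN))
```

A substitution of this kind is what produces the accent described above: the foreign word remains intelligible but is spoken with the sounds of the voice's own language.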


Expressivity and personalization: giving character to your user interface
Loquendo is the only speech technology provider that has introduced expressive speech to the market, providing expressive cues that enable users to personalize the style of the TTS rendering.
This differentiation can be done by simply modifying the text, without the need for new recordings.
Feel the difference:

NORMAL: "For the motorway, take the second exit"
FORMAL: "Throat clearing, Sir, for the motorway, please take the second exit"
INFORMAL: "Hey John, for the motorway, take the second exit, ok?"

For entertainment purposes or to convey certain types of messages, a humorous style can also be adopted:

HUMOUR: "Uh-oh you've gone the wrong way, hmm, do a u-turn as soon as possible" or
  "Aagh!, heavy traffic on the motorway"

This feature is also important for enriching and enlivening the user experience when listening to longer texts, a hot topic for differentiation purposes.
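
Because the style is changed by editing the text alone, an application could generate the NORMAL, FORMAL and INFORMAL variants shown above from a single instruction. The sketch below is one possible way to do this with plain string templates. The style_prompt function and the STYLE_TEMPLATES table are hypothetical, and the expressive cue is kept as the literal words "Throat clearing" exactly as in the example above, since the way the engine encodes such cues is not specified here.

```python
# Hypothetical text templates, one per style, modeled on the examples above.
STYLE_TEMPLATES = {
    "normal": "{context}, {action}",
    "formal": "Throat clearing, Sir, {context}, please {action}",
    "informal": "Hey {name}, {context}, {action}, ok?",
}


def style_prompt(context, action, style="normal", name=""):
    """Wrap one navigation instruction in the text of the chosen style."""
    if style not in STYLE_TEMPLATES:
        raise ValueError("unknown style: " + style)
    if style == "informal" and not name:
        raise ValueError("the informal style needs the user's name")
    text = STYLE_TEMPLATES[style].format(context=context, action=action, name=name)
    # Capitalize the first letter without touching the rest of the sentence.
    return text[0].upper() + text[1:]


for style in ("normal", "formal", "informal"):
    print(style_prompt("for the motorway", "take the second exit", style, name="John"))
```

Keeping the styles in a table means a new persona can be added by writing one more template, with no new recordings.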


Differentiation is key
As competition in the automotive navigation market tightens and the number of products and solutions multiplies, PND and navigation software providers are finding new ways to differentiate their products in order to retain their competitive advantage.
More and more players in this market are beginning to realize that, for this reason, "content" and interoperability with other devices (e.g. mobile phones, iPods, etc.) are key.

Nevertheless, connecting a navigation device to a phone, or including more content and richer descriptions of POIs, would remain impractical without a text-to-speech system, since reading this content on a screen would pose a safety risk to the user, whose primary task is driving. For this reason, embedding a text-to-speech system in the navigation software becomes essential. Motorists are then able to have their text messages or emails read out loud, or can choose to listen to tourist information related to the location or POIs they are heading to.


Loquendo's extensive language coverage, continually expanding

With Loquendo TTS you can choose between 20 different languages and over 40 different male and female voice personas, according to your needs and preferences.

Language               Female                               Male
English US             Susan                                Dave, Kenneth
English UK             Kate, Elizabeth                      Simon
Castilian Spanish      Carmen                               Jorge, Juan
Catalan                Montserrat                           Jordi
French                 Juliette, Sophie                     Bernard
German                 Katrin, Ulrike                       Stefan
Italian                Giulia, Paola, Silvana, Valentina    Matteo, Luca, Marcello, Roberto
Greek                  Afroditi, Artemis                    -
Portuguese             Amalia                               Eusebio
Swedish                Annika                               -
Dutch                  Saskia                               Willem
Brazilian Portuguese   Gabriela                             -
Mandarin Chinese       Linlin                               -
Mexican Spanish        Esperanza                            -
Chilean                Francisca                            -
Argentinean            -                                    Diego
American Spanish       -                                    Carlos
Polish                 Zosia                                Andrzej
Canadian French        (4Q06)
Turkish                (4Q06)
Czech                  (1Q07)


Loquendo Embedded TTS technical features and advantages

  • Footprint flexibility: you choose your size!
    Whether you choose the 8, 16, 22 or 44 kHz sampling rate, Loquendo offers different codec bit rates and different sized databases, to give navigation software providers the maximum flexibility in choosing the right quality/space trade-off, according to the system's constraints.

  • Multiplatform solution
    Loquendo Embedded TTS is available with the same voices and the same core engine on a wide range of platforms:
    WindRiver VxWorks™, Windows Automotive™, CE.NET™ 4.2 and 5, Windows Mobile™ (PPC and Smartphone), Linux, Windows™ XP Embedded, Windows™ XP TabletPC Edition, Symbian OS™ 7 Series 60.
    All these versions use the same APIs (Loquendo API, JSAPI and Microsoft SAPI) and are compliant with the SSML, IPA and SAMPA standards.


Conclusions

We have seen that in this fast growing market of navigation and automotive telematics, differentiation and content are becoming more and more important to retain market share. It is also evident that, as cities get more congested and the time spent in your car per day increases, providing information and delivering other content during this drive time is of interest to many businesses. Text-to-speech can be the ideal, cost-effective solution to deliver this new and diversified content to drivers, enabling navigation solution providers to increase their return on investment and differentiate their product simply by integrating the TTS feature into their devices.
Some early movers have already taken this step, but other suppliers will surely follow this trend shortly. A new generation of TTS-enabled devices will have a strong impact on the market, providing a new lifelike experience, and TomTom™, with its GO 910, will be a reference point for any user.


Acronyms

API      Application Programming Interface
IPA      International Phonetic Alphabet
JSAPI    Java Speech API
NRE      Non-Recurring Engineering
OEM      Original Equipment Manufacturer
PDA      Personal Digital Assistant
PND      Personal Navigation Device
POI      Point Of Interest
SAMPA    Speech Assessment Methods Phonetic Alphabet
SAPI     Speech API
TMC      Traffic Message Channel
TTS      Text-To-Speech