Loquendo
Acoustic Model Adaptation (AMA) adapts
acoustic models using data collected
in the field, i.e. while the application is running.
The general scheme for Loquendo learning and
adaptation from data collected in the field
is outlined in Fig. 1. The diagram also features
the Phonetic Learning tool and the LogCollector
tool. Loquendo AMA uses the LogCollector
tool to gather the adaptation data, and can
be used in conjunction with the Phonetic Learning
tool to improve the performance of the general
acoustic model.
Fig. 1: The acoustic model adaptation
(and phonetic learning) architecture

As depicted in Fig. 1, the global scheme is based
on a three-step process:
(1) Data acquisition,
(2) Data analysis, and
(3) Adapted model activation.
For data acquisition, the user
must collect new data from the recognizers running
on the recognition servers. With Loquendo ASR,
this is done by enabling the dumping of the
Log data. The information needed for adaptation
(e.g. the speech signal, the phonetic
transcription, etc.) is then saved while the
application is running.
Subsequently, data analysis
must be carried out using Loquendo ASR installed
on an off-line workstation. The LogCollector
tool allows the selection and gathering of the
Log data and makes it available to Loquendo
AMA, which produces the adapted acoustic
model. The adaptation takes place in unsupervised
mode, where no human input is required to transcribe
the collected material.
The material needed for adaptation
requires some further comment:
1. Quantity of vocal material
required. The quantity may vary from a few minutes
(for speaker adaptation) to a few hours. In
general, the larger the amount of material,
the better the results will be. Note, however,
that adaptation time will increase
accordingly.
2. Kind of vocal material
required. Users are asked to bear in
mind that Loquendo AMA adapts to everything
that is present in the adaptation material,
and only to what is present. Thus, if you want
to preserve speaker independence, collect material
from many different speakers.
However, Loquendo AMA incorporates an innovative,
patented technology that avoids degrading the
general performance of the adapted models even
when the adaptation material is too limited
and polarized in terms of lexicon.
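
Before running an adaptation, it can help to check how
much material has been collected and how it is spread
across speakers. The following is a minimal sketch, not
part of the Loquendo tools: it assumes that each collected
utterance can be reduced to a (speaker_id, duration_seconds)
pair, which is a hypothetical format and not the actual
layout of the Log data.

```python
from collections import Counter


def summarize_material(utterances):
    """Summarize adaptation material.

    `utterances` is an iterable of (speaker_id, duration_seconds)
    pairs. This record format is an assumption made for illustration.
    """
    seconds_per_speaker = Counter()
    for speaker_id, seconds in utterances:
        seconds_per_speaker[speaker_id] += seconds

    total_seconds = sum(seconds_per_speaker.values())
    # Share of the material contributed by the most frequent speaker;
    # a value close to 1.0 means the material is dominated by one voice.
    top_share = (
        max(seconds_per_speaker.values()) / total_seconds
        if total_seconds
        else 0.0
    )
    return {
        "total_minutes": total_seconds / 60.0,
        "speakers": len(seconds_per_speaker),
        "top_speaker_share": top_share,
    }
```

A high top_speaker_share suggests that the adapted model
will lean towards that speaker, which is desirable for
speaker adaptation but not when speaker independence
must be preserved.
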
The final adapted model activation can
be carried out by exporting the adapted model
from the off-line workstation and importing
it into the recognition servers; the application
is then configured/re-run to switch from the
general acoustic model to the adapted model.
The final model shows increased recognition
performance, without any additional computational
cost.
Following the adaptation phase,
the application developer should validate the
proposed adapted model using the Loquendo ASR
Evaluation Tool Kit. This tool is an application-based
software console developed for Microsoft Windows
2000/XP and Linux operating systems.
In difficult cases, when the
performance of the general models is not high
enough (less than 70% word accuracy, WA), you may
wish to try using the "Supervised Loquendo AMA",
an auxiliary tool not released in the standard
distribution but provided only on request. To
use the "Supervised Loquendo AMA",
some vocal material has to be collected and
transcribed manually by a human operator. Please
contact Loquendo for more details.
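
For reference, word accuracy is conventionally computed as
WA = (N - S - D - I) / N, where N is the number of words
in the reference transcriptions and S, D and I are the
substituted, deleted and inserted words in the recognizer
output. The sketch below shows this standard calculation
on plain text pairs; it is an illustration only and does
not reproduce the Loquendo ASR Evaluation Tool Kit, whose
exact scoring rules are not described here. The function
names are hypothetical.

```python
def word_errors(ref_words, hyp_words):
    """Minimum number of substitutions, deletions and insertions
    needed to turn ref_words into hyp_words (word-level edit distance)."""
    # prev[j] holds the distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(pairs):
    """Word accuracy over (reference, hypothesis) text pairs."""
    total_words = 0
    total_errors = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_words += len(ref_words)
        total_errors += word_errors(ref_words, hypothesis.split())
    if total_words == 0:
        raise ValueError("reference transcriptions contain no words")
    return (total_words - total_errors) / total_words
```

Computing this value for both the general and the adapted
model on the same held-out material gives a direct
comparison against the 70% figure mentioned above.
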