speech recognizer; word accuracy; channel model; post-correction
We have implemented a post-processor called SPEECHPP to correct word-level errors committed by an arbitrary speech recognizer. Applying a noisy-channel model, SPEECHPP uses a Viterbi beam search that combines a language model with a channel model. Previous work demonstrated that a simple word-for-word channel model was sufficient to yield substantial increases in word accuracy. This paper shows that augmenting the channel model with an account of word fertility in the channel yields some further improvement in word accuracy. It also shows that a modern continuous speech recognizer can be used in "black-box" fashion to robustly recognize speech on which the recognizer was not originally trained. Finally, it shows that even when the recognizer can be tuned to the new task, environment, or speaker, the post-processor still contributes to performance improvements.
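To make the noisy-channel idea concrete, the following is a minimal sketch of a word-for-word post-corrector, not the SPEECHPP implementation. It assumes a bigram language model and a one-to-one channel model stored as plain dictionaries, and it does not model word fertility. The function name `viterbi_correct`, the parameters `beam` and `floor`, and all probabilities in the toy tables are hypothetical and chosen only for illustration.

```python
import math


def viterbi_correct(observed, channel, bigram, beam=5, floor=1e-6):
    """Return the most likely intended word sequence for `observed`.

    channel[o][w]  : P(o | w), an assumed word-for-word channel model
    bigram[(p, w)] : P(w | p), an assumed bigram language model
    floor          : probability used for unseen events (crude smoothing)
    """
    # Each hypothesis is (log probability, word sequence so far).
    hyps = [(0.0, ["<s>"])]
    for o in observed:
        # Candidate intended words: those the channel can map to `o`,
        # plus `o` itself so that unknown words pass through unchanged.
        cands = dict(channel.get(o, {}))
        cands.setdefault(o, floor)
        best = {}
        for score, seq in hyps:
            for w, p_chan in cands.items():
                p_lm = bigram.get((seq[-1], w), floor)
                s = score + math.log(p_chan) + math.log(p_lm)
                # Viterbi recombination: keep only the best path ending in w.
                if w not in best or s > best[w][0]:
                    best[w] = (s, seq + [w])
        # Beam pruning: keep the `beam` highest-scoring hypotheses.
        hyps = sorted(best.values(), key=lambda h: h[0], reverse=True)[:beam]
    return hyps[0][1][1:]  # drop the "<s>" start marker


# Toy models with invented probabilities.
toy_channel = {
    "GO": {"GO": 0.9},
    "TWO": {"TWO": 0.5, "TO": 0.3},
    "AVON": {"AVON": 0.9},
}
toy_bigram = {
    ("<s>", "GO"): 0.5,
    ("GO", "TO"): 0.5,
    ("GO", "TWO"): 0.01,
    ("TO", "AVON"): 0.5,
    ("TWO", "AVON"): 0.01,
}

print(viterbi_correct(["GO", "TWO", "AVON"], toy_channel, toy_bigram))
# prints ['GO', 'TO', 'AVON']
```

In this toy case the channel model alone prefers leaving "TWO" unchanged, but the language model makes "GO TO AVON" the higher-scoring path, so the recognizer's output is corrected. Because the language model is a bigram, recombining hypotheses by their last word loses no information.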
(c) 1996 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.