Adaptation of Acoustic and Language Model for Improving Arabic Automatic Speech Recognition

  • Oussama Soliman Enshassi -----> Prof. Alaa El-Halees

Automatic Speech Recognition (ASR) is the translation of spoken words into text by a
computer. ASR technology has been widely integrated into many systems. However,
Arabic speech recognition applications still suffer from a high error rate, which is mainly
due to variation in speech. This variation leads to a mismatch between the Arabic
speech and the trained models.
Variation in speech is a major obstacle to improving the accuracy of Arabic
automatic continuous speech recognition applications. Variability may occur at the
phonetic, word, or sentence level. In this thesis, the researcher proposes an approach to
adapt the acoustic model and the language model for Arabic speakers under limited
resources. Preliminary work on the pronunciation model has also been carried out.
Arabic acoustic model adaptation is proposed to overcome the variation in speech
under limited resources for Arabic speakers. When several Arabic acoustic models are
available, we propose a hybrid approach that combines interpolation and merging of
acoustic models to adapt the target acoustic model. The proposed approach proved
effective in handling the variability in Arabic speech. The Word Error Rate (WER) was
measured for both the baseline and the enhanced systems: the baseline system had a WER
of 13.28%, which decreased significantly to 11.04% in the enhanced system.
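To make the metric concrete, the following is a minimal sketch of how WER is commonly computed, assuming the standard definition (word-level edit distance divided by the number of reference words). The abstract does not say which scoring tool was used, so the function names `word_error_rate` and `wer_reduction` are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline: float, enhanced: float) -> tuple[float, float]:
    """Return (absolute reduction in points, relative reduction as a fraction)."""
    absolute = baseline - enhanced
    return absolute, absolute / baseline
```

Under these assumptions, `wer_reduction(13.28, 11.04)` gives an absolute reduction of about 2.24 percentage points, which is roughly 16.9% relative to the baseline.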
In addition, the researcher proposed an interpolation approach for adapting the Arabic
language model. The results showed that the baseline system had a WER of 12.4%,
which declined significantly to 8.4% in the enhanced system. Furthermore, applying the
hybrid acoustic approach followed by the language model interpolation approach achieved
a considerable absolute improvement of 5.32 percentage points in WER: the baseline
system had a WER of 13.28%, which was reduced significantly to 7.96% in the enhanced
system.
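As an illustration of language model interpolation, the sketch below mixes two unigram models with a single weight. It assumes simple linear interpolation, which the abstract does not specify, and it uses toy unigram models with placeholder tokens instead of the n-gram models and Arabic corpora a real system would need. The names `unigram_model`, `interpolate`, and `lam` are hypothetical.

```python
from collections import Counter


def unigram_model(tokens: list[str]) -> dict[str, float]:
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(
    p_general: dict[str, float],
    p_target: dict[str, float],
    lam: float,
) -> dict[str, float]:
    """P(w) = lam * P_target(w) + (1 - lam) * P_general(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_general) | set(p_target)
    return {
        word: lam * p_target.get(word, 0.0)
        + (1.0 - lam) * p_general.get(word, 0.0)
        for word in vocab
    }


# Toy data (placeholder tokens, not taken from the thesis).
general = unigram_model("a a b c".split())
target = unigram_model("a d".split())
mixed = interpolate(general, target, lam=0.5)
```

Because both inputs are proper distributions, the mixture also sums to one for any `lam` in [0, 1]. In practice the weight would typically be tuned on held-out data from the target domain, for example by minimizing perplexity.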
However, the proposed phonetic rules in the pronunciation model did not lead to a
significant improvement.