A phoneme-based approach for eliminating out-of-vocabulary problem of Turkish speech recognition using Hidden Markov Model


Yavuz E., TOPUZ V.

COMPUTER SYSTEMS SCIENCE AND ENGINEERING, vol.33, no.6, pp.429-445, 2018 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 33 Issue: 6
  • Publication Date: 2018
  • Journal Name: COMPUTER SYSTEMS SCIENCE AND ENGINEERING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.429-445
  • Keywords: Speech recognition, Hidden Markov model, cepstral analysis, phoneme boundary
  • Marmara University Affiliated: Yes

Abstract

Because Turkish is a morphologically productive language, a word-based recognition system can hardly model it completely. Such a system struggles to recognize words it was not trained on, so out-of-vocabulary words lower the recognition success rate considerably. To overcome this problem, this study designs and implements a speaker-dependent, phoneme-based word recognition system for Turkish. An algorithm for finding phoneme boundaries was devised to segment each word into its phonemes. After segmentation, each phoneme is assigned to a sub-group according to its position in the word and its neighboring phonemes. Each sub-group is represented by a Hidden Markov Model, a statistical technique, with Mel-frequency cepstral coefficients as the feature vector. Because the approach is phoneme-based, the system was able to recognize many out-of-vocabulary words.
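The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: once a word is segmented into phonemes, each segment can be scored against a phoneme-level Hidden Markov Model, and a word that was never trained as a whole can still be scored by summing the scores of its phonemes. Everything here is an assumption made for illustration: the function names, the two toy phoneme models with hand-set probabilities, the tiny lexicon, and the use of discrete symbols (0, 1, 2) as a stand-in for quantized Mel-frequency cepstral coefficient vectors. The paper's position- and neighbor-dependent sub-groups and its boundary-finding algorithm are not reproduced.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, model):
    """Log P(obs | model) for a discrete HMM, using the forward algorithm."""
    start, trans, emit = model["start"], model["trans"], model["emit"]
    n = len(start)
    # Initialization: start probability times emission of the first symbol.
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def score_word(segments, phonemes, models):
    """Sum per-segment log-likelihoods under the word's phoneme models."""
    if len(segments) != len(phonemes):
        return NEG_INF
    return sum(
        forward_log_likelihood(seg, models[ph])
        for seg, ph in zip(segments, phonemes)
    )


def recognize(segments, lexicon, models):
    """Return the lexicon word whose phoneme sequence scores highest."""
    return max(lexicon, key=lambda w: score_word(segments, lexicon[w], models))


# Hypothetical 2-state left-to-right phoneme models (hand-set, not trained).
models = {
    "a": {
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    },
    "k": {
        "start": [1.0, 0.0],
        "trans": [[0.5, 0.5], [0.0, 1.0]],
        "emit": [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],
    },
}

# Words are defined only by their phoneme sequences; no word-level training.
lexicon = {"ak": ["a", "k"], "ka": ["k", "a"]}

# Segments as they might come out of a phoneme-boundary step (assumed given).
segments = [[0, 0, 1], [2, 2]]

print(recognize(segments, lexicon, models))
```

Adding a new word to this sketch only requires a new lexicon entry built from existing phoneme models, which is the property that lets a phoneme-based recognizer handle words outside a fixed word-level vocabulary.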