A phoneme-based approach for eliminating out-of-vocabulary problem of Turkish speech recognition using Hidden Markov Model

Yavuz E., TOPUZ V.

COMPUTER SYSTEMS SCIENCE AND ENGINEERING, vol.33, no.6, pp.429-445, 2018 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 33 Issue: 6
  • Publication Date: 2018
  • Page Numbers: pp.429-445


Because Turkish is a morphologically productive language, it is almost impossible for a word-based recognition system to model the language completely. Such a system has difficulty recognizing words that were not introduced to it, so out-of-vocabulary words considerably lower its recognition success rate. To overcome this problem, this study designs and implements a speaker-dependent, phoneme-based word recognition system for Turkish. An algorithm for finding phoneme boundaries was devised to segment each word into its phonemes. After segmentation, each phoneme is assigned to a sub-group according to its position in the word and its neighboring phonemes. Each sub-group is represented by a Hidden Markov Model, a statistical technique, with Mel-frequency cepstral coefficients as the feature vector. Because the approach is phoneme-based, the system was able to recognize many out-of-vocabulary words.
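To illustrate why position- and neighbor-dependent phoneme units can cover words that were never seen as whole words, the sketch below labels each phoneme with its left neighbor, right neighbor, and position, then checks whether a new word can be built entirely from units collected from other words. It is a minimal illustration, not the authors' implementation: the function names (`context_units`, `unit_inventory`, `is_covered`), the three position classes, the `#` boundary marker, and the simplification that each letter is one phoneme are all assumptions made for the example. The phoneme-boundary detection, MFCC extraction, and HMM training described in the abstract are not shown.

```python
from typing import Iterable, List, Set, Tuple

# (left neighbor, phoneme, right neighbor, position); hypothetical unit format
Unit = Tuple[str, str, str, str]


def context_units(phonemes: List[str]) -> List[Unit]:
    """Label each phoneme with its neighbors and its position in the word.

    "#" marks a word boundary. A one-phoneme word is labeled "initial".
    """
    units: List[Unit] = []
    last = len(phonemes) - 1
    for i, p in enumerate(phonemes):
        left = phonemes[i - 1] if i > 0 else "#"
        right = phonemes[i + 1] if i < last else "#"
        if i == 0:
            pos = "initial"
        elif i == last:
            pos = "final"
        else:
            pos = "medial"
        units.append((left, p, right, pos))
    return units


def unit_inventory(training_words: Iterable[str]) -> Set[Unit]:
    """Collect every context-dependent unit seen in the training words.

    Assumption: one letter stands for one phoneme.
    """
    inventory: Set[Unit] = set()
    for word in training_words:
        inventory.update(context_units(list(word)))
    return inventory


def is_covered(word: str, inventory: Set[Unit]) -> bool:
    """Return True if every unit of the word already exists in the inventory."""
    return all(u in inventory for u in context_units(list(word)))


if __name__ == "__main__":
    inventory = unit_inventory(["kasap", "masa"])
    # "kasa" was never a training word, but all of its units were seen.
    print(is_covered("kasa", inventory))
    # "kapa" needs the unit (k, a, p, medial), which was not seen.
    print(is_covered("kapa", inventory))
```

In this toy setting, a word-based system trained on "kasap" and "masa" would have no model for "kasa", while a unit-based system could in principle assemble it from sub-group models it already has. In a real recognizer each unit would correspond to a trained acoustic model rather than a set membership test.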