A speaker dependent, large vocabulary, isolated word speech recognition system for Turkish

Thesis Type: Master's

Institution: Marmara University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Turkey

Approval Date: 2005

Thesis Language: English

Student: VOLKAN TUNALI

Advisor: MURAT DOĞRUEL

Abstract:

Advances in digital signal processing technology have led to the use of speech processing in many different application areas, such as speech compression, enhancement, synthesis, and recognition. In this thesis, the problem of speech recognition was studied, and a speaker dependent, large vocabulary, isolated word speech recognition system was developed for the Turkish language. A combination of two common approaches to the speech recognition problem was used in the project: the acoustic-phonetic approach and the stochastic approach. Phonemes, modeled by two-state Hidden Markov Models (HMM), were used as the smallest unit of recognition. Mel-Frequency Cepstral Coefficients (MFCC) were preferred as the feature vectors extracted from the speech signal. A new algorithm for phoneme detection and segmentation was devised for use in the training stage. Because recognition is phoneme-based, the system can also recognize words on which it was not trained.

Keywords: Turkish speech recognition, phoneme based speech recognition, Hidden Markov Model (HMM), Mel-Frequency Cepstral Coefficients (MFCC), speech feature vector.
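
The abstract's point that untrained words can still be recognized follows from building each word model out of phoneme models. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes a hypothetical three-phoneme inventory, scalar Gaussian emissions in place of MFCC vectors, a fixed self-loop probability, and Viterbi scoring. All names and parameter values (`PHONEME_STATES`, `SELF_LOOP`, `VOCABULARY`, `recognize`) are illustrative assumptions.

```python
import math

# Hypothetical toy inventory: each phoneme is a two-state model, and each
# state emits a 1-D Gaussian given as (mean, variance). A real system would
# emit MFCC vectors; scalars keep the sketch short.
PHONEME_STATES = {
    "a": [(1.0, 0.25), (1.5, 0.25)],
    "t": [(-2.0, 0.25), (-1.0, 0.25)],
    "k": [(4.0, 0.25), (3.0, 0.25)],
}
SELF_LOOP = 0.5  # assumed probability of staying in the same state

# Words are defined only by their phoneme sequences, so no word-level
# training data is needed to add an entry here.
VOCABULARY = {"at": ["a", "t"], "kat": ["k", "a", "t"], "ak": ["a", "k"]}


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def word_model(phonemes):
    """Concatenate two-state phoneme models into one left-to-right chain."""
    return [state for p in phonemes for state in PHONEME_STATES[p]]


def viterbi_log_score(states, frames):
    """Best-path log-likelihood; the path starts in the first state and
    must end in the last one, moving right by at most one state per frame."""
    n = len(states)
    stay, move = math.log(SELF_LOOP), math.log(1.0 - SELF_LOOP)
    score = [-math.inf] * n
    score[0] = log_gauss(frames[0], *states[0])
    for x in frames[1:]:
        new = [-math.inf] * n
        for j in range(n):
            best = score[j] + stay
            if j > 0:
                best = max(best, score[j - 1] + move)
            new[j] = best + log_gauss(x, *states[j])
        score = new
    return score[-1]


def recognize(frames, vocabulary=VOCABULARY):
    """Return the vocabulary word whose concatenated model scores highest."""
    return max(
        vocabulary,
        key=lambda w: viterbi_log_score(word_model(vocabulary[w]), frames),
    )
```

For example, `recognize([4.1, 3.9, 3.0, 1.0, 1.1, 1.5, -2.0, -1.1])` selects `"kat"`, even though only phoneme-level parameters exist in the sketch and no model was ever fitted to that word as a whole.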