Evaluation of Classification Models for Language Processing

International Symposium on Innovations in Intelligent SysTems and Applications (INISTA 2015), Madrid, Spain, 2-4 September 2015, pp. 454-461

Publication Type: Conference Paper / Full-Text Paper
DOI: 10.1109/inista.2015.7276787
City of Publication: Madrid
Country of Publication: Spain
Pages: pp. 454-461
Keywords: Naive Bayes, event models, smoothing methods, text categorization, language processing, probabilities
Affiliated with Marmara University: Yes

Abstract

Naive Bayes is a commonly used algorithm in text categorization because it is easy to implement and has low complexity. It has two main event models for text categorization: the multivariate Bernoulli model and the multinomial model. A very large number of studies choose the multinomial model with Laplace smoothing based solely on the assumption that it performs better than the multivariate Bernoulli model under almost any conditions. This study aims to shed some light on this widely adopted assumption by analyzing Naive Bayes event models and smoothing methods from a different perspective. To clarify the differences between the event models of Naive Bayes, their classification performance is compared on datasets in two languages, English and Turkish. The results of our extensive experiments demonstrate that the superior performance of the multinomial model is not observed all the time. On the other hand, the multivariate Bernoulli model can perform well when combined with an appropriate smoothing method under different training data size conditions.
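
The two event models named in the abstract differ in what they count: the multinomial model counts every token occurrence, while the multivariate Bernoulli model records only whether a term is present in a document and also scores the absence of vocabulary terms. The following is a minimal sketch of that difference using additive (Laplace) smoothing; it is not the authors' implementation. The function names `train` and `predict`, the `alpha` parameter, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train(docs, labels, alpha=1.0):
    """Estimate smoothed parameters for both event models.

    docs is a list of token lists, labels the matching class names.
    alpha is the additive smoothing constant (alpha=1.0 is Laplace smoothing).
    """
    vocab = sorted({t for d in docs for t in d})
    stats = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # Multinomial: count every occurrence of a term in the class.
        term_counts = Counter(t for d in class_docs for t in d)
        # Bernoulli: count the documents of the class that contain the term.
        doc_counts = Counter(t for d in class_docs for t in set(d))
        total_terms = sum(term_counts.values())
        n_docs = len(class_docs)
        stats[c] = {
            "log_prior": math.log(n_docs / len(docs)),
            "multinomial": {
                t: (term_counts[t] + alpha) / (total_terms + alpha * len(vocab))
                for t in vocab
            },
            "bernoulli": {
                t: (doc_counts[t] + alpha) / (n_docs + 2 * alpha) for t in vocab
            },
        }
    return vocab, stats


def predict(doc, vocab, stats, model):
    """Return the class with the highest log-score under the chosen model."""
    present = set(doc)
    scores = {}
    for c, s in stats.items():
        score = s["log_prior"]
        if model == "multinomial":
            # Each token occurrence contributes; unseen terms are skipped.
            for t in doc:
                if t in s["multinomial"]:
                    score += math.log(s["multinomial"][t])
        else:
            # Every vocabulary term contributes, whether present or absent.
            for t in vocab:
                p = s["bernoulli"][t]
                score += math.log(p if t in present else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
docs = [
    ["goal", "match", "team"],
    ["team", "win", "goal", "goal"],
    ["election", "vote", "party"],
    ["party", "vote", "law"],
]
labels = ["sports", "sports", "politics", "politics"]
vocab, stats = train(docs, labels, alpha=1.0)

for model in ("multinomial", "bernoulli"):
    print(model, predict(["goal", "team"], vocab, stats, model))
```

Changing `alpha`, or replacing the additive estimate with another smoothing formula, is one way to explore the interaction between event model and smoothing method that the abstract describes, although results on a toy example like this say nothing about the paper's findings.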