Higher-Order Smoothing: A Novel Semantic Smoothing Method for Text Classification

Poyraz, Mitat; Kilimci, Zeynep; Ganiz, Murat

doi:10.1007/s11390-014-1437-6

Poyraz M., Kilimci Z. H., Ganiz M. C.

JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, vol. 29, no. 3, pp. 376-391, 2014 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 29 Issue: 3
Publication Date: 2014
DOI: 10.1007/s11390-014-1437-6
Journal Name: JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 376-391
Marmara University Affiliated: No

Abstract

Latent semantic indexing (LSI) is known to take advantage of implicit higher-order (or latent) structure in the association of terms and documents, and the higher-order relations in LSI capture "latent semantics". These findings inspired a previously introduced Bayesian classification framework, Higher-Order Naive Bayes (HONB), which makes explicit use of these higher-order relations. In this paper, we present a novel semantic smoothing method for the Naive Bayes algorithm named Higher-Order Smoothing (HOS). HOS is built on a graph-based data representation similar to that of HONB, which allows the semantics in higher-order paths to be exploited. HOS takes the concept one step further by also exploiting the relationships between instances of different classes. As a result, we move beyond not only instance boundaries but also class boundaries to exploit the latent information in higher-order paths. This approach improves parameter estimation when labeled data is insufficient. Results of our extensive experiments demonstrate the value of HOS on several benchmark datasets.
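
To make the idea of a higher-order path concrete, the following minimal sketch finds term pairs that never share a document but are linked through an intermediate term. It is an illustration only and not the HOS estimator, whose formulas the abstract does not give. The toy corpus and the function names `first_order_pairs` and `second_order_pairs` are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"bank", "loan"},
    {"loan", "credit"},
    {"credit", "card"},
]

def first_order_pairs(docs):
    """Return sorted term pairs that co-occur in at least one document."""
    pairs = set()
    for d in docs:
        for a, b in combinations(sorted(d), 2):
            pairs.add((a, b))
    return pairs

def second_order_pairs(docs):
    """Return sorted term pairs that never share a document but are
    connected through a common intermediate term."""
    first = first_order_pairs(docs)

    # Map each term to the terms it directly co-occurs with.
    neighbours = {}
    for a, b in first:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    second = set()
    for terms in neighbours.values():
        # Any two neighbours of the same term are linked through it.
        for a, c in combinations(sorted(terms), 2):
            if (a, c) not in first:
                second.add((a, c))
    return second

print(sorted(second_order_pairs(docs)))
```

In this corpus "bank" and "credit" never appear together, yet both co-occur with "loan", so they are linked by a second-order path. Relations of this kind are what a higher-order method can draw on when direct co-occurrence counts are sparse.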