Higher Order Naive Bayes: A Novel Non-IID Approach to Text Classification

Ganiz, Murat; George, Cibin; Pottenger, William

doi:10.1109/tkde.2010.160

Ganiz M. C., George C., Pottenger W. M.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, vol. 23, no. 7, pp. 1022-1034, 2011 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 23 Issue: 7
Publication Date: 2011
DOI: 10.1109/tkde.2010.160
Journal Name: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 1022-1034
Keywords: Machine learning, statistical relational learning, naive bayes, text classification, IID
Marmara University Affiliated: No

Abstract

The underlying assumption in traditional machine learning algorithms is that instances are Independent and Identically Distributed (IID). These critical independence assumptions made in traditional machine learning algorithms prevent them from going beyond instance boundaries to exploit latent relations between features. In this paper, we develop a general approach to supervised learning by leveraging higher order dependencies between features. We introduce a novel Bayesian framework for classification termed Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages higher order relations between features across different instances. The approach is validated in the classification domain on widely used benchmark data sets. Results obtained on several benchmark text corpora demonstrate that higher order approaches achieve significant improvements in classification accuracy over the baseline methods, especially when training data is scarce. A complexity analysis also reveals that the space and time complexity of HONB compare favorably with existing approaches.
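The abstract does not spell out how HONB computes its higher order relations, so the following is only a minimal Python sketch of the general idea of a second-order relation between features across instances. It is not the authors' algorithm. The function names `cooccurrence_pairs` and `second_order_pairs` and the toy documents are assumptions made for illustration. Two terms are treated as second-order related when they never appear in the same document but are each linked to a shared bridging term.

```python
from itertools import combinations


def cooccurrence_pairs(docs):
    """Return unordered term pairs that appear together in at least one document."""
    pairs = set()
    for doc in docs:
        # Sorting gives each pair a canonical (alphabetical) order.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs.add((a, b))
    return pairs


def second_order_pairs(docs):
    """Return term pairs linked only through a shared bridging term.

    Hypothetical helper: a pair (a, b) qualifies when a and b never
    co-occur directly, yet both co-occur with some third term.
    """
    first = cooccurrence_pairs(docs)

    # Build a neighbor map from the first-order pairs.
    neighbors = {}
    for a, b in first:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    second = set()
    for linked in neighbors.values():
        for a, b in combinations(sorted(linked), 2):
            # Pairs that share a document are first-order, so skip them.
            if (a, b) not in first:
                second.add((a, b))
    return second


# Toy corpus (illustrative only): each document is a list of terms.
docs = [
    ["bayes", "classifier"],
    ["classifier", "text"],
    ["text", "corpus"],
]

print(sorted(second_order_pairs(docs)))
# [('bayes', 'text'), ('classifier', 'corpus')]
```

Here "bayes" and "text" never share a document, yet both co-occur with "classifier", which is the kind of cross-instance link that a model assuming independent instances would not see.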