Leveraging Higher Order Dependencies between Features for Text Classification


Ganiz M. C., Lytkin N. I., Pottenger W. M.

Joint European Conference on Machine Learning (ECML) / European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Bled, Slovenia, 7-11 September 2009, vol. 5781, pp. 375-377

  • Publication Type: Conference Paper / Full-Text Paper
  • Volume: 5781
  • DOI: 10.1007/978-3-642-04180-8_42
  • City: Bled
  • Country: Slovenia
  • Page Numbers: pp. 375-377
  • Keywords: machine learning, text classification, higher order learning, statistical relational learning, higher order naive bayes, higher order support vector machine
  • Marmara University Affiliated: No

Abstract

Traditional machine learning methods only consider relationships between feature values within individual data instances, while disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
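To give a rough sense of what "co-occurrence relations between feature values across different instances" means, the sketch below is a minimal illustration, not the paper's exact path-counting procedure: it chains within-document term co-occurrences so that two terms that never appear in the same document can still be linked through a shared intermediate term, which is the intuition behind the higher-order representation. The function name and matrix layout are our own assumptions for illustration.

```python
import numpy as np

def higher_order_cooccurrence(X, zero_diagonal=True):
    """Illustrative sketch (not the authors' exact algorithm).

    X : binary document-term matrix of shape (n_docs, n_terms).
    Returns an (n_terms, n_terms) matrix whose (i, j) entry counts
    two-step chains term_i -> shared term -> term_j, where each step
    is a within-document co-occurrence, possibly in different documents.
    """
    X = (X > 0).astype(int)
    first_order = X.T @ X                      # terms co-occurring in the same document
    if zero_diagonal:
        np.fill_diagonal(first_order, 0)       # ignore a term co-occurring with itself
    second_order = first_order @ first_order   # chain co-occurrences across documents
    if zero_diagonal:
        np.fill_diagonal(second_order, 0)
    return second_order

# Toy usage: 3 documents over 4 terms.
X = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])
print(higher_order_cooccurrence(X))
```

In this toy example, terms 0 and 2 never co-occur in any single document, yet they receive a nonzero second-order link through the shared term 1; such cross-instance links are the kind of information that first-order classifiers discard and that the higher-order representation exposes to the learner.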