Leveraging Higher Order Dependencies between Features for Text Classification


Ganiz M. C. , Lytkin N. I. , Pottenger W. M.

Joint European Conference on Machine Learning (ECML)/European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Bled, Slovenia, 7-11 September 2009, vol. 5781, pp. 375-377

  • Publication Type: Conference Paper / Full Text
  • Volume: 5781
  • DOI: 10.1007/978-3-642-04180-8_42
  • City: Bled
  • Country: Slovenia
  • Page Numbers: pp. 375-377

Abstract

Traditional machine learning methods only consider relationships between feature values within individual data instances, while disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
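The core idea of higher-order co-occurrence can be illustrated with a small sketch. This is not the paper's exact HONB algorithm; it is a simplified, hypothetical example (with toy data) showing how two terms that never appear together in the same document can still be linked through a shared intermediate term across documents:

```python
import numpy as np

# Hypothetical binary document-term matrix: rows are documents,
# columns are terms t0..t3. (Toy data for illustration only.)
X = np.array([
    [1, 1, 0, 0],   # doc 1 contains terms t0, t1
    [0, 1, 1, 0],   # doc 2 contains terms t1, t2
    [0, 0, 1, 1],   # doc 3 contains terms t2, t3
])

# First-order co-occurrence: counts of terms appearing together
# in the same document. Zero the diagonal (self co-occurrence).
C1 = X.T @ X
np.fill_diagonal(C1, 0)

# Second-order co-occurrence: terms connected through a shared
# intermediate term in a different document (t0-t1 in doc 1 and
# t1-t2 in doc 2 link t0 and t2). Squaring the first-order matrix
# counts such two-step paths.
C2 = C1 @ C1
np.fill_diagonal(C2, 0)

print("first-order:\n", C1)
print("second-order:\n", C2)
```

Here t0 and t2 never co-occur directly (`C1[0, 2] == 0`), yet they are connected by a second-order path through t1 (`C2[0, 2] > 0`), which is the kind of cross-instance dependency a first-order model cannot see.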