Semi-Supervised Learning using Higher-Order Co-occurrence Paths to Overcome the Complexity of Data Representation


GANİZ M. C.

IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9-12 October 2016, pp. 2242-2247

  • City of publication: Budapest
  • Country of publication: Hungary
  • Pages: pp. 2242-2247

Abstract

We present a novel approach to semi-supervised learning for text classification based on the higher-order co-occurrence paths of words. We call the proposed method Semi-Supervised Semantic Higher-Order Smoothing (S3HOS). S3HOS is built on a tripartite graph-based data representation of labeled and unlabeled documents that allows the semantics in higher-order co-occurrence paths between terms (words) to be exploited. Several graph-based techniques have been proposed in the literature to diffuse class labels from labeled documents to unlabeled documents. In this study, we propose a different and natural way of estimating class-conditional probabilities for the terms in unlabeled documents without needing to label the documents first. The proposed approach estimates class-conditional probabilities for the terms in unlabeled documents and, at the same time, improves the estimates for terms in the labeled documents. We show experimentally that S3HOS can substantially improve parameter estimation, and hence classification accuracy, particularly when labeled data is scarce but unlabeled data is plentiful.
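
To make the idea of a higher-order co-occurrence path concrete, here is a minimal Python sketch. It is not the S3HOS method and does not build the tripartite graph described above. It only enumerates second-order paths: pairs of terms linked through a shared bridge term, where the two co-occurrences come from different documents. The function names, the distinct-document condition, and the toy corpus are assumptions made for illustration.

```python
from itertools import combinations


def cooccurrence_docs(docs):
    """Map each unordered term pair to the ids of documents containing both terms."""
    pair_docs = {}
    for doc_id, doc in enumerate(docs):
        for t1, t2 in combinations(sorted(set(doc)), 2):
            pair_docs.setdefault((t1, t2), set()).add(doc_id)
    return pair_docs


def second_order_paths(docs):
    """Return {(t1, t2): bridge terms} for paths t1 - bridge - t2.

    Assumption for this sketch: the two links of a path must be
    supported by at least two distinct documents.
    """
    pair_docs = cooccurrence_docs(docs)

    # neighbors[t][u] = ids of documents in which t and u co-occur
    neighbors = {}
    for (t1, t2), ids in pair_docs.items():
        neighbors.setdefault(t1, {})[t2] = ids
        neighbors.setdefault(t2, {})[t1] = ids

    paths = {}
    for bridge, nbrs in neighbors.items():
        for t1, t2 in combinations(sorted(nbrs), 2):
            # A union of size >= 2 guarantees two different documents
            # can be chosen, one for each link of the path.
            if len(nbrs[t1] | nbrs[t2]) >= 2:
                paths.setdefault((t1, t2), set()).add(bridge)
    return paths


# Toy corpus (hypothetical): each document is a list of terms.
docs = [["a", "b"], ["b", "c"], ["c", "d"]]
paths = second_order_paths(docs)
```

In this toy corpus, "a" and "c" never appear in the same document, yet they are connected through "b". If the middle document were unlabeled, it would still supply that link, which is the intuition behind using unlabeled documents to relate terms.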