Social media analysis by innovative hybrid algorithms with label propagation


ALTINEL GİRGİN A. B.

EXPERT SYSTEMS WITH APPLICATIONS, vol.210, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 210
  • Publication Date: 2022
  • DOI: 10.1016/j.eswa.2022.118606
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
  • Keywords: Label propagation algorithm, Social media analysis, Topic-based tweet classification, Sentiment polarity detection
  • Marmara University Affiliated: Yes

Abstract

Due to the huge volume of data accumulated on microblogging sites, two fundamental questions have recently become very popular: 1) What percentage of this accumulated data has positive or negative sentiment polarity? 2) How is this accumulated data distributed across different topics? Motivated by these needs, this paper presents several algorithms based on the Label Propagation Algorithm (LPA) to handle the two corresponding tasks: sentiment polarity detection and topic-based text classification. These algorithms are the Label Propagated-Relevance Frequency Classifier (LP-RFC) and the LP-Abstract Frequency Classifier (LP-AFC). They can be defined as new semantic smoothing classifiers, which take advantage of the semantic connections among terms in the label propagation phase of the LPA. Additionally, another classifier, LP-ComRFC+AFC, was built; it is a weighted summation of the individual LP-RFC and LP-AFC classifiers. Furthermore, considering the shortage of labeled data in real-world scenarios, a semi-supervised version of LP-RFC and LP-AFC, namely "Merging Unlabeled and Labeled Instances with Semantic Values of Terms" (MULIS), was designed and implemented. Three different datasets were used for the sentiment polarity detection experiments, and a self-collected tweet dataset was used for the topic-based text classification experiments. According to the experimental results, the suggested algorithms and their composite form, LP-ComRFC+AFC, generated higher F1 scores than all of the baseline algorithms at nearly all of the training splits on the datasets.