Predicting the Soft Error Vulnerability of Parallel Applications Using Machine Learning


Oz I., Arslan S.

INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, vol.49, no.3, pp.410-439, 2021 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 49 Issue: 3
  • Publication Date: 2021
  • DOI: 10.1007/s10766-021-00707-0
  • Journal Name: INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, ABI/INFORM, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, MLA - Modern Language Association Database, zbMATH, Civil Engineering Abstracts
  • Pages: pp.410-439
  • Keywords: Fault injection, Machine Learning, Parallel programming, Soft error analysis
  • Affiliated with Marmara University: Yes

Abstract

With the widespread use of multicore systems with smaller transistor sizes, soft errors have become an important issue for parallel program execution. Fault injection is a prevalent method for quantifying the soft error rates of applications. However, detailed fault injection experiments are very time-consuming. Prediction-based techniques have therefore been proposed to evaluate soft error vulnerability more quickly. In this work, we present a soft error vulnerability prediction approach for parallel applications using machine learning algorithms. We define a set of features covering thread communication, data sharing, parallel programming, and performance characteristics, and train our models with three ML algorithms. This study is the first to use the parallel programming features, as well as the combination of all features, in vulnerability prediction of parallel programs. We propose two models for soft error vulnerability prediction: (1) a regression model with rigorous feature selection analysis that estimates correct execution rates, and (2) a novel classification model that predicts the vulnerability level of the target programs. We obtain a maximum prediction accuracy of 73.2% for the regression-based model and an F-score of 89% for our classification model.
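
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the outcome labels, the level thresholds, and all function names are hypothetical, chosen only to show how a correct execution rate could be derived from fault injection runs, binned into a vulnerability level, and how an F-score is computed for one class.

```python
def correct_execution_rate(outcomes):
    """Fraction of fault-injection runs that still produced correct output.

    `outcomes` is a list of strings; the label "correct" is an assumption.
    """
    return sum(1 for o in outcomes if o == "correct") / len(outcomes)


def vulnerability_level(rate, thresholds=(0.5, 0.8)):
    """Map a correct-execution rate to a coarse level.

    The thresholds are hypothetical, not taken from the paper.
    """
    low, high = thresholds
    if rate >= high:
        return "low"
    if rate >= low:
        return "medium"
    return "high"


def f_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    runs = ["correct", "correct", "correct", "sdc"]  # hypothetical outcomes
    rate = correct_execution_rate(runs)
    print(rate, vulnerability_level(rate))

    truth = ["high", "low", "low", "high"]
    preds = ["high", "low", "high", "high"]
    print(f_score(truth, preds, positive="high"))
```

In practice the predicted levels would come from a trained classifier rather than a hand-written list; the sketch only makes the evaluation step concrete.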