Prediction of COVID-19 disease severity using synthetic data oversampling and machine learning methods on data at first hospitalization (Turkish title: İlk yatıştaki veriler üzerinde yapay veri çoğaltma ve makine öğrenmesi yöntemleri kullanılarak COVID-19 hastalık şiddetinin tahmini)


Creative Commons License

Köksal K., DOĞAN B., ALTIKARDEŞ Z. A.

Journal of the Faculty of Engineering and Architecture of Gazi University, vol. 40, no. 1, pp. 413-427, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 40, Issue: 1
  • Publication Date: 2024
  • DOI: 10.17341/gazimmfd.1348341
  • Journal Name: Journal of the Faculty of Engineering and Architecture of Gazi University
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Art Source, Compendex, TR DİZİN (ULAKBİM)
  • Page Numbers: pp. 413-427
  • Keywords: COVID-19, laboratory data, machine learning, prognosis
  • Affiliated with Marmara University: Yes

Abstract

COVID-19, which originated in Wuhan, China, in December 2019 and was declared a pandemic by the World Health Organization (WHO) on March 11, 2020, spread rapidly worldwide and significantly affected human life and the health sector. This study aims to develop a WHO-oriented disease severity prediction model using laboratory and demographic data collected from COVID-19 patients upon admission to Marmara University Hospital. The relationship between laboratory results and the need for oxygen and intensive care was analyzed using K-nearest neighbor, Bagging, Random Forest, and Decision Tree machine learning methods. The imbalanced class distribution of the dataset was balanced using the SMOTE method, and the effect of oversampling on classification performance was evaluated. On the dataset without SMOTE, the patient's oxygen requirement during the first hospitalization was predicted with 16 features at 91.67% accuracy, the oxygen requirement at hospitalization with 18 features at 91.96% accuracy, and the intensive care need at hospitalization with 12 features at 92.17% accuracy. After SMOTE oversampling, F1-score values increased by 6%, 24%, and 21%, respectively. The study contributes to the field by applying machine learning methods to patient data from the laboratory tests required for COVID-19 diagnosis, monitoring, and clinical management.
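The abstract does not describe how SMOTE was implemented, so the following is only a minimal sketch of the general idea, written in plain Python under stated assumptions: a binary problem, purely numeric features, and Euclidean distance. The function names `smote` and `balance`, the parameter `k`, and the toy values are hypothetical and are not taken from the study. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy, made-up data: six majority samples (label 0), two minority (label 1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
     [0.5, 0.5], [0.2, 0.8], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_balanced, y_balanced = balance(X, y, minority_label=1)
```

In common practice, oversampling of this kind is applied only to the training split, so that accuracy and F1-score are measured on real, unmodified samples; the abstract does not state which protocol the authors followed.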