Prediction of COVID-19 disease severity using synthetic data oversampling and machine learning methods on data at first hospitalization (Turkish title: İlk yatıştaki veriler üzerinde yapay veri çoğaltma ve makine öğrenmesi yöntemleri kullanılarak COVID-19 hastalık şiddetinin tahmini)

Köksal, Kübra; Doğan, Buket; Altıkardeş, Zehra

doi:10.17341/gazimmfd.1348341

Journal of the Faculty of Engineering and Architecture of Gazi University, vol. 40, no. 1, pp. 413-427, 2024 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 40 Issue: 1
Publication Date: 2024
DOI: 10.17341/gazimmfd.1348341
Journal Name: Journal of the Faculty of Engineering and Architecture of Gazi University
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Art Source, Compendex, TR DİZİN (ULAKBİM)
Pages: pp. 413-427
Keywords: COVID-19, laboratory data, machine learning, prognosis
Affiliated with Marmara University: Yes

Abstract

COVID-19, which originated in Wuhan, China, in December 2019 and was declared a pandemic by the World Health Organization (WHO) on March 11, 2020, spread rapidly worldwide, significantly affecting human life and the health sector. This study aims to develop a WHO-oriented disease severity prediction model using laboratory and demographic data collected from COVID-19 patients upon admission to Marmara University Hospital. The relationship between laboratory results and the need for oxygen and intensive care was analyzed using the K-nearest neighbor, Bagging, Random Forest, and Decision Tree machine learning methods. The dataset's imbalanced class distribution was balanced using the SMOTE method, and the impact of this synthetic oversampling on classification performance was evaluated. On the dataset without SMOTE, the patient's oxygen requirement during the first hospitalization was predicted with 16 features at 91.67% accuracy, the oxygen requirement at hospitalization with 18 features at 91.96% accuracy, and the intensive care need at hospitalization with 12 features at 92.17% accuracy. After SMOTE oversampling, increases of 6%, 24%, and 21%, respectively, were observed in the F1-score values. The study contributes to the field by applying machine learning methods to patient data from the laboratory tests required for COVID-19 diagnosis, monitoring, and clinical management.
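The abstract reports accuracy before SMOTE and F1-score gains after it. As a rough illustration of why both matter, the sketch below hand-rolls the core SMOTE idea (a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours) together with an F1-score helper. It is not the authors' pipeline: the function names `smote` and `f1_score`, the parameter `k`, and the toy feature values are all assumptions made for illustration, and the study presumably relied on a library implementation rather than code like this.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # the new point lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy two-feature data (hypothetical values, not taken from the study)
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Oversample the minority class until both classes have the same size
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

On imbalanced data, a classifier can reach high accuracy while missing many minority-class cases, which is why the F1-score is the more informative measure of what oversampling changes. In practice the synthetic samples should be generated from the training split only, so that no interpolated point leaks information from the test set.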