Evaluation of risk factors and survival rates of patients with early-stage breast cancer with machine learning and traditional methods


ÖZGÜR E. G., Ulgen A., Uzun S., BEKİROĞLU G. N.

International Journal of Medical Informatics, cilt.190, 2024 (SCI-Expanded) identifier identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 190
  • Basım Tarihi: 2024
  • Doi Numarası: 10.1016/j.ijmedinf.2024.105548
  • Dergi Adı: International Journal of Medical Informatics
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, BIOSIS, Biotechnology Research Abstracts, CAB Abstracts, CINAHL, Compendex, EMBASE, INSPEC, MEDLINE
  • Anahtar Kelimeler: Breast cancer, Early stage, Machine learning, Risk factor
  • Marmara Üniversitesi Adresli: Evet

Özet

Background: This article is aimed to make predictions in terms of prognostic factors and compare prediction methods by using Cox proportional hazards regression analysis (CPH), some machine learning techniques and Accelerated Failure Time (AFT) model for post-treatment survival probabilities according to clinical presentations and pathological information of early-stage breast cancer patients. Material and methods: The study was carried out in three stages. In the first stage, the CPH method was applied. In the second stage, the AFT model and in the last stage, machine learning methods were applied. The data set consists of 697 breast cancer patients who applied to Marmara University Hospital oncology clinic between 01.01.1994 and 31.12.2009. The models obtained by using various parameters of the patients were compared according to the C index, 5-year survival rate and 10-year survival rate. Results and conclusion: According to the models obtained as a result of the analyses applied, MetLN and age were obtained as a significant risk factor as a result of CPH method and AFT methods, while MetLN, age, tumor size, LV1 and extracapsular involvement were obtained as risk factors in machine learning methods. In addition, when the c-index values of the handheld models are examined, it is obtained as 69.8 for the CPH model, 70.36 for the AFT model, 72.1 for the random survival forest and 72.8 for the gradient boosting machine. In conclusion, the study highlights the potential of comparing conventional statistical methods and machine-learning algorithms to improve the precision of risk factor determination in early-stage breast cancer prognosis. Additionally, efforts should be made to enhance the interpretability of machine-learning models, ensuring that the results obtained can be effectively communicated and utilized by clinical practitioners. This would enable more informed decision-making and personalized care in the treatment and follow-up processes for early-stage breast cancer patients.