Comparing the Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4 in Biostatistics Exam: Pros and Cons as an Education Assistant.

Asker, Ömer; Özgür, Emrah; Eriç, Alper; Bekiroğlu, Gülnaz

doi:10.33461/uybisbbd.1329650

Asker Ö. F., Özgür E. G., Eriç A., Bekiroğlu G. N.

International Journal of Management Information Systems and Computer Science, vol.7, no.2, pp.85-94, 2023 (Peer-Reviewed Journal)

Publication Type: Article / Full Article
Volume: 7 Issue: 2
Publication Date: 2023
DOI: 10.33461/uybisbbd.1329650
Journal Name: International Journal of Management Information Systems and Computer Science
Pages: pp.85-94
Affiliated with Marmara University: Yes

Abstract

Studies have shown that medical students' knowledge of biostatistics is lower than expected, which calls for new methods in biostatistics education. The aim of this study is to evaluate the feasibility of ChatGPT as an education assistant in biostatistics. ChatGPT is a natural language processing model developed by OpenAI. It provides human-like responses to users' questions and is used in various fields as a source of information. ChatGPT operates with the latest GPT-4 model, while the previous version, GPT-3.5, is still in use. In this study, the biostatistics performance of 245 Marmara University School of Medicine students was compared with that of ChatGPT-3.5 and ChatGPT-4 using an exam covering basic biostatistics topics. According to the findings, ChatGPT-3.5 achieved an 80% success rate on the exam, while ChatGPT-4 achieved a 100% success rate. In contrast, the students achieved a 67.9% success rate. Furthermore, ChatGPT-3.5 recorded only a 33% success rate on questions requiring mathematical calculations, while ChatGPT-4 achieved a 100% success rate on these questions. In conclusion, ChatGPT is a potential education assistant in biostatistics. Its success has increased significantly in the current version compared with the previous one. Further studies will be needed as new versions are relea