Comparing performances and effectiveness of machine learning classifiers in detecting financial accounting fraud for Turkish SMEs

Hamal, Serhan; Şenvar, Özlem

doi:10.2991/ijcis.d.210203.007

Hamal S., Şenvar Ö.

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, vol. 14, no. 1, pp. 769-782, 2021 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 14, Issue: 1
Publication Date: 2021
DOI: 10.2991/ijcis.d.210203.007
Journal: INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS
Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, Directory of Open Access Journals
Pages: 769-782
Keywords: Financial accounting fraud, SMEs, Machine learning classifiers, Sampling methods, SMOTE, Feature selection, MANAGEMENT
Marmara University Affiliation: Yes

Abstract

Turkish small- and medium-sized enterprises (SMEs) are exposed to fraud risks, and creditor banks face major challenges in dealing with financial accounting fraud. This study explores the effectiveness of machine learning classifiers in detecting financial accounting fraud by assessing the financial statements of 341 Turkish SMEs from 2013 to 2017. The data were obtained from one of the leading creditor banks in Turkey. The classes are highly imbalanced, with 1384 nonfraudulent cases and 321 fraudulent cases (from 122 firms), so sampling techniques are used to mitigate the class imbalance problem. The research methodology consists of two stages. The first stage is data preprocessing, in which financial ratios are calculated, feature selection methods are applied to identify the financial ratios with the greatest impact on fraudulent financial statements, and two sampling methods are performed: the Synthetic Minority Oversampling Technique (SMOTE) for oversampling, and undersampling. The second stage is the performance evaluation and comparison of classifiers, in which seven classifiers (support vector machine, Naive Bayes, artificial neural network, K-nearest neighbor, random forest, logistic regression, and bagging) are executed and compared using performance metrics. The classifiers are also compared without any feature selection and/or sampling techniques. The results reveal that the random forest model without feature selection and with oversampling outperforms all other models. (C) 2021 The Authors. Published by Atlantis Press B.V.
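The two rebalancing strategies named in the abstract can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation, and several details are assumptions: `smote` and `random_undersample` are hypothetical helper names, the neighbour count `k=5` follows the original SMOTE proposal (it is also the default in imbalanced-learn), base samples are drawn at random rather than cycling through every minority case, and full 1:1 balancing is assumed because the abstract does not state the target class ratio. SMOTE creates each synthetic fraudulent case by interpolating between a fraudulent case and one of its nearest fraudulent neighbours, while undersampling discards nonfraudulent cases at random. The feature vectors below are random placeholders, not the study's data.

```python
import random
from typing import List, Optional

Point = List[float]  # one case's feature vector (e.g. hypothetical financial ratios)


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """Return n_new synthetic minority samples built by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if rng is None:
        rng = random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def random_undersample(majority: List[Point], n_keep: int,
                       rng: Optional[random.Random] = None) -> List[Point]:
    """Keep n_keep majority-class samples chosen at random without replacement."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, n_keep)


if __name__ == "__main__":
    rng = random.Random(42)
    # Placeholder vectors for two hypothetical ratios; NOT the study's data.
    # Only the class counts (1384 nonfraudulent, 321 fraudulent) come from the abstract.
    nonfraud = [[rng.gauss(1.5, 0.3), rng.gauss(0.4, 0.1)] for _ in range(1384)]
    fraud = [[rng.gauss(1.0, 0.3), rng.gauss(0.7, 0.1)] for _ in range(321)]

    # Oversampling: add synthetic fraud cases until both classes have 1384 cases.
    oversampled_fraud = fraud + smote(fraud, len(nonfraud) - len(fraud), k=5, rng=rng)
    # Undersampling: shrink the nonfraud class to the size of the fraud class.
    undersampled_nonfraud = random_undersample(nonfraud, len(fraud), rng=rng)

    print(len(nonfraud), len(oversampled_fraud))   # 1384 1384
    print(len(undersampled_nonfraud), len(fraud))  # 321 321
```

With the abstract's counts, full balancing means generating 1384 - 321 = 1063 synthetic fraudulent cases for oversampling, or keeping only 321 of the 1384 nonfraudulent cases for undersampling. Oversampling keeps all real cases but adds interpolated ones, whereas undersampling uses only real cases but discards most of the majority class. In either case, resampling is normally applied only to the training split so that synthetic or dropped cases do not distort the evaluation data; imbalanced-learn's `SMOTE` and `RandomUnderSampler` classes are tested alternatives to hand-written helpers like these.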