Prediction of the gender inequality index based on data-driven interpretable ensemble learning methods


Özdemir M. H., Aylak B. L., Çakıroğlu C., Bağcı M.

SOCIO-ECONOMIC PLANNING SCIENCES, vol.103, pp.102366, 2026 (SCI-Expanded, SSCI, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 103
  • Publication Date: 2026
  • DOI: 10.1016/j.seps.2025.102366
  • Journal Name: SOCIO-ECONOMIC PLANNING SCIENCES
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, EconLit, Educational Research Abstracts (ERA), Geobase, Index Islamicus, Political Science Complete, Public Affairs Index, Urban Studies Abstracts
  • Pages: pp.102366
  • Keywords: Gender inequality index, Regression analysis, Machine learning, Predictive modeling, SHAP
  • Affiliated with Marmara University: Yes

Abstract

Gender inequality is acknowledged as a major hindrance to human development, evident in multiple social, political, economic, and cultural aspects. Identifying and quantifying the factors that contribute to gender inequality is therefore crucial for societal progress. The gender inequality index (GII) was introduced in the 2010 Human Development Report to quantify and compare gender inequalities among countries. The GII is calculated from multiple indicators through complex analytical calculations. This study uses these indicators as input features to predict the GII with XGBoost, CatBoost, Extra Trees, LightGBM, Ridge, and Lasso regression models. The regressors are trained to predict the GII as a function of maternal mortality ratio, adolescent birth rate, share of seats in parliament, female population with at least some secondary education, male population with at least some secondary education, female labour force participation rate, and male labour force participation rate. The XGBoost, CatBoost, Extra Trees, and LightGBM predictors achieve R² scores greater than 0.98, while the Ridge and Lasso regressors have R² scores below 0.90. The CatBoost model obtains the highest average accuracy, while the XGBoost model has the greatest computational speed. Furthermore, the Shapley additive explanations (SHAP) methodology is used to detect the impact of the different input features on the model predictions, and this information allows for more precise calculation of the GII. Thus, the proposed machine learning procedure offers both simplicity and flexibility for GII prediction and enables more effective use of the GII.
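
The abstract reports R² scores and Shapley-based feature attributions. As a minimal sketch of those two quantities, the following standard-library Python computes the coefficient of determination and exact Shapley values by enumerating feature subsets. This is not the authors' pipeline and it does not use the SHAP library: `toy_model`, its coefficients, the three-feature input, and the zero baseline are hypothetical placeholders chosen only for illustration, and the toy model is not the GII formula.

```python
from itertools import combinations
from math import factorial


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(f):
    # Hypothetical stand-in for a trained regressor (NOT the GII formula):
    # two additive terms, one negative term, and one interaction.
    return 0.5 * f[0] + 0.3 * f[1] - 0.2 * f[2] + 0.1 * f[0] * f[1]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]         # hypothetical feature vector
    baseline = [0.0, 0.0, 0.0]  # hypothetical reference point
    phi = shapley_values(toy_model, x, baseline)
    print("Shapley values:", phi)
    print("Sum of attributions:", sum(phi))
    print("Prediction minus baseline:", toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that Shapley-based explanations rely on. Brute-force enumeration is feasible for the seven indicators named in the abstract, but libraries such as SHAP use faster, model-specific algorithms for tree ensembles.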