XAI-Powered Comparative Diabetes Prediction with XGBoost, SVM and Artificial Neural Networks
Furkan Sefa Demirci1*, Remzi Yıldırım2
1Computer Engineering, Tokat Gaziosmanpasa University, Tokat, Turkiye
2Computer Engineering, Tokat Gaziosmanpasa University, Tokat, Turkiye
* Corresponding author: furkan.demirci@gop.edu.tr
Presented at the International Symposium on AI-Driven Engineering Systems (ISADES2025), Tokat, Turkiye, Jun 19, 2025
SETSCI Conference Proceedings, 2025, 22, Page(s): 111-114, https://doi.org/10.36287/setsci.22.30.001
Published Date: 10 July 2025
In this study, diabetes prediction was performed using XGBoost, Support Vector Machines (SVM), and a multilayer Artificial Neural Network (ANN). Attributes for which a value of 0 is biologically implausible (e.g., glucose, body mass index (BMI)) were first marked as NaN. After the missing values were filled with the median, the data were standardized with StandardScaler. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied, and feature selection was performed using the VarianceThreshold method. Hyperparameters for all models were optimized with RandomizedSearchCV. Model performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), accuracy, and F1 score. The best results were achieved by the ANN model (ROC-AUC: 0.829, accuracy: 0.76, F1 score: 0.678), followed by XGBoost (ROC-AUC: 0.820, accuracy: 0.734, F1 score: 0.667) and SVM (ROC-AUC: 0.809, accuracy: 0.734, F1 score: 0.661). As part of the Explainable Artificial Intelligence (XAI) framework, global SHAP (SHapley Additive exPlanations) analysis was applied to the XGBoost model, while local explanations were generated using LIME (Local Interpretable Model-Agnostic Explanations) for the SVM and ANN models. SHAP analysis identified glucose, BMI, and age as the most influential features. LIME provided case-specific interpretations by highlighting the contribution of individual features for each patient. These approaches improve transparency in clinical decision-making processes.
Keywords - XAI, Diabetes, XGBoost, SVM, ANN, Artificial Intelligence, SHAP, LIME
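The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the study used StandardScaler and library metrics, whereas the functions below (`impute_zeros_with_median`, `standardize`, `accuracy`, `f1_score`, `roc_auc`) are hypothetical helpers written only to show what those steps compute, and the sample values are invented for illustration. SMOTE, VarianceThreshold, and RandomizedSearchCV are omitted.

```python
from statistics import fmean, median, pstdev


def impute_zeros_with_median(column):
    """Treat 0 as a missing value and replace it with the median of the observed values."""
    observed = [v for v in column if v != 0]
    fill = median(observed)
    return [fill if v == 0 else v for v in column]


def standardize(column):
    """Scale to zero mean and unit variance using the population standard deviation.

    Assumes the column is not constant (otherwise the deviation would be 0).
    """
    mu, sigma = fmean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented glucose readings; the 0 stands for a biologically implausible entry.
glucose = [148, 0, 183, 89, 137, 116]
glucose_scaled = standardize(impute_zeros_with_median(glucose))

# Invented labels and model scores, thresholded at 0.5 to obtain class predictions.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

In a real experiment the median, mean, and standard deviation would normally be estimated on the training split only and then applied to the test split, so that no test information leaks into preprocessing.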
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
