A Hybrid Explainable Machine Learning Framework for Early Heart Disease Prediction Using Clinical Data

Shubham Gupta

Abstract

Heart disease is a major form of cardiovascular disease and one of the leading causes of death worldwide, which makes accurate and timely diagnostic support a major priority. This paper proposes a hybrid explainable machine learning framework for early heart disease prediction from clinical and lifestyle-related features. The proposed method combines several machine learning classifiers through an ensemble learning approach to improve predictive robustness, and uses SHapley Additive exPlanations (SHAP) to improve model interpretability. Experiments were conducted on the publicly available UCI Cleveland heart disease dataset, obtained via Kaggle (n = 303 patient records, 14 clinical features). The proposed hybrid model outperforms the baseline models (Logistic Regression, Support Vector Machine, Random Forest, and XGBoost), achieving an accuracy of 94.87%, precision of 95.21%, recall of 94.02%, F1-score of 94.61%, and ROC-AUC of 0.97. In addition, the SHAP analysis identified chest pain type, maximum heart rate achieved, ST depression, number of major vessels, and serum cholesterol as the most influential predictors, consistent with clinical knowledge. The findings indicate that the proposed framework balances predictive accuracy and interpretability, making it a useful decision-support tool for the early detection of heart disease in clinical settings.
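For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall, and F1-score are derived from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts used in the example are hypothetical and chosen only for illustration; they are not taken from the experiments in this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Returns values in [0, 1].
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of all patients flagged as positive, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): of all truly positive patients, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1-score: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (hypothetical, not results from the paper).
metrics = classification_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a screening context, recall is often the metric to watch most closely, since a false negative corresponds to a patient with heart disease who is not flagged.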