The impact of imputation procedures with machine learning methods on the performance of classifiers: An application to coronary artery disease data including missing values

Jale Bektas; Turgay Ibrikci; Ismail Turkay Ozcan

doi:10.4066/biomedicalresearch.29-18-199

Abstract

Prediction and learning in the presence of missing data are pervasive problems in machine learning-based data analysis. This study focuses on the problem of collaborative classification with missing data on Coronary Artery Disease (CAD) and suggests alternative imputation methods for cases where laboratory tests or other specific parameters are unavailable. The study develops three novel data imputation methods based on machine learning algorithms (K-means, Multilayer Perceptron (MLP), and Self-Organizing Maps (SOMs)) and compares their performance with the well-known mean imputation method. Benchmark classification methods (Logistic Model Trees (LMT), MLP, Random Forest (RF), and Support Vector Machine (SVM)) are used to conduct experiments on the CAD dataset after imputation. The performance of the classifiers is evaluated in terms of accuracy, specificity, sensitivity, F-measure, precision, and normalized root mean square error. Based on statistical analysis, the SOM imputation method achieves the best values for accuracy (88.23%), F-measure (0.879), and precision (0.881). Moreover, MLP imputation is generally more stable than the other imputation methods when the mean scores across classifiers are considered. The data imputation experiments conducted in this study suggest that machine learning imputation methods increase the prediction performance of the classifiers and improve diagnostic success.
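The abstract does not describe how the imputation methods are implemented. To illustrate the general idea of comparing mean imputation with a clustering-based alternative, the following is a minimal sketch, not the authors' method. It assumes missing values are encoded as NaN, that every column has at least one observed value, and that a K-means-style imputer fills each missing cell with the corresponding coordinate of the row's nearest centroid. The function names `mean_impute` and `kmeans_impute`, the parameters `k`, `n_iter`, and `seed`, and the toy data are all hypothetical.

```python
import numpy as np


def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def kmeans_impute(X, k=2, n_iter=20, seed=0):
    """Replace each NaN with the matching coordinate of the nearest centroid.

    Illustrative assumption: start from mean imputation, then alternate
    plain k-means updates with re-filling the originally missing cells.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = mean_impute(X)
    rng = np.random.default_rng(seed)
    # Pick k distinct rows as initial centroids (fancy indexing copies them).
    centroids = filled[rng.choice(len(filled), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(filled[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for j in range(k):
            members = filled[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        # Overwrite only the cells that were missing in the input.
        filled[missing] = centroids[labels][missing]
    return filled


# Toy data (made up for illustration): two features, two missing cells.
X_toy = np.array([
    [1.0, 2.0],
    [1.2, np.nan],
    [0.9, 2.1],
    [8.0, 9.0],
    [8.2, np.nan],
    [7.9, 9.1],
])
X_mean = mean_impute(X_toy)
X_km = kmeans_impute(X_toy, k=2)
```

With `k=1` the single centroid is the column-wise mean, so this sketch reduces to mean imputation; larger `k` lets the filled value depend on which group of records a row resembles. The result for `k > 1` depends on the random initialization, so a real comparison would need repeated runs and a downstream classifier evaluation like the one the abstract describes.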
