{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T04:10:12Z","timestamp":1769919012842,"version":"3.49.0"},"reference-count":53,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2022,4,10]],"date-time":"2022-04-10T00:00:00Z","timestamp":1649548800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Pristine and trustworthy data are required for efficient computer modelling for medical decision-making, yet data in medical care is frequently missing. As a result, missing values may occur not just in training data but also in testing data that might contain a single undiagnosed episode or a participant. This study evaluates different imputation and regression procedures identified based on regressor performance and computational expense to fix the issues of missing values in both training and testing datasets. In the context of healthcare, several procedures are introduced for dealing with missing values. However, there is still a discussion concerning which imputation strategies are better in specific cases. This research proposes an ensemble imputation model that is educated to use a combination of simple mean imputation, k-nearest neighbour imputation, and iterative imputation methods, and then leverages them in a manner where the ideal imputation strategy is opted among them based on attribute correlations on missing value features. We introduce a unique Ensemble Strategy for Missing Value to analyse healthcare data with considerable missing values to identify unbiased and accurate prediction statistical modelling. The performance metrics have been generated using the eXtreme gradient boosting regressor, random forest regressor, and support vector regressor. The current study uses real-world healthcare data to conduct experiments and simulations of data with varying feature-wise missing frequencies indicating that the proposed technique surpasses standard missing value imputation approaches as well as the approach of dropping records holding missing values in terms of accuracy.<\/jats:p>","DOI":"10.3390\/e24040533","type":"journal-article","created":{"date-parts":[[2022,4,10]],"date-time":"2022-04-10T23:06:01Z","timestamp":1649631961000},"page":"533","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["A Pragmatic Ensemble Strategy for Missing Values Imputation in Health Records"],"prefix":"10.3390","volume":"24","author":[{"given":"Shivani","family":"Batra","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, KIET Group of Institutions, Delhi-NCR, Ghaziabad 201206, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6069-4540","authenticated-orcid":false,"given":"Rohan","family":"Khurana","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, KIET Group of Institutions, Delhi-NCR, Ghaziabad 201206, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2409-7172","authenticated-orcid":false,"given":"Mohammad Zubair","family":"Khan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information, Taibah University, Medina 42353, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2133-0757","authenticated-orcid":false,"given":"Wadii","family":"Boulila","sequence":"additional","affiliation":[{"name":"Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3787-7423","authenticated-orcid":false,"given":"Anis","family":"Koubaa","sequence":"additional","affiliation":[{"name":"Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia"}]},{"given":"Prakash","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Graphic Era (Deemed to be University), Dehradun 248002, India"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,10]]},"reference":[{"key":"ref_1","first-page":"9","article-title":"Missing data imputation: Focusing on single imputation","volume":"4","author":"Zhang","year":"2016","journal-title":"Ann. Transl. Med."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"157","DOI":"10.2147\/CLEP.S129785","article-title":"Missing data and multiple imputation in clinical epidemiological research","volume":"9","author":"Pedersen","year":"2017","journal-title":"Clin. Epidemiol."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Dong, X., Chen, C., Geng, Q., Cao, Z., Chen, X., Lin, J., Jin, Y., Zhang, Z., Shi, Y., and Zhang, X.D. (2019). An Improved Method of Handling Missing Values in the Analysis of Sample Entropy for Continuous Monitoring of Physiological Signals. Entropy, 21.","DOI":"10.3390\/e21030274"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR Guiding Principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci. Data"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wong-Lin, K., McClean, P.L., McCombe, N., Kaur, D., Sanchez-Bornot, J.M., Gillespie, P., Todd, S., Finn, D.P., Joshi, A., and Kane, J. (2020). Shaping a data-driven era in dementia care pathway through computational neurology approaches. BMC Med., 18.","DOI":"10.1186\/s12916-020-01841-1"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Rani, G., and Tiwari, P. (2021). Pre-Processing Highly Sparse and Frequently Evolving Standardized Electronic Health Records for Mining. Handbook of Research on Disease Prediction Through Data Analytics and Machine Learning, IGI Global.","DOI":"10.4018\/978-1-7998-2742-9"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning, Springer.","DOI":"10.1007\/978-1-4614-7138-7"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/j.compbiomed.2016.06.004","article-title":"Handling missing data in large healthcare dataset: A case study of unknown trauma outcomes","volume":"75","author":"Mirkes","year":"2016","journal-title":"Comput. Biol. Med."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Little, R.J., and Rubin, D.B. (2019). Statistical Analysis with Missing Data, John Wiley & Sons.","DOI":"10.1002\/9781119482260"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Sachdeva, S., Batra, D., and Batra, S. (2020, January 16\u201319). Storage Efficient Implementation of Standardized Electronic Health Records Data. Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Korea (South).","DOI":"10.1109\/BIBM49941.2020.9313343"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1186\/2193-1801-2-222","article-title":"Principled missing data methods for researchers","volume":"2","author":"Dong","year":"2013","journal-title":"SpringerPlus"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"3692","DOI":"10.1016\/j.patcog.2008.05.019","article-title":"Impact of imputation of missing values on classification error for discrete data","volume":"41","author":"Farhangfar","year":"2008","journal-title":"Pattern Recogn."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1177\/1094428103255532","article-title":"Multiple imputation for missing data: Making the most of what you know","volume":"6","author":"Fichman","year":"2003","journal-title":"Organ. Res. Methods"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Aleryani, A., Wang, W., and Iglesia, B.D.L. (2018, January 20\u201322). Dealing with missing data and uncertainty in the context of data mining. Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Oviedo, Spain.","DOI":"10.1007\/978-3-319-92639-1_24"},{"key":"ref_15","unstructured":"Frank, E., and Witten, I.H. (1998). Generating Accurate Rule Sets without Global Optimization, University of Waikato."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1080\/01621459.1994.10476768","article-title":"Missing data, imputation, and the bootstrap","volume":"89","author":"Efron","year":"1994","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_17","first-page":"1","article-title":"mice: Multivariate imputation by chained equations in R","volume":"45","year":"2011","journal-title":"J. Stat. Softw."},{"key":"ref_18","first-page":"1","article-title":"DataWig: Missing Value Imputation for Tables","volume":"20","author":"Biessmann","year":"2019","journal-title":"J. Mach. Learn. Res."},{"key":"ref_19","unstructured":"Beaulieu-Jones, B.K., and Moore, J.H. (2017, January 3\u20137). Pooled Resource Open-Access Als Clinical Trials Consortium. Missing data imputation in the electronic health record using deeply learned autoencoders. Proceedings of the Pacific Symposium on Biocomputing, Kohala Coast, HI, USA."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1093\/sysbio\/syt100","article-title":"Missing data estimation in morphometrics: How much is too much?","volume":"63","author":"Clavel","year":"2014","journal-title":"Syst. Biol."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tada, M., Suzuki, N., and Okada, Y. (2022). Missing Value Imputation Method for Multiclass Matrix Data Based on Closed Itemset. Entropy, 24.","DOI":"10.3390\/e24020286"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3297","DOI":"10.1200\/JCO.2011.38.7589","article-title":"Missing data in clinical studies: Issues and methods","volume":"30","author":"Ibrahim","year":"2012","journal-title":"J. Clin. Oncol."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, J., Wang, M., Steinbach, M.S., Kumar, V., and Simon, G.J. (2018, January 17\u201318). Don\u2019t do imputation: Dealing with informative missing values in EHR data analysis. Proceedings of the 2018 IEEE International Conference on Big Knowledge (ICBK), Singapore.","DOI":"10.1109\/ICBK.2018.00062"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5901","DOI":"10.3390\/e16115901","article-title":"Comparative Study of Entropy Sensitivity to Missing Biosignal Data","volume":"16","author":"Cirugedaroldan","year":"2014","journal-title":"Entropy"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1035","DOI":"10.13063\/2327-9214.1035","article-title":"Strategies for handling missing data in electronic health record derived data","volume":"1","author":"Wells","year":"2013","journal-title":"EGEMS"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1076\/edre.7.4.353.8937","article-title":"A review of methods for missing data","volume":"7","author":"Pigott","year":"2001","journal-title":"Educ. Res. Eval."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1087","DOI":"10.1016\/j.jclinepi.2006.01.014","article-title":"A gentle introduction to imputation of missing values","volume":"59","author":"Donders","year":"2006","journal-title":"J. Clin. Epidemiol."},{"key":"ref_28","first-page":"e1448","article-title":"Missing data approaches in eHealth research: Simulation study and a tutorial for nonmathematically inclined researchers","volume":"12","author":"Lankers","year":"2010","journal-title":"J. Med. Internet Res."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1016\/j.jbi.2017.03.009","article-title":"Strategies for handling missing clinical data for automated surgical site infection detection from the electronic health record","volume":"68","author":"Hu","year":"2017","journal-title":"J. Biomed. Inform."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1109\/TKDE.2018.2883103","article-title":"Enriching data imputation under similarity rule constraints","volume":"32","author":"Song","year":"2018","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Nikfalazar, S., Yeh, C.H., Bedingfield, S., and Khorshidi, H.A. (2017, January 9\u201312). A new iterative fuzzy clustering algorithm for multiple imputation of missing data. Proceedings of the 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy.","DOI":"10.1109\/FUZZ-IEEE.2017.8015560"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Song, S., and Sun, Y. (2020, January 6\u201310). Imputing various incomplete attributes via distance likelihood maximization. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Online.","DOI":"10.1145\/3394486.3403096"},{"key":"ref_33","unstructured":"Chu, X., Ilyas, I.F., and Papotti, P. (2013, January 8\u201312). Holistic data cleaning: Putting violations into context. Proceedings of the 2013 IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, Australia."},{"key":"ref_34","unstructured":"Breve, B., Caruccio, L., Deufemia, V., and Polese, G. (2022, April 02). RENUVER: A Missing Value Imputation Algorithm based on Relaxed Functional Dependencies. Open Proceedings. Available online: https:\/\/openproceedings.org\/2022\/conf\/edbt\/paper-19.pdf."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1016\/j.compbiomed.2014.08.004","article-title":"Mining approximate temporal functional dependencies with pure temporal grouping in clinical databases","volume":"62","author":"Combi","year":"2015","journal-title":"Comput. Biol. Med."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1002\/mpr.329","article-title":"Multiple imputation by chained equations: What is it and how does it work?","volume":"20","author":"Azur","year":"2011","journal-title":"Int. J. Methods Psychiatr. Res."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Turska, E., Jurga, S., and Piskorski, J. (2021). Mood Disorder Detection in Adolescents by Classification Trees, Random Forests and XGBoost in Presence of Missing Data. Entropy, 23.","DOI":"10.3390\/e23091210"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhou, Z.H. (2012). Ensemble Methods: Foundations and Algorithms, CRC Press.","DOI":"10.1201\/b12207"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Hastie, T., Tibshirani, R., Friedman, J.H., and Friedman, J.H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.","DOI":"10.1007\/978-0-387-84858-7"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Troussas, C., Krouska, A., Sgouropoulou, C., and Voyiatzis, I. (2020). Ensemble Learning Using Fuzzy Weights to Improve Learning Style Identification for Adapted Instructional Routines. Entropy, 22.","DOI":"10.3390\/e22070735"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhao, D., Wang, X., Mu, Y., and Wang, L. (2021). Experimental Study and Comparison of Imbalance Ensemble Classifiers with Dynamic Selection Strategy. Entropy, 23.","DOI":"10.3390\/e23070822"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Rahimi, N., Eassa, F., and Elrefaei, L. (2021). One- and Two-Phase Software Requirement Classification Using Ensemble Deep Learning. Entropy, 23.","DOI":"10.3390\/e23101264"},{"key":"ref_44","first-page":"e8960","article-title":"Characterizing and managing missing structured data in electronic health records: Data analysis","volume":"6","author":"Lavage","year":"2018","journal-title":"JMIR Med. Inform."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.cose.2015.09.005","article-title":"Intelligent financial fraud detection: A comprehensive review","volume":"57","author":"West","year":"2016","journal-title":"Comput. Secur."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"107360","DOI":"10.1016\/j.dib.2021.107360","article-title":"Dataset of COVID-19 outbreak and potential predictive features in the USA","volume":"38","author":"Haratian","year":"2021","journal-title":"Data Brief"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"13149","DOI":"10.1109\/ACCESS.2019.2893448","article-title":"XGBoost-based algorithm interpretation and application on post-fault transient stability status prediction of power system","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_49","first-page":"779","article-title":"Support vector regression machines","volume":"28","author":"Drucker","year":"1996","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1002\/hyp.9584","article-title":"Improving the forecasts of extreme streamflow by support vector regression with the data extracted by self-organizing map","volume":"28","author":"Wu","year":"2014","journal-title":"Hydrol. Process."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.jhydrol.2008.05.028","article-title":"River stage prediction based on a distributed support vector regression","volume":"358","author":"Wu","year":"2008","journal-title":"J. Hydrol."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"704","DOI":"10.1016\/j.jhydrol.2006.01.021","article-title":"Support Vector Regression for Real-Time Flood Stage Forecasting","volume":"328","author":"Yu","year":"2006","journal-title":"J. Hydrol."},{"key":"ref_53","unstructured":"Viswanathan, M., and Kotagiri, R. (2004, January 4\u20137). Comparing the performance of support vector machines to regression with structural risk minimisation. Proceedings of the International Conference on Intelligent Sensing and Information Processing, Chennai, India."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/4\/533\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:51:25Z","timestamp":1760136685000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/4\/533"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,10]]},"references-count":53,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["e24040533"],"URL":"https:\/\/doi.org\/10.3390\/e24040533","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,10]]}}}