A systematic literature review on Machine Learning Model evaluation on healthcare applications

Cezar Miranda Paula de Souza; Cephas Alves da Silveira Barreto; Lhayana Vieira de Macedo; Bruna Alice Oliveira de Brito; Victor Vieira Targino; Emanuel Costa Betcel; Fernando Gomes de Almeida; Arthur Andrade Galvíncio Rodrigues; Ramon Santos Malaquias; Itamir de Morais Barroca Filho

doi:10.33448/rsd-v12i6.42042

Authors

Cezar Miranda Paula de Souza Federal University of Rio Grande do Norte https://orcid.org/0009-0005-7189-8115
Cephas Alves da Silveira Barreto Federal University of Rio Grande do Norte https://orcid.org/0000-0002-4756-8571
Lhayana Vieira de Macedo Federal University of Rio Grande do Norte https://orcid.org/0009-0009-0509-0555
Bruna Alice Oliveira de Brito Federal University of Rio Grande do Norte https://orcid.org/0009-0001-8116-495X
Victor Vieira Targino Federal University of Rio Grande do Norte https://orcid.org/0000-0002-9036-6537
Emanuel Costa Betcel Federal University of Rio Grande do Norte https://orcid.org/0009-0009-6814-4311
Fernando Gomes de Almeida Federal University of Rio Grande do Norte https://orcid.org/0009-0006-2185-6969
Arthur Andrade Galvíncio Rodrigues Federal University of Rio Grande do Norte https://orcid.org/0009-0002-7107-742X
Ramon Santos Malaquias Federal University of Rio Grande do Norte https://orcid.org/0000-0002-8350-2836
Itamir de Morais Barroca Filho Federal University of Rio Grande do Norte https://orcid.org/0000-0003-1694-8237

Keywords:

ML model validation; ML for healthcare; ML model monitoring.

Abstract

Machine Learning (ML) models have been applied to solve problems in a wide range of fields, which necessarily requires adequate model evaluation to guarantee their performance. Once deployed, ML models are subject to performance problems, such as those related to changes in the data (drift). This kind of problem has motivated efforts in model analysis and maintenance, as well as in continual learning, which seeks the ability to learn continuously from an ongoing stream of data. It is therefore important to understand and develop methodologies that can be used to evaluate ML models, enabling their use in real-world settings. Among the current application areas of ML, one that stands out is Machine Learning for Health, especially in conjunction with Decision Support Software for Medical Applications. This area poses specific challenges for model evaluation and monitoring, particularly because an incorrect prediction or classification can lead to life-threatening situations. This article presents a systematic literature review that aims to identify state-of-the-art techniques for evaluating and maintaining ML models for health in effective real-world use.
