A New Natural Language Processing-Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study.

Authors: Paiva B; Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Gonçalves MA; Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., da Rocha LCD; Computer Science Department, Universidade Federal de São João del-Rei, São João del-Rei, Brazil., Marcolino MS; Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Lana FCB; Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Souza-Silva MVR; Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Almeida JM; Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Pereira PD; Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., de Andrade CMV; Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Gomes AGDR; Hospitais da Rede Mater Dei, Belo Horizonte, Brazil., Ferreira MAP; Hospital de Clínicas de Porto Alegre, Porto Alegre, Brazil., Bartolazzi F; Hospital Santo Antônio, Curvelo, Brazil., Sacioto MF; Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Brazil., Boscato AP; Hospital Tacchini, Bento Gonçalves, Brazil., Guimarães-Júnior MH; Hospital Márcio Cunha, Ipatinga, Brazil., Dos Reis PP; Hospital Metropolitano Doutor Célio de Castro, Belo Horizonte, Brazil., Costa FR; Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Jorge AO; Hospital Risoleta Tolentino Neves, Belo Horizonte, Brazil., Coelho LR; Faculdade de Medicina, Universidade Federal dos Vales do Jequitinhonha e Mucuri, Teófilo Otoni, Brazil., Carneiro M; Hospital Santa Cruz, Santa Cruz do Sul, Brazil., Sales TLS; Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Araújo SF; Hospital Semper, Belo Horizonte, Brazil., Silveira DV; Hospital Unimed BH, Belo Horizonte, Brazil., Ruschel KB; Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Santos FCV; Hospital Universitário de Santa Maria, Santa Maria, Brazil., Cenci EPA; Hospital Moinhos de Vento, Porto Alegre, Brazil., Menezes LSM; Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil., Anschau F; Hospital Nossa Senhora da Conceição, Porto Alegre, Brazil., Bicalho MAC; Fundação Hospitalar do Estado de Minas Gerais, Belo Horizonte, Brazil., Manenti ERF; Hospital Mãe de Deus, Porto Alegre, Brazil., Finger RG; Hospital Regional do Oeste, Chapecó, Brazil., Ponce D; Faculdade de Medicina de Botucatu, Universidade Estadual Paulista Júlio de Mesquita Filho, Botucatu, Brazil., de Aguiar FC; Hospital das Clínicas, Universidade Federal de Pernambuco, Recife, Brazil., Marques LM; Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Brazil., de Castro LC; Hospital Bruno Born, Lajeado, Brazil., Vietta GG; Hospital SOS Cárdio, Florianópolis, Brazil., Godoy MF; Hospital Santo Antônio, Curvelo, Brazil., Vilaça MDN; Hospital Metropolitano Odilon Behrens, Belo Horizonte, Brazil., Morais VC; Faculdade Ciências Médicas de Minas Gerais, Belo Horizonte, Brazil.
Language: English
Source: JMIR medical informatics [JMIR Med Inform] 2024 Oct 28; Vol. 12, pp. e54246. Date of Electronic Publication: 2024 Oct 28.
DOI: 10.2196/54246
Abstract: Background: Proper analysis and interpretation of health care data can significantly improve patient outcomes by enhancing services and revealing the impacts of new technologies and treatments. Understanding the substantial impact of temporal shifts in these data is crucial. For example, COVID-19 vaccination initially lowered the mean age of at-risk patients and later changed the characteristics of those who died. This highlights the importance of understanding these shifts for assessing factors that affect patient outcomes.
Objective: This study aims to propose detection, initial characterization, and semantic characterization (DIS), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes for outcomes in large volumes of data.
Methods: The DIS methodology involves 3 steps: detection, initial characterization, and semantic characterization. Detection uses metrics such as Jensen-Shannon divergence to identify significant data drifts. Initial characterization offers a global analysis of changes in data distribution and predictive feature significance over time. Semantic characterization uses natural language processing-inspired techniques to understand the local context of these changes, helping identify factors driving changes in patient outcomes. By integrating the outcomes from these 3 steps, our results can identify specific factors (eg, interventions and modifications in health care practices) that drive changes in patient outcomes. DIS was applied to the Brazilian COVID-19 Registry and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) data sets.
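As an illustration of the detection step only, the following is a minimal sketch of how Jensen-Shannon divergence can quantify drift in one categorical variable between two time windows. It is not the authors' implementation: the function names, the age-band variable, and the sample values are all hypothetical, and the abstract does not state which binning, logarithm base, or significance threshold the study used. Base-2 logarithms are assumed here so that the score lies between 0 (identical distributions) and 1 (no overlap).

```python
import math
from collections import Counter


def to_distribution(values, categories):
    """Turn raw categorical observations into relative frequencies,
    ordered by the given category list (hypothetical helper)."""
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no observations fall in the given categories")
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.
    Terms with p_i == 0 contribute nothing by convention."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence: the mean KL divergence of p and q
    from their midpoint distribution m. Symmetric and bounded by 1
    when base-2 logarithms are used."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: age bands of admitted patients in two periods.
AGE_BANDS = ["<40", "40-59", "60+"]
early_window = ["60+", "60+", "60+", "40-59", "60+", "<40"]
late_window = ["40-59", "<40", "40-59", "60+", "<40", "40-59"]

p_early = to_distribution(early_window, AGE_BANDS)
p_late = to_distribution(late_window, AGE_BANDS)
drift_score = jensen_shannon_divergence(p_early, p_late)
print(f"JSD between windows: {drift_score:.3f}")
```

A score near 0 suggests that the variable's distribution is stable across the two windows, while larger values flag a candidate drift for closer characterization. What counts as a significant value is a design choice that depends on the data set.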
Results: Our approach allowed us to (1) identify drifts effectively, especially using metrics such as the Jensen-Shannon divergence, and (2) uncover reasons for the decline in overall mortality in both the COVID-19 and MIMIC-IV data sets, as well as changes in the cooccurrence between different diseases and this particular outcome. Factors such as vaccination during the COVID-19 pandemic and reduced iatrogenic events and cancer-related deaths in MIMIC-IV were highlighted. The methodology also pinpointed shifts in patient demographics and disease patterns, providing insights into the evolving health care landscape during the study period.
Conclusions: We developed a novel methodology combining machine learning and natural language processing techniques to detect, characterize, and understand temporal shifts in health care data. This understanding can enhance predictive algorithms, improve patient outcomes, and optimize health care resource allocation, ultimately improving the effectiveness of machine learning predictive algorithms applied to health care data. Our methodology can be applied to a variety of scenarios beyond those discussed in this paper.
(©Bruno Paiva, Marcos André Gonçalves, Leonardo Chaves Dutra da Rocha, Milena Soriano Marcolino, Fernanda Cristina Barbosa Lana, Maira Viana Rego Souza-Silva, Jussara M Almeida, Polianna Delfino Pereira, Claudio Moisés Valiense de Andrade, Angélica Gomides dos Reis Gomes, Maria Angélica Pires Ferreira, Frederico Bartolazzi, Manuela Furtado Sacioto, Ana Paula Boscato, Milton Henriques Guimarães-Júnior, Priscilla Pereira dos Reis, Felício Roberto Costa, Alzira de Oliveira Jorge, Laryssa Reis Coelho, Marcelo Carneiro, Thaís Lorenna Souza Sales, Silvia Ferreira Araújo, Daniel Vitório Silveira, Karen Brasil Ruschel, Fernanda Caldeira Veloso Santos, Evelin Paola de Almeida Cenci, Luanna Silva Monteiro Menezes, Fernando Anschau, Maria Aparecida Camargos Bicalho, Euler Roberto Fernandes Manenti, Renan Goulart Finger, Daniela Ponce, Filipe Carrilho de Aguiar, Luiza Margoto Marques, Luís César de Castro, Giovanna Grünewald Vietta, Mariana Frizzo de Godoy, Mariana do Nascimento Vilaça, Vivian Costa Morais. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 28.10.2024.)
Database: MEDLINE