Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?
Author: | Karl Pichotta, Yoav Goldberg, Sarthak Jain, Eric Lehman, Byron C. Wallace |
---|---|
Language: | English |
Publication year: | 2021 |
Subject: |
FOS: Computer and information sciences
0303 health sciences; Computer Science - Machine Learning; Computer Science - Computation and Language; business.industry; Computer science; Computer Science - Artificial Intelligence; 010501 environmental sciences; Health records; Machine learning; computer.software_genre; 01 natural sciences; Machine Learning (cs.LG); 03 medical and health sciences; Information sensitivity; Data access; Artificial Intelligence (cs.AI); Personal health; Artificial intelligence; business; computer; Computation and Language (cs.CL); 030304 developmental biology; 0105 earth and related environmental sciences |
Source: | NAACL-HLT |
Description: | Large Transformers pretrained over clinical notes from Electronic Health Records (EHR) have afforded substantial gains in performance on predictive clinical tasks. The cost of training such models (and the necessity of data access to do so), coupled with their utility, motivates parameter sharing, i.e., the release of pretrained models such as ClinicalBERT. While most efforts have used deidentified EHR, many researchers have access to large sets of sensitive, non-deidentified EHR with which they might train a BERT model (or similar). Would it be safe to release the weights of such a model if they did? In this work, we design a battery of approaches intended to recover Personal Health Information (PHI) from a trained BERT. Specifically, we attempt to recover patient names and the conditions with which they are associated. We find that simple probing methods are not able to meaningfully extract sensitive information from BERT trained over the MIMIC-III corpus of EHR. However, more sophisticated "attacks" may succeed in doing so. To facilitate such research, we make our experimental setup and baseline probing models available at https://github.com/elehman16/exposing_patient_data_release. Note: NAACL Camera Ready Submission. |
Database: | OpenAIRE |
External link: |