Audio De-identification: A New Entity Recognition Task
Authors: | Itay Laish, Avinatan Hassidim, Idan Szpektor, Izhak Shafran, Genady Beryozkin, Tzvika Hartman, Yossi Matias, Gang Li, Ido Cohn |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
0303 health sciences; Computer Science - Computation and Language; business.industry; Character (computing); Computer science; De-identification; Context (language use); computer.software_genre; Pipeline (software); Task (project management); 03 medical and health sciences; 0302 clinical medicine; Named-entity recognition; Metric (mathematics); 030212 general & internal medicine; Artificial intelligence; business; computer; Computation and Language (cs.CL); Natural language processing; 030304 developmental biology |
Source: | NAACL-HLT (2) |
Description: | Named Entity Recognition (NER) has mostly been studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets, and detail our pipeline's results on it. Accepted to the NAACL 2019 Industry Track. |
Database: | OpenAIRE |
External link: |
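
The description above names three pipeline stages: ASR, NER on the transcript, and text-to-audio alignment. The following is a minimal Python sketch of the last stage only, not the authors' implementation. It assumes that an ASR system has already produced word-level timestamps and that an NER model has already returned character spans over the transcript; the names `Word`, `build_transcript`, `merge_spans`, and `char_spans_to_audio_spans`, as well as the sample words and timings, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One ASR output word with its start/end time in seconds (assumed format)."""
    text: str
    start: float
    end: float


def build_transcript(words: List[Word]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join words with single spaces and record each word's character span."""
    spans = []
    pos = 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the separating space
    return " ".join(w.text for w in words), spans


def merge_spans(spans: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Sort time spans and merge any that overlap or touch."""
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def char_spans_to_audio_spans(
    words: List[Word], entity_char_spans: List[Tuple[int, int]]
) -> List[Tuple[float, float]]:
    """Map entity character spans (end-exclusive) to audio time spans."""
    _, word_spans = build_transcript(words)
    audio = []
    for c_start, c_end in entity_char_spans:
        # Keep every word whose character span overlaps the entity span.
        hit = [
            w for w, (w_start, w_end) in zip(words, word_spans)
            if w_start < c_end and c_start < w_end
        ]
        if hit:
            audio.append((hit[0].start, hit[-1].end))
    return merge_spans(audio)


# Hypothetical ASR output for "my name is john smith from boston".
words = [
    Word("my", 0.0, 0.2), Word("name", 0.2, 0.5), Word("is", 0.5, 0.6),
    Word("john", 0.6, 1.0), Word("smith", 1.0, 1.5),
    Word("from", 1.6, 1.8), Word("boston", 1.8, 2.4),
]
# Hypothetical NER output: "john smith" (chars 11-21) and "boston" (chars 27-33).
redact = char_spans_to_audio_spans(words, [(11, 21), (27, 33)])
print(redact)
```

The returned time spans are the audio segments that would be redacted. In a real system, ASR errors and imprecise timestamps would affect which audio is covered; this sketch assumes exact word timings and does not model those errors.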