Robust named entity detection in videotext using character lattices

Authors: Rohit Prasad, Pradeep Natarajan, Ehry MacRostie, Krishna Subramanian
Publication year: 2008
Subject:
Source: ICASSP
ISSN: 1520-6149
DOI: 10.1109/icassp.2008.4517841
Description: Text in video sequences can provide key indexing information. In particular, videotext is rich in named entities (NEs), and detection of such entities is critical for search applications. Traditional approaches for detecting NEs in OCR output look for these NEs in the single-best recognition results. Due to the inevitable presence of recognition errors in the single-best output, such approaches usually result in low recall. Given that a lattice is more likely to contain the correct answer, we explore NE detection from character lattices produced by our videotext OCR system. Furthermore, we use an approximate match criterion that allows insertion of punctuation during lookup. Experimental results show a 50% relative improvement in NE recall using lattices over exact lookup in the 1-best hypothesis. Since the improvement in recall is accompanied by a large number of false positives, we present techniques for reducing false alarms. In addition, we describe efficient techniques for reducing the time for detecting NEs.
Database: OpenAIRE
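
Illustrative sketch (not part of the record): the description contrasts exact lookup in the 1-best hypothesis with lookup in a character lattice under an approximate match that tolerates inserted punctuation. The Python below is a minimal sketch of that idea under stated assumptions. The lattice encoding (a dictionary of node to outgoing edges), the function names `lattice_contains` and `one_best`, the toy lattice, and the rule that punctuation may only be skipped after the first matched character are all hypothetical; they are not taken from the authors' OCR system.

```python
import string

PUNCT = set(string.punctuation)


def lattice_contains(lattice, entity, allow_punct=True):
    """Return True if some path in the lattice spells `entity`.

    `lattice` is an assumed encoding: node -> list of (next_node, char)
    edges forming a directed acyclic graph. With `allow_punct`, edges
    carrying punctuation may be skipped inside the entity, which mimics
    an approximate match that permits inserted punctuation.
    """
    target = entity.lower()

    def match(node, i, failed):
        if i == len(target):
            return True
        if (node, i) in failed:
            return False
        failed.add((node, i))  # stays recorded only if this state fails
        for nxt, ch in lattice.get(node, []):
            # Exact character match advances in both lattice and entity.
            if ch.lower() == target[i] and match(nxt, i + 1, failed):
                return True
            # Approximate match: skip punctuation inserted mid-entity.
            if allow_punct and i > 0 and ch in PUNCT and match(nxt, i, failed):
                return True
        return False

    return any(match(start, 0, set()) for start in lattice)


def one_best(lattice, start=0):
    """Follow the first edge at each node (assumed to be the 1-best)."""
    chars, node = [], start
    while lattice.get(node):
        node, ch = lattice[node][0]
        chars.append(ch)
    return "".join(chars)


# Toy lattice: the 1-best path reads "I8.M"; an alternative edge holds "B".
toy_lattice = {
    0: [(1, "I")],
    1: [(2, "8"), (2, "B")],
    2: [(3, ".")],
    3: [(4, "M")],
}

found_in_one_best = "IBM" in one_best(toy_lattice)
found_in_lattice = lattice_contains(toy_lattice, "IBM")
found_without_punct_skip = lattice_contains(toy_lattice, "IBM", allow_punct=False)
```

In this toy case the exact 1-best lookup misses the entity, while the lattice lookup finds it only when punctuation skipping is enabled. The same permissiveness is what would be expected to raise false positives on real data, which is the trade-off the description mentions.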