Identifying Ambiguity in Semantic Resources
Author: | Anna Lisa Gentile, Daniel Gruhl, Anni Coden, Steve Welch |
---|---|
Year of publication: | 2019 |
Subject: | health sciences; Computer science; Mistake; engineering and technology; Ambiguity; Term (time); medical and health sciences; Task (computing); Information extraction; Resource (project management); electrical engineering, electronic engineering, information engineering; artificial intelligence & image processing; Artificial intelligence; Spurious relationship; Semantic Web; Natural language processing; developmental biology |
Source: | K-CAP |
DOI: | 10.1145/3360901.3364412 |
Description: | In many Information Extraction tasks, dictionaries and lexica are powerful building blocks for sophisticated extractions. The success of the Semantic Web over the last 10 years has produced an unprecedented quantity of structured data that can be leveraged to build dictionaries for countless concepts in many domains. Although an invaluable resource, these automatically built dictionaries may contain "problematic" items, such as spurious words, which have been included by mistake, or ambiguous words, which appear with multiple different meanings in the target corpus and therefore necessitate an expensive disambiguation task. In this paper, we propose a simple and effective method to identify problematic terms in a given dictionary, that is, terms that are ambiguous or spurious with respect to a given corpus, with the aim of facilitating subsequent Information Extraction tasks. We demonstrate the effectiveness of the method with a systematic experiment on publicly available concept dictionaries, using a very large Web corpus as the target, achieving an average precision above 85% in identifying problematic terms. |
Database: | OpenAIRE |