Pragmatic Quality Assessment for Automatically Extracted Data
Authors: | Christopher Almquist, Tae Woo Kim, David W. Embley, Stephen W. Liddle, Scott N. Woodfield, Deryle Lonsdale |
---|---|
Year of publication: | 2016 |
Subject: | Computer science; Quality assessment; 02 engineering and technology; computer.software_genre; Conjunction (grammar); Constraint (information theory); 020204 information systems; Data quality; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; Precision and recall; computer; Error identification |
Source: | Conceptual Modeling, ISBN: 9783319463964, ER |
DOI: | 10.1007/978-3-319-46397-1_16 |
Description: | Automatically extracted data is rarely “clean” with respect to pragmatic (real-world) constraints, which hinders applications that depend on quality data. We propose a solution for detecting pragmatic constraint violations: a declarative, semantically enabled constraint-violation checker. In conjunction with an ensemble of automated information extractors, the implemented prototype checks both hard constraints, which are either satisfied or not, and soft constraints, which are satisfied probabilistically with respect to a threshold. An experimental evaluation shows that the constraint checker identifies semantic errors with high precision and recall and that pragmatic error identification can improve results. |
Database: | OpenAIRE |
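
The distinction the description draws between hard and soft constraints can be illustrated with a minimal Python sketch. This is not the authors' prototype: the record fields, the two example constraints, the probability values, and the 0.05 threshold are all hypothetical, chosen only to show a hard constraint as a yes/no test and a soft constraint as a probability compared against a threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, int]  # hypothetical extracted record: field name -> year


@dataclass
class HardConstraint:
    """A constraint that is either satisfied or not."""
    name: str
    holds: Callable[[Record], bool]


@dataclass
class SoftConstraint:
    """A constraint satisfied probabilistically with respect to a threshold."""
    name: str
    probability: Callable[[Record], float]
    threshold: float


def check(record: Record,
          hard: List[HardConstraint],
          soft: List[SoftConstraint]) -> List[str]:
    """Return the names of all constraints the record violates."""
    violations = []
    for c in hard:
        if not c.holds(record):
            violations.append(c.name)
    for c in soft:
        # A soft constraint is violated when the estimated probability
        # of the observed values falls below its threshold.
        if c.probability(record) < c.threshold:
            violations.append(c.name)
    return violations


def age_at_child_birth_probability(r: Record) -> float:
    # Made-up probabilities, for illustration only.
    age = r["child_birth_year"] - r["birth_year"]
    return 1.0 if 15 <= age <= 50 else 0.01


hard_constraints = [
    HardConstraint("death_not_before_birth",
                   lambda r: r["death_year"] >= r["birth_year"]),
]
soft_constraints = [
    SoftConstraint("plausible_age_at_child_birth",
                   age_at_child_birth_probability,
                   threshold=0.05),
]

bad_record = {"birth_year": 1850, "death_year": 1840, "child_birth_year": 1920}
good_record = {"birth_year": 1850, "death_year": 1910, "child_birth_year": 1875}

bad_violations = check(bad_record, hard_constraints, soft_constraints)
good_violations = check(good_record, hard_constraints, soft_constraints)
```

Under these assumptions, `bad_record` violates both constraints and `good_record` violates neither. Keeping the constraints as data, separate from the `check` loop, loosely mirrors the declarative style the description mentions, although the prototype's actual constraint language is not shown in this record.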