Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition

Authors: Qiu, Jielin; Han, William; Wang, Winfred; Yang, Zhengyuan; Li, Linjie; Wang, Jianfeng; Faloutsos, Christos; Li, Lei; Wang, Lijuan
Publication year: 2024
Subject:
Document type: Working Paper
Description: Open-domain real-world entity recognition is essential yet challenging, involving identifying various entities in diverse environments. The lack of a suitable evaluation dataset has been a major obstacle in this field due to the vast number of entities and the extensive human effort required for data curation. We introduce Entity6K, a comprehensive dataset for real-world entity recognition, featuring 5,700 entities across 26 categories, each supported by 5 human-verified images with annotations. Entity6K offers a diverse range of entity names and categorizations, addressing a gap in existing datasets. We conducted benchmarks with existing models on tasks like image captioning, object detection, zero-shot classification, and dense captioning to demonstrate Entity6K's effectiveness in evaluating models' entity recognition capabilities. We believe Entity6K will be a valuable resource for advancing accurate entity recognition in open-domain settings.
Database: arXiv