Review of coreference resolution in English and Persian

Authors: Mohammadi, Hassan Haji; Talebpour, Alireza; Aznaveh, Ahmad Mahmoudi; Yazdani, Samaneh
Publication year: 2022
Subject:
Document type: Working Paper
Description: Coreference resolution (CR), identifying expressions referring to the same real-world entity, is a fundamental challenge in natural language processing (NLP). This paper explores the latest advancements in CR, spanning coreference and anaphora resolution. We critically analyze the diverse corpora that have fueled CR research, highlighting their strengths, limitations, and suitability for various tasks. We examine the spectrum of evaluation metrics used to assess CR systems, emphasizing their advantages, disadvantages, and the need for more nuanced, task-specific metrics. Tracing the evolution of CR algorithms, we provide a detailed overview of methodologies, from rule-based approaches to cutting-edge deep learning architectures. We delve into mention-pair, entity-based, cluster-ranking, sequence-to-sequence, and graph neural network models, elucidating their theoretical foundations and performance on benchmark datasets. Recognizing the unique challenges of Persian CR, we dedicate a focused analysis to this under-resourced language. We examine existing Persian CR systems and highlight the emergence of end-to-end neural models leveraging pre-trained language models like ParsBERT. This review is an essential resource for researchers and practitioners, offering a comprehensive overview of the current state of the art in CR, identifying key challenges, and charting a course for future research in this rapidly evolving field.
Comment: 44 pages, 8 figures, 4 tables
Database: arXiv