Designing the ELEXIS Parallel Sense-Annotated Dataset in 10 European Languages

Author:	Martelli, F., Navigli, R., Krek, S., Kallas, J., Gantar, P., Koeva, S., Nimb, S., Pedersen, B. S., Olsen, S., Langemets, M., Koppel, K., Üksik, T., Dobrovoljc, K., Ureña-Ruiz, R. J., Sancho-Sánchez, J.-L., Lipp, V., Váradi, T., Győrffy, A., László, S., Quochi, V., Monachini, M., Frontini, F., Tiberius, C., Tempelaars, R., Costa, R., Salgado, A., Čibej, J., Munda, T.
Year of publication:	2021
Subject:	Computational Linguistics; Digital lexicography; Natural Language Processing; Corpus Linguistics; Word Sense Disambiguation; strategies tools standards for lexicographic resources (objective 3); WP3
Source:	CIÊNCIAVITAE; Scopus-Elsevier; Martelli, F, Navigli, R, Krek, S, Tiberius, C, Kallas, J, Gantar, P, Koeva, S, Nimb, S, Pedersen, B S, Olsen, S, Langemets, M, Koppel, K, Üksik, T, Dobrovoljc, K, Ureña-Ruiz, R-J, Sancho-Sánchez, J-L, Lipp, V, Váradi, T, Győrffy, A, László, S, Quochi, V, Monachini, M, Frontini, F, Tempelaars, R, Costa, R, Salgado, A, Čibej, J & Munda, T 2021, Designing the ELEXIS Parallel Sense-Annotated Dataset in 10 European Languages. In eLex 2021 Proceedings: Proceedings of the eLex 2021 conference. Lexical Computing CZ, Brno, eLex Conference. Proceedings, eLex 2021, 05/07/2021.
DOI:	10.5281/zenodo.6625399
Description:	Over the last few years, lexicography has seen a rise in increasingly reliable automatic approaches that support the creation of lexicographic resources such as dictionaries, lexical knowledge bases and annotated datasets. Recent achievements in Natural Language Processing, and particularly in Word Sense Disambiguation, have demonstrated their effectiveness not only for creating lexicographic resources but also for enabling a deeper analysis of lexical-semantic data both within and across languages. Nevertheless, we argue that the potential of the connections between the two fields is far from exhausted. In this work, we address a serious limitation affecting both lexicography and Word Sense Disambiguation, namely the lack of high-quality sense-annotated data, and describe our efforts to construct a novel, entirely manually annotated parallel dataset in 10 European languages. For the purposes of the present paper, we concentrate on the annotation of morpho-syntactic features. Finally, unlike many of the currently available sense-annotated datasets, we will annotate semantically by using senses derived from high-quality lexicographic repositories.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::035bfd6f7474faab0a1228ab166fa19b