Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian

Authors: Sara Castagnoli, M. Silvia Micheli, Malvina Nissim, Francesca Masini, Andrea Zaninello
Contributors: J. Monti, F. Dell'Orletta, F. Tamburini, Francesca Masini, M. Silvia Micheli, Andrea Zaninello, Sara Castagnoli, Malvina Nissim
Language: English
Year of publication: 2020
Subject:
Distribution (number theory)
Italian
multiword expressions
corpora
Natural Language Processing
Computer science
Multiword expression
AriEmozione
Settore L-LIN/01 - Glottologia e Linguistica
Online Hate Speech
Resource (project management)
CBX
Multilingual NLU
Twitter during Pandemic
Lemma (mathematics)
Automatic Sarcasm Detection
Linguistic Ostracism in Social Networks
COVID-19
MWE dataset
computational linguistics
corpus linguistics
Italian MWE
Linguistics
LAN000000
Quantitative Linguistic Investigations
Fine-grained sentiment analysis
Computational Linguistics
DistilBERT
Depression from Social Media
Distributional Semantics
Gender Bias
AEREST
E3C Project
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
TrAVaSI
Artificial intelligence
Natural language processing
L-LIN/01 - GLOTTOLOGIA E LINGUISTICA
Source: CLiC-it
Description: The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
Database: OpenAIRE