Search results

Report

Published in: Sarveswaran, K. (2024). Building Tamil Treebanks. In Proceedings of the International Conference on Tamil Computing and Information Technology (ICTCIT 2024)/23rd Tamil Internet Conference (pp. 22-32). INFITT. ISSN: 2313-4887

Treebanks are important linguistic resources, which are structured and annotated corpora with rich linguistic annotations. These resources are used in Natural Language Processing (NLP) applications, supporting linguistic analyses, and are essential f

External link: http://arxiv.org/abs/2409.14657

E-book

Treebanks : Building and Using Parsed Corpora

Author: A. Abeillé

Linguists and engineers in Natural Language Processing tend to use electronic corpora more and more. Most research has long been limited to raw (unannotated) texts or to tagged texts (annotated with parts of speech only), but these approaches suffer

Report

Latin Treebanks in Review: An Evaluation of Morphological Tagging Across Time

Authors: Hudspeth, Marisa; O'Connor, Brendan; Thompson, Laure

Existing Latin treebanks draw from Latin's long written tradition, spanning 17 centuries and a variety of cultures. Recent efforts have begun to harmonize these treebanks' annotations to better train and evaluate morphological taggers. However, the h

External link: http://arxiv.org/abs/2408.06675

Report

Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks

Authors: Herrera, Santiago; Corro, Caio; Kahane, Sylvain

Descriptive grammars are highly valuable, but writing them is time-consuming and difficult. Furthermore, while linguists typically use corpora to create them, grammar descriptions often lack quantitative data. As for formal grammars, they can be chal

External link: http://arxiv.org/abs/2403.17534

E-book

Diachronic Treebanks for Historical Linguistics

Authors: Hanne Martine Eckhoff, Silvia Luraghi, Marco Passarotti

Over the last few decades, the widespread diffusion of digital technology has increased availability of primary textual sources, radically changing the everyday life of scholars in the humanities, who are now able to access, query and process a wealt

Report

Multilingual Nonce Dependency Treebanks: Understanding how Language Models represent and process syntactic structure

Authors: Arps, David; Kallmeyer, Laura; Samih, Younes; Sajjad, Hassan

We introduce SPUD (Semantically Perturbed Universal Dependencies), a framework for creating nonce treebanks for the multilingual Universal Dependencies (UD) corpora. SPUD data satisfies syntactic argument structure, provides syntactic annotations, an

External link: http://arxiv.org/abs/2311.07497

Report

Are UD Treebanks Getting More Consistent? A Report Card for English UD

Authors: Zeldes, Amir; Schneider, Nathan

Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies project raise the expectation that joint training and dataset comparison is increasingly possible for high-resource languages such as English, which have multiple co

External link: http://arxiv.org/abs/2302.00636

Report

Large Discourse Treebanks from Scalable Distant Supervision

Authors: Huber, Patrick; Carenini, Giuseppe

Published in: CODI 2020

Discourse parsing is an essential upstream task in Natural Language Processing with strong implications for many real-world applications. Despite its widely recognized role, most recent discourse parsers (and consequently downstream tasks) still rely

External link: http://arxiv.org/abs/2212.06038
