Showing 1 - 10 of 36 for search: '"Wolf-Sonkin"'
Author:
Roark, Brian, Wolf-Sonkin, Lawrence, Kirov, Christo, Mielke, Sabrina J., Johny, Cibu, Demirsahin, Isin, Hall, Keith
This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages. The dataset includes, for each language: 1) native script Wikipedia text; 2) a romanization lexicon; and 3
External link:
http://arxiv.org/abs/2007.01176
We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nou
External link:
http://arxiv.org/abs/2005.01204
Many of the world's languages employ grammatical gender on the lexeme. For example, in Spanish, the word for 'house' (casa) is feminine, whereas the word for 'paper' (papel) is masculine. To a speaker of a genderless language, this assignment seems t
External link:
http://arxiv.org/abs/1910.13497
Author:
McCarthy, Arya D., Vylomova, Ekaterina, Wu, Shijie, Malaviya, Chaitanya, Wolf-Sonkin, Lawrence, Nicolai, Garrett, Kirov, Christo, Silfverberg, Miikka, Mielke, Sabrina J., Heinz, Jeffrey, Cotterell, Ryan, Hulden, Mans
Published in:
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology (2019) 229-244
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
External link:
http://arxiv.org/abs/1910.11493
Studying the ways in which language is gendered has long been an area of interest in sociolinguistics. Studies have explored, for example, the speech of male and female characters in film and the language used to describe male and female politicians.
External link:
http://arxiv.org/abs/1906.04760
Author:
Hoyle, Alexander, Wolf-Sonkin, Lawrence, Wallach, Hanna, Cotterell, Ryan, Augenstein, Isabelle
When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning polarities to words in a sentiment lexicon, annotators may use binary, categorical, or continuous labels. Naturally,
External link:
http://arxiv.org/abs/1904.02839
Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we
External link:
http://arxiv.org/abs/1806.03746
Published in:
Transactions of the Association for Computational Linguistics, Vol 9, Pp 139-159 (2021)
Abstract: We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe t
External link:
https://doaj.org/article/06a8b1b634e740f8ad3dafc3ecff30c8
Published in:
Transactions of the Association for Computational Linguistics. 9:139-159
We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nou
Published in:
EACL (System Demonstrations)
This paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts. The library provides a flexible and extensible framework for supporting crucial operations on Brahmic scripts, such as NFC, visual