Search results

Report

Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset

Authors: Roark, Brian; Wolf-Sonkin, Lawrence; Kirov, Christo; Mielke, Sabrina J.; Johny, Cibu; Demirsahin, Isin; Hall, Keith

This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages. The dataset includes, for each language: 1) native script Wikipedia text; 2) a romanization lexicon; and 3

External link: http://arxiv.org/abs/2007.01176


Report

On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

Authors: Williams, Adina; Cotterell, Ryan; Wolf-Sonkin, Lawrence; Blasi, Damián; Wallach, Hanna

We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nou

External link: http://arxiv.org/abs/2005.01204


Report

Quantifying the Semantic Core of Gender Systems

Authors: Williams, Adina; Cotterell, Ryan; Wolf-Sonkin, Lawrence; Blasi, Damián; Wallach, Hanna

Many of the world's languages employ grammatical gender on the lexeme. For example, in Spanish, the word for 'house' (casa) is feminine, whereas the word for 'paper' (papel) is masculine. To a speaker of a genderless language, this assignment seems t

External link: http://arxiv.org/abs/1910.13497


Report

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

Authors: McCarthy, Arya D.; Vylomova, Ekaterina; Wu, Shijie; Malaviya, Chaitanya; Wolf-Sonkin, Lawrence; Nicolai, Garrett; Kirov, Christo; Silfverberg, Miikka; Mielke, Sabrina J.; Heinz, Jeffrey; Cotterell, Ryan; Hulden, Mans

Published in: Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology (2019) 229-244

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.

External link: http://arxiv.org/abs/1910.11493


Report

Unsupervised Discovery of Gendered Language through Latent-Variable Modeling

Authors: Hoyle, Alexander; Wolf-Sonkin, Lawrence; Wallach, Hanna; Augenstein, Isabelle; Cotterell, Ryan

Studying the ways in which language is gendered has long been an area of interest in sociolinguistics. Studies have explored, for example, the speech of male and female characters in film and the language used to describe male and female politicians.

External link: http://arxiv.org/abs/1906.04760


Report

Combining Sentiment Lexica with a Multi-View Variational Autoencoder

Authors: Hoyle, Alexander; Wolf-Sonkin, Lawrence; Wallach, Hanna; Cotterell, Ryan; Augenstein, Isabelle

When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning polarities to words in a sentiment lexicon, annotators may use binary, categorical, or continuous labels. Naturally,

External link: http://arxiv.org/abs/1904.02839


Report

A Structured Variational Autoencoder for Contextual Morphological Inflection

Authors: Wolf-Sonkin, Lawrence; Naradowsky, Jason; Mielke, Sabrina J.; Cotterell, Ryan

Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we

External link: http://arxiv.org/abs/1806.03746


Academic article

On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

Authors: Adina Williams, Ryan Cotterell, Lawrence Wolf-Sonkin, Damián Blasi, Hanna Wallach

Published in: Transactions of the Association for Computational Linguistics, Vol 9, Pp 139-159 (2021)

We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe t

External link: https://doaj.org/article/06a8b1b634e740f8ad3dafc3ecff30c8


On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

Authors: Damián E. Blasi, Hanna Wallach, Lawrence Wolf-Sonkin, Adina Williams, Ryan Cotterell

Published in: Transactions of the Association for Computational Linguistics. 9:139-159

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::098001c695f90c32e6e66f0caa68c840
https://doi.org/10.1162/tacl_a_00355


Finite-state script normalization and processing utilities: The Nisaba Brahmic library

Authors: Lawrence Wolf-Sonkin, Alexander Gutkin, Brian Roark, Cibu Johny

Published in: EACL (System Demonstrations)

This paper presents an open-source library for efficient low-level processing of ten major South Asian Brahmic scripts. The library provides a flexible and extensible framework for supporting crucial operations on Brahmic scripts, such as NFC, visual

External links: https://explore.openaire.eu/search/publication?articleId=doi_________::37da5a5d65f3fd39fe61dcbdb63659e6
https://doi.org/10.18653/v1/2021.eacl-demos.3

