Showing 1 - 10 of 13 for search: '"Tanja Gaustad"'
Published in:
Data in Brief, Vol 57, Iss , Pp 110898- (2024)
This data article describes a machine translation training data set for translation between English and Tshivenḓa. The data set contains parallel, aligned English–Tshivenḓa data as well as monolingual Tshivenḓa data. The data was collected fr
External link:
https://doaj.org/article/98bdb0fb8eec491e98cebe20362759d1
Published in:
Data in Brief, Vol 54, Iss , Pp 110325- (2024)
This data article presents a dataset for Siswati, a Bantu language of the Nguni group that is one of the eleven official South African languages and the official language of Eswatini (together with English). The dataset contains parallel textual data
External link:
https://doaj.org/article/c96fa087c91740859792dba75c8220f0
Author:
Tanja Gaustad, Cindy A. McKellar
Published in:
Journal of Open Humanities Data, Vol 10, Pp 38-38 (2024)
The dataset described in this article presents converted and updated corpora for nine of the twelve official South African languages. After a revision of the morphological annotation protocols, the existing National Centre for Human Language Technolo
External link:
https://doaj.org/article/7c431654a12940f19882142cc8449f57
Author:
Tanja Gaustad, Martin J. Puttkammer
Published in:
Data in Brief, Vol 41, Iss , Pp 107994- (2022)
This data article presents a linguistically annotated data set for four official South African languages with a conjunctive orthography, namely isiNdebele, isiXhosa, isiZulu and Siswati. The data set is parallel for all four languages and can be used
External link:
https://doaj.org/article/9e281df15bde4d74ac37364f1a173e5c
Author:
Tanja Gaustad, Roald Eiselen
Published in:
Journal of the Digital Humanities Association of Southern Africa (DHASA). 4
This paper presents an exploration of word embeddings for Afrikaans using the analogies and nearest neighbours methodologies. We compare the results on three types of embeddings (fastText, FLAIR and GloVe) on a novel analogy data set for Afrikaans, i
Development of linguistically annotated parallel language resources for four South African languages
Author:
Tanja Gaustad, Martin Puttkammer
Published in:
Journal of the Digital Humanities Association of Southern Africa (DHASA). 3
For this project, we collected and annotated data to develop language resources for the four official South African Nguni languages written with a conjunctive orthography. The data for these four languages is parallel to allow for comparative (comput
Computational Linguistics in the Netherlands 2002 : Selected Papers From the Thirteenth CLIN Meeting
Author:
Tanja Gaustad
This volume provides a selection of the papers which were presented at the thirteenth conference on Computational Linguistics in the Netherlands (held in Groningen in November 2002). The subjects covered in this book represent a cross-section of curr
Author:
Menno van Zaanen, Tanja Gaustad
Published in:
Grammatical Inference: Theoretical Results and Applications ISBN: 9783642154874
ICGI
Grammatical inference is typically defined as the task of finding a compact representation of a language given a subset of sample sequences from that language. Many different aspects, paradigms and settings can be investigated, leading to different p
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::4235fb861eed0e1c0b7ba8fe10f2e369
https://doi.org/10.1007/978-3-642-15488-1_20
Author:
Tanja Gaustad
Published in:
COLING
Proceedings of the 20th International Conference on Computational Linguistics (Coling 2004), 778-784
In this paper, we present a corpus-based supervised word sense disambiguation (WSD) system for Dutch which combines statistical classification (maximum entropy) with linguistic information. Instead of building individual classifiers per ambiguous wor