Author: |
Knight, Dawn, Morris, Steve, Fitzpatrick, Tess, Rayson, Paul, Spasić, Irena, Thomas, Enlli Môn, Lovell, Alex, Morris, Jonathan, Evas, Jeremy, Stonelake, Mark, Arman, Laura, Davies, Joshua, Ezeani, Ignatius, Neale, Steven, Needs, Jennifer, Piao, Scott, Rees, Mair, Watkins, Gareth, Williams, Lowri, Muralidaran, Vignesh, Tovey-Walsh, Bethan, Anthony, Laurence, Cobb, Tom, Deuchar, Margaret, Donnelly, Kevin, McCarthy, Michael, Scannell, Kevin |
Year of publication: |
2021 |
DOI: |
10.5255/ukda-sn-854531 |
Description: |
The CorCenCC corpus contains over 11 million words (circa 14.4m tokens). CorCenCC is the first corpus of the Welsh language that covers all three aspects of contemporary Welsh: spoken, written and electronically mediated (e-language). It offers a snapshot of the Welsh language across a range of contexts of use, e.g. private conversations, group socialising, business and other work situations, in education, in the various published media, and in public spaces. It includes examples of news headlines, personal and professional emails and correspondence, academic writing, formal and informal speech, blog posts and text messaging. Language data was sampled from a range of different speakers and users of Welsh, from all regions of Wales, of all ages and genders, with a wide range of occupations, and with a variety of linguistic backgrounds (e.g. how they came to speak Welsh), to reflect the diversity of text types and of Welsh speakers found in contemporary Wales. In this way, the CorCenCC corpus provides the means for empowering users of Welsh to better understand and observe the language across diverse settings, and creates a solid evidence base for the teaching of contemporary Welsh to those who aspire to use it. Over time, the corpus has the potential to make a significant contribution to the transformation of Welsh as the language of public, commercial, education and governmental discourse. Beta versions of some bilingual corpus query tools have also been created as part of the CorCenCC project (see Related Resources). These include simple query, full query, frequency list, n-gram, keyword and collocation functionalities. The CorCenCC website also contains Y Tiwtiadur, a collection of data-driven teaching and learning tools designed to help supplement Welsh language learning at all ages and levels. Y Tiwtiadur contains four distinct corpus-based exercises: Gap Filling (Cloze), Vocabulary Profiler, Word Identification and Word-in-Context. 
To access this tool, see Related Resources. |
Database: |
OpenAIRE |
External link: |
|