Showing 1 - 10 of 114 for search: '"Marivate, Vukosi"'
Author:
O'Neill, Jacki, Marivate, Vukosi, Glover, Barbara, Karanu, Winnie, Tadesse, Girmaw Abebe, Gyekye, Akua, Makena, Anne, Rosslyn-Smith, Wesley, Grollnek, Matthew, Wayua, Charity, Baguma, Rehema, Maduke, Angel, Spencer, Sarah, Kandie, Daniel, Maari, Dennis Ndege, Mutangana, Natasha, Axmed, Maxamed, Kamau, Nyambura, Adamu, Muhammad, Swaniker, Frank, Gatuguti, Brian, Donner, Jonathan, Graham, Mark, Mumo, Janet, Mbindyo, Caroline, N'Guessan, Charlette, Githinji, Irene, Makhafola, Lesego, Kruger, Sean, Etyang, Olivia, Onando, Mulang, Sevilla, Joe, Sambuli, Nanjira, Mbaya, Martin, Breloff, Paul, Anapey, Gideon M., Mogaleemang, Tebogo L., Nghonyama, Tiyani, Wanyoike, Muthoni, Mbuli, Bhekani, Nderu, Lawrence, Nyabero, Wambui, Alam, Uzma, Olaleye, Kayode, Njenga, Caroline, Sellen, Abigail, Kairo, David, Chabikwa, Rutendo, Abdulhamid, Najeeb G., Kubasu, Ketry, Okolo, Chinasa T., Akpo, Eugenia, Budu, Joel, Karambal, Issa, Berkoh, Joseph, Wasswa, William, Njagwi, Muchai, Burnet, Rob, Ochanda, Loise, de Bod, Hanlie, Ankrah, Elizabeth, Kinyunyu, Selemani, Kariuki, Mutembei, Kiyimba, Kizito, Eleshin, Farida, Madeje, Lillian Secelela, Muraga, Catherine, Nganga, Ida, Gichoya, Judy, Maina, Tabbz, Maina, Samuel, Mercy, Muchai, Ochieng, Millicent, Nyairo, Stephanie
This white paper is the output of a multidisciplinary workshop in Nairobi (Nov 2023). Led by a cross-organisational team including Microsoft Research, NEPAD, Lelapa AI, and University of Oxford. The workshop brought together diverse thought-leaders f
External link:
http://arxiv.org/abs/2411.10091
Author:
Sindane, Thapelo, Marivate, Vukosi
In this paper, we investigate the use of N-gram models and Large Pre-trained Multilingual models for Language Identification (LID) across 11 South African languages. For N-gram models, this study shows that effective data size selection remains cruci
External link:
http://arxiv.org/abs/2410.08728
Large multilingual models have significantly advanced natural language processing (NLP) research. However, their high resource demands and potential biases from diverse data sources have raised concerns about their effectiveness across low-resource l
External link:
http://arxiv.org/abs/2409.10965
Author:
Abdulmumin, Idris, Mkhwanazi, Sthembiso, Mbooi, Mahlatse S., Muhammad, Shamsuddeen Hassan, Ahmad, Ibrahim Said, Putini, Neo, Mathebula, Miehleketo, Shingange, Matimba, Gwadabe, Tajuddeen, Marivate, Vukosi
This paper describes the corrections made to the FLORES evaluation (dev and devtest) dataset for four African languages, namely Hausa, Northern Sotho (Sepedi), Xitsonga, and isiZulu. The original dataset, though groundbreaking in its coverage of low-
External link:
http://arxiv.org/abs/2409.00626
Author:
Tonja, Atnafu Lambebo, Dossou, Bonaventure F. P., Ojo, Jessica, Rajab, Jenalea, Thior, Fadel, Wairagala, Eric Peter, Aremu, Anuoluwapo, Moiloa, Pelonomi, Abbott, Jade, Marivate, Vukosi, Rosman, Benjamin
High-resource language models often fall short in the African context, where there is a critical need for models that are efficient, accessible, and locally relevant, even amidst significant computing and data constraints. This paper introduces Inkub
External link:
http://arxiv.org/abs/2408.17024
Author:
Brown, Nathan, Marivate, Vukosi
In this work we present BOTS-LM, a series of bilingual language models proficient in both Setswana and English. Leveraging recent advancements in data availability and efficient fine-tuning, BOTS-LM achieves performance similar to models significantl
External link:
http://arxiv.org/abs/2408.02239
Many multilingual communities, including numerous in Africa, frequently engage in code-switching during conversations. This behaviour stresses the need for natural language processing technologies adept at processing code-switched text. However, data
External link:
http://arxiv.org/abs/2404.17216
With the constant spread of misinformation on social media networks, a need has arisen to continuously assess the veracity of digital content. This need has inspired numerous research efforts on the development of misinformation detection (MD) models
External link:
http://arxiv.org/abs/2312.04052
Author:
Marivate, Vukosi, Mots'Oehli, Moseli, Wagner, Valencia, Lastrucci, Richard, Dzingirai, Isheanesu
Natural language processing (NLP) has made significant progress for well-resourced languages such as English but lagged behind for low-resource languages like Setswana. This paper addresses this gap by presenting PuoBERTa, a customised masked languag
External link:
http://arxiv.org/abs/2310.09141
Large Language Models, such as Generative Pre-trained Transformer 3 (aka. GPT-3), have been developed to understand language through the analysis of extensive text data, allowing them to identify patterns and connections between words. While LLMs hav
External link:
http://arxiv.org/abs/2310.00272