Showing 1 - 10 of 13 for search: '"Kargaran, Amir Hossein"'
Author:
Kargaran, Amir Hossein, Modarressi, Ali, Nikeghbal, Nafiseh, Diesner, Jana, Yvon, François, Schütze, Hinrich
English-centric large language models (LLMs) often show strong multilingual capabilities. However, the multilingual performance of these models remains unclear and is not thoroughly evaluated for many languages. Most benchmarks for multilinguality fo
External link:
http://arxiv.org/abs/2410.05873
Author:
Liu, Yihong, Wang, Mingyang, Kargaran, Amir Hossein, Imani, Ayyoob, Xhelili, Orgest, Ye, Haotian, Ma, Chunlan, Yvon, François, Schütze, Hinrich
Recent studies have shown that post-aligning multilingual pretrained language models (mPLMs) using alignment objectives on both original and transliterated data can improve crosslingual alignment. This improvement further leads to better crosslingual
External link:
http://arxiv.org/abs/2409.17326
We present MaskLID, a simple, yet effective, code-switching (CS) language identification (LID) method. MaskLID does not require any training and is designed to complement current high-performance sentence-level LIDs. Sentence-level LIDs are classifie
External link:
http://arxiv.org/abs/2406.06263
Platforms such as GitHub and GitLab introduce Issue Report Templates (IRTs) to enable more effective issue management and better alignment with developer expectations. However, these templates are not widely adopted in most repositories, and there is
External link:
http://arxiv.org/abs/2402.02632
Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages. However, there is no LID available that (i) covers a wide range of low-resource languages, (ii) is rigoro
External link:
http://arxiv.org/abs/2310.16248
We present GlotScript, an open resource and tool for low resource writing system identification. GlotScript-R is a resource that provides the attested writing systems for more than 7,000 languages. It is compiled by aggregating information from exist
External link:
http://arxiv.org/abs/2309.13320
Author:
Imani, Ayyoob, Lin, Peiqin, Kargaran, Amir Hossein, Severini, Silvia, Sabet, Masoud Jalili, Kassner, Nora, Ma, Chunlan, Schmid, Helmut, Martins, André F. T., Yvon, François, Schütze, Hinrich
The NLP community has mainly focused on scaling Large Language Models (LLMs) vertically, i.e., making them better for about 100 languages. We instead scale LLMs horizontally: we create, through continued pretraining, Glot500-m, an LLM that covers 511
External link:
http://arxiv.org/abs/2305.12182
GitHub's issue reports provide developers with valuable information that is essential to the evolution of a software development project. Contributors can use these reports to perform software engineering tasks like submitting bugs, requesting featur
External link:
http://arxiv.org/abs/2303.09236
Menu system design for user interfaces is a challenging task involving many design options and various human factors. For example, one crucial factor that designers need to consider is the semantic and systematic relation of menu commands. However, c
External link:
http://arxiv.org/abs/2303.04496
Author:
Kargaran, Amir Hossein, Akhondzadeh, Mohammad Sadegh, Heidarpour, Mohammad Reza, Manshaei, Mohammad Hossein, Salamatian, Kave, Sattary, Masoud Nejad
Websites use third-party ads and tracking services to deliver targeted ads and collect information about users that visit them. These services put users' privacy at risk, and that is why users' demand for blocking these services is growing. Most of t
External link:
http://arxiv.org/abs/2004.14826