Showing 1 - 10 of 25 for search: '"Rutherford, Attapol"'
The rapid advancement of large language models (LLMs) has highlighted the need for robust evaluation frameworks that assess their core capabilities, such as reasoning, knowledge, and commonsense, leading to the inception of certain widely-used benchm
External link:
http://arxiv.org/abs/2410.04795
Authors:
Laosaengpha, Napat; Tativannarat, Thanit; Piansaddhayanon, Chawan; Rutherford, Attapol; Chuangsuwanich, Ekapol
Learning job title representation is a vital process for developing automatic human resource tools. To do so, existing methods primarily rely on learning the title representation through skills extracted from the job description, neglecting the rich
External link:
http://arxiv.org/abs/2406.08055
Authors:
Trakuekul, Pontakorn; Leong, Wei Qi; Polpanumas, Charin; Sawatphol, Jitkapat; Tjhi, William Chandra; Rutherford, Attapol T.
While coreference resolution is a well-established research area in Natural Language Processing (NLP), research focusing on Thai language remains limited due to the lack of large annotated corpora. In this work, we introduce ThaiCoref, a dataset for
External link:
http://arxiv.org/abs/2406.06000
Authors:
Sriwirote, Panyut; Leong, Wei Qi; Polpanumas, Charin; Thanyawong, Santhawat; Tjhi, William Chandra; Aroonmanakun, Wirote; Rutherford, Attapol T.
Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published systematic evaluation of state-of-the-art models, espec
External link:
http://arxiv.org/abs/2405.07586
While WangchanBERTa has become the de facto standard in transformer-based Thai language modeling, it still has shortcomings in regard to the understanding of foreign words, most notably English words, which are often borrowed without orthographic ass
External link:
http://arxiv.org/abs/2311.12475
Traditionally, Latent Dirichlet Allocation (LDA) ingests words in a collection of documents to discover their latent topics using word-document co-occurrences. However, it is unclear how to achieve the best results for languages without marked word b
External link:
http://arxiv.org/abs/2108.10755
The primary objective of our work is to build a large-scale English-Thai dataset for machine translation. We construct an English-Thai machine translation dataset with over 1 million segment pairs, curated from various sources, namely news, Wikipedia
External link:
http://arxiv.org/abs/2007.03541
Word segmentation is a fundamental pre-processing step for Thai Natural Language Processing. The current off-the-shelf solutions are not benchmarked consistently, so it is difficult to compare their trade-offs. We conducted a speed and accuracy compa
External link:
http://arxiv.org/abs/1911.07056
Inferring implicit discourse relations in natural language text is the most difficult subtask in discourse parsing. Surface features achieve good performance, but they are not readily applicable to other languages without semantic lexicons. Previous
External link:
http://arxiv.org/abs/1606.01990