Showing 1 - 8 of 8
for search: '"Chenglei Si"'
Author:
Chenglei Si, Zhengyan Zhang, Yingfa Chen, Fanchao Qi, Xiaozhi Wang, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun
Published in:
Transactions of the Association for Computational Linguistics, Vol 11, Pp 469-487 (2023)
Abstract: Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token. However, they ignore the unique feature of the Chinese writing system whe
External link:
https://doaj.org/article/5e1c7178b0ed4851a6922b1218a03d89
Published in:
ACL/IJCNLP (Findings)
Machine Reading Comprehension (MRC) is an important testbed for evaluating models' natural language understanding (NLU) ability. There has been rapid progress in this area, with new models achieving impressive performance on various benchmarks. Howev
Published in:
ACL/IJCNLP (Findings)
Published in:
SEM
Adversarial training (AT) as a regularization method has proved its effectiveness on various tasks. Though there are successful applications of AT on some NLP tasks, the distinguishing characteristics of NLP tasks have not been exploited. In this pap
Published in:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
Most pre-trained language models (PLMs) construct word representations at subword level with Byte-Pair Encoding (BPE) or its variations, by which OOV (out-of-vocab) words are almost avoidable. However, those methods split a word into subword units an
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ae49907ae6631806a28cd1e03f0a7937
Published in:
WAT@EMNLP-IJCNLP
Sentiment ambiguous lexicons refer to words where their polarity depends strongly on context. As such, when the context is absent, their translations or their embedded sentence ends up (incorrectly) being dependent on the training data. While neura
Published in:
Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications.
Datasets are integral artifacts of empirical scientific research. However, due to natural language variation, their recognition can be difficult and even when identified, can often be inconsistently referred to across and within publications. We report