Showing 1 - 1 of 1 for search: '"Jeon, Taehee"'
We introduce a morpheme-aware subword tokenization method that utilizes sub-character decomposition to address the challenges of applying Byte Pair Encoding (BPE) to Korean, a language characterized by its rich morphology and unique writing system. O
External link:
http://arxiv.org/abs/2311.03928
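
The sub-character decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the standard Unicode arithmetic for splitting a precomposed Hangul syllable into its conjoining jamo, which is one plausible preprocessing step before running BPE. The names `decompose`, `LEADS`, `VOWELS`, and `TAILS` are hypothetical and chosen for illustration.

```python
# Sketch only: decompose precomposed Hangul syllables (U+AC00..U+D7A3)
# into conjoining jamo using the arithmetic defined by the Unicode Standard.
# All names here are illustrative assumptions, not taken from the paper.

HANGUL_BASE = 0xAC00
NUM_LEADS = 19    # initial consonants
NUM_VOWELS = 21   # medial vowels
NUM_TAILS = 28    # final consonants, including the "no final" slot
NUM_SYLLABLES = NUM_LEADS * NUM_VOWELS * NUM_TAILS  # 11172

LEADS = [chr(0x1100 + i) for i in range(NUM_LEADS)]
VOWELS = [chr(0x1161 + i) for i in range(NUM_VOWELS)]
# Index 0 means the syllable has no final consonant.
TAILS = [""] + [chr(0x11A8 + i) for i in range(NUM_TAILS - 1)]


def decompose(text: str) -> str:
    """Replace each Hangul syllable with its jamo; keep other characters."""
    pieces = []
    for ch in text:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < NUM_SYLLABLES:
            lead, rest = divmod(offset, NUM_VOWELS * NUM_TAILS)
            vowel, tail = divmod(rest, NUM_TAILS)
            pieces.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            pieces.append(ch)
    return "".join(pieces)


if __name__ == "__main__":
    # Print the code points of the jamo for one syllable.
    print([hex(ord(c)) for c in decompose("한")])
```

A subword learner such as BPE could then be trained on the decomposed string, so merges can begin below the syllable level. How the paper combines this with morpheme boundaries is not shown in the truncated abstract and is not assumed here.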