Enhancing Chinese abbreviation prediction with LLM generation and contrastive evaluation.

Authors: Liu, Jingping1 (AUTHOR) Jingpingliu@ecust.edu.cn, Tian, Xianyang1 (AUTHOR), Tong, Hanwen2 (AUTHOR), Xie, Chenhao2 (AUTHOR), Ruan, Tong1 (AUTHOR) ruantong@ecust.edu.cn, Cong, Lin3 (AUTHOR), Wu, Baohua3 (AUTHOR), Wang, Haofen4 (AUTHOR)
Source: Information Processing & Management. Jul2024, Vol. 61 Issue 4, pN.PAG-N.PAG. 1p.
Abstract: Chinese abbreviation prediction plays an important role in natural language processing. The prevalent approach uses generation models to predict abbreviations for full forms, but relying on a single generation model may not yield high-quality abbreviations. We emphasize the importance of introducing an evaluation model after the generation model to assess the rationality of generated abbreviations. Hence, in this paper, we propose a novel two-stage method with LLM generation and contrastive evaluation for Chinese abbreviation prediction. In the first stage, we design a type discriminator to determine the abbreviation type and then introduce a pre-trained and fine-tuned LLM to generate multiple candidate abbreviations. In the second stage, we propose a contrastive evaluation model that assesses the rationality of the candidates using an abbreviation scorer and a phrase scorer trained with a joint learning strategy. Experiments on two public datasets indicate that our method outperforms the current state-of-the-art method, achieving Hit@1 improvements of 3.32% and 1.73%, respectively. More importantly, we deploy it on the Fliggy application, and a 20-day online A/B test shows a 0.65% increase in Point of Interest Recognition Rate and a 1.37% increase in Page View Click-Through Rate when abbreviations predicted by our method are used in the search system.
• A novel two-stage method with LLM generation and contrastive evaluation for Chinese abbreviation prediction.
• Outperforms SoTA by 3.32% and 1.73% on Hit@1.
• The online A/B testing on Fliggy APP indicates that POI NER and PV-CTR increase by 0.65% and 1.37%.
[ABSTRACT FROM AUTHOR]
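The abstract reports results as Hit@1 after candidates are generated and then re-scored. The following is a minimal sketch of that generate-then-rank evaluation, not the authors' implementation. It assumes Hit@k counts a full form as correct when any of its top-k ranked candidates exactly matches a reference abbreviation. The names `rerank`, `hit_at_k`, and `toy_score` are hypothetical, and the toy scorer merely stands in for the paper's contrastive evaluation model, whose details the abstract does not give.

```python
from typing import Callable, Sequence, Set


def rerank(full_form: str,
           candidates: Sequence[str],
           score: Callable[[str, str], float]) -> list:
    """Order candidate abbreviations from highest to lowest score.

    `score` is a hypothetical stand-in for an evaluation model.
    """
    return sorted(candidates, key=lambda c: score(full_form, c), reverse=True)


def hit_at_k(ranked_lists: Sequence[Sequence[str]],
             gold_sets: Sequence[Set[str]],
             k: int = 1) -> float:
    """Fraction of items whose top-k candidates contain a reference abbreviation.

    Assumes exact string match against the reference set.
    """
    if len(ranked_lists) != len(gold_sets):
        raise ValueError("ranked_lists and gold_sets must have the same length")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_lists, gold_sets)
        if any(candidate in gold for candidate in ranked[:k])
    )
    return hits / len(ranked_lists)


# Toy scores, invented purely for illustration.
_TOY_SCORES = {"北大": 0.9, "北京": 0.2, "清大": 0.6, "清华": 0.5}


def toy_score(full_form: str, candidate: str) -> float:
    """Hypothetical scorer: looks up a fixed score, ignoring the full form."""
    return _TOY_SCORES.get(candidate, 0.0)


ranked = [
    rerank("北京大学", ["北京", "北大"], toy_score),
    rerank("清华大学", ["清华", "清大"], toy_score),
]
gold = [{"北大"}, {"清华"}]
hit1 = hit_at_k(ranked, gold, k=1)
hit2 = hit_at_k(ranked, gold, k=2)
```

In this toy data the second full form has its reference abbreviation ranked second, so it counts toward Hit@2 but not Hit@1.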
Database: Library, Information Science & Technology Abstracts