MT4CrossOIE: Multi-stage Tuning for Cross-lingual Open Information Extraction

Author: Li, Tongliang; Wang, Zixiang; Chai, Linzheng; Yang, Jian; Bai, Jiaqi; Yin, Yuwei; Liu, Jiaheng; Guo, Hongcheng; Yang, Liqun; el-abidine, Hebboul Zine; Li, Zhoujun
Publication Year: 2023
Subject:
Document Type: Working Paper
Description: Cross-lingual open information extraction aims to extract structured information from raw text across multiple languages. Previous work uses a shared cross-lingual pre-trained model to handle the different languages but underuses the potential of language-specific representations. In this paper, we propose an effective multi-stage tuning framework called MT4CrossOIE, designed to enhance cross-lingual open information extraction by injecting language-specific knowledge into the shared model. Specifically, the shared semantic space (e.g., the embedding matrix) of the cross-lingual pre-trained model is tuned first while the encoder is kept fixed, and the remaining components are optimized in the second stage. After sufficient training, we freeze the pre-trained model and tune multiple extra low-rank language-specific modules using a mixture-of-LoRAs for model-based cross-lingual transfer. In addition, we leverage two-stage prompting to encourage the large language model (LLM) to annotate the multi-lingual raw data for data-based cross-lingual transfer. The model is trained with multi-lingual objectives on our proposed dataset OpenIE4++ by combining the model-based and data-based transfer techniques. Experimental results on various benchmarks emphasize the importance of aggregating multiple plug-and-play language-specific modules and demonstrate the effectiveness of MT4CrossOIE in cross-lingual OIE (code: https://github.com/CSJianYang/Multilingual-Multimodal-NLP). A minimal sketch of the mixture-of-LoRAs mechanism follows this record.
Comment: 10 pages
Database: arXiv
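
The mixture-of-LoRAs mechanism in the description (a frozen shared backbone plus several aggregated low-rank language-specific adapters) can be illustrated with a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: the class name MixtureOfLoRALinear, the softmax gate, and all shapes and hyperparameters (rank, alpha, num_languages) are assumptions made for the example.

# Minimal sketch of the mixture-of-LoRAs idea from the abstract: the
# pre-trained weights stay frozen while several low-rank, language-specific
# adapters are trained and aggregated. All names, shapes, and the softmax
# gating below are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class MixtureOfLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int,
                 num_languages: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen shared projection from the cross-lingual pre-trained model.
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One low-rank (A, B) pair per language-specific module; B starts at
        # zero so training begins from the frozen base behavior.
        self.lora_A = nn.Parameter(torch.randn(num_languages, rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_languages, out_features, rank))
        # Learnable mixture logits used to aggregate the plug-and-play modules.
        self.gate = nn.Parameter(torch.zeros(num_languages))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate, dim=0)           # (num_languages,)
        # Low-rank updates: x -> A_i x -> B_i (A_i x), mixed by the gate.
        down = torch.einsum("...d,lrd->...lr", x, self.lora_A)
        up = torch.einsum("...lr,lor->...lo", down, self.lora_B)
        delta = (weights.view(-1, 1) * up).sum(dim=-2) * self.scaling
        return self.base(x) + delta


layer = MixtureOfLoRALinear(in_features=768, out_features=768, num_languages=3)
out = layer(torch.randn(2, 10, 768))    # e.g. (batch, seq_len, hidden)
print(out.shape)                        # torch.Size([2, 10, 768])

In a full system the gate could also be set or routed from the input language rather than learned globally; the softmax aggregation above is just one plausible way to combine the multiple plug-and-play language-specific modules the abstract describes.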