SuMe: A Dataset Towards Summarizing Biomedical Mechanisms

Author:	Bastan, Mohaddeseh; Shankar, N.; Surdeanu, Mihai; Balasubramanian, Niranjan; Calzolari, Nicoletta; Bechet, Frederic; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Mazo, Helene; Odijk, Jan; Piperidis, Stelios
Language:	English
Publication year:	2022
Subject:	Biomedical NLP; Summarization; Text Generation; Explanation Generation; Relation Extraction
Source:	2022 Language Resources and Evaluation Conference, LREC 2022
Description:	Can language models read biomedical texts and explain the biomedical mechanisms discussed? In this work, we introduce a biomedical mechanism summarization task. Biomedical studies often investigate the mechanisms behind how one entity (e.g., a protein or a chemical) affects another in a biological context. The abstracts of these publications often include a focused set of sentences that present relevant supporting statements regarding such relationships, associated experimental evidence, and a concluding sentence that summarizes the mechanism underlying the relationship. We leverage this structure and create a summarization task, where the input is a collection of sentences and the main entities in an abstract, and the output includes the relationship and a sentence that summarizes the mechanism. Using a small amount of manually labeled mechanism sentences, we train a mechanism sentence classifier to filter a large biomedical abstract collection and create a summarization dataset with 22k instances. We also introduce conclusion sentence generation as a pretraining task with 611k instances. We benchmark the performance of large bio-domain language models. We find that while the pretraining task helps improve performance, the best model produces acceptable mechanism outputs in only 32% of the instances, which shows that the task presents significant challenges in biomedical language understanding and summarization.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=narcis______::e5265cbcd461a34e6693eed49cef2809 http://resolver.tudelft.nl/uuid:fdf6a58a-6d0c-4a2a-8e3a-32aeccefe597