Popis: |
The increasing interest from research agencies, governments, and universities in understanding research funding and prioritising research efforts has highlighted the need for reliable and efficient methods for exploring research portfolios. In biomedical research, this involves exploring research across what is normally considered fundamental and applied research. Because research in these categories differs in behaviour, for example in time to impact and citation patterns, it is often important to address the categories separately. Moreover, research is increasingly complex, interdisciplinary, transversal, and translational in nature. As far as we know, no tools are currently available that make this distinction. Scientific publications offer a valuable source of information for this purpose, but the growth in the number of biomedical publications makes manual inspection and classification of papers infeasible. To address this challenge, we present BATRACIO, a new task that aims to classify biomedical publications into the following research types: Basic, Translational, Clinical, and Public Health. We develop and release an expert-annotated dataset for the task and evaluate state-of-the-art models to compare the effectiveness of domain-specific pre-trained language models with that of general pre-trained language models. We also investigate methods for handling imbalanced datasets in the biomedical domain with adjacent categories. Our results demonstrate that domain-specific pre-trained language models can effectively classify scientific papers by research type, overcoming challenges such as the use of abbreviations and acronyms. These findings have important implications for policymakers and funding agencies in understanding research activities and allocating resources effectively. |