Finding relevant biomedical datasets: the UC San Diego solution for the bioCADDIE Retrieval Challenge

Authors: Zhanglong Ji, Lucila Ohno-Machado, Yupeng He, Kai Zhang, Qi Li, Wei Wei, Yuanchi Ha
Year of publication: 2018
Source: Database: The Journal of Biological Databases and Curation
ISSN: 1758-0463
Description: The number and diversity of biomedical datasets have grown rapidly over the last decade. A large number of datasets are stored in various repositories in different formats. Existing dataset retrieval systems lack cross-repository search capabilities. As a result, users spend time searching for datasets in repositories they already know, and they typically do not discover new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users' free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competing teams. The results show that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline
Database: OpenAIRE