Automatic Detection and Extraction of Key Resources from Tables in Biomedical Papers.
Authors: | Ozyurt IB; FDI Lab, University of California, San Diego, 9500 Gilman Drive, M/C 0608, La Jolla, CA 92093-0608, USA. Bandrowski A; FDI Lab, University of California, San Diego, 9500 Gilman Drive, M/C 0608, La Jolla, CA 92093-0608, USA. |
---|---|
Language: | English |
Source: | BioRxiv: the preprint server for biology [bioRxiv] 2024 Oct 17. Date of Electronic Publication: 2024 Oct 17. |
DOI: | 10.1101/2024.10.15.618379 |
Abstract: | Tables are useful information artifacts that allow humans to easily detect data "missingness", and several publishers have deployed them to improve the amount of information present for key resources and reagents such as antibodies, cell lines, and other tools that constitute the inputs to a study. The STAR*Methods tables, specifically, have increased the "findability" of these key resources, but they have not been commonly available outside of the Cell Press journal family. To improve the availability of these tables in the broader biomedical literature, we have attempted to automatically process BioRxiv preprints, either creating tables from text or recognizing tables already created by authors and structuring them for later use by publishers and search systems, so as to improve the "findability" of resources across a larger share of the scientific literature. Extraction of key resource tables from PDF files by best-in-class tools resulted in a Grid Table Similarity (GriTS) score of 0.12. We therefore created several multimodal pipelines that employ machine learning approaches for key resource table page identification, Table Transformer models for table detection and table structure recognition, and a new table-specific language model for row over-segmentation. These pipelines improve the extraction of text from tables created by biomedical authors and published on BioRxiv to a GriTS score of around 0.90, enabling the deployment of automated research resource extraction tools on BioRxiv. Competing Interests: AB and IBO are co-founders and members of the board of SciCrunch Inc, a company that works with publishers to improve the representation of research resources in scientific literature. AB serves as the CEO. This relationship has been reviewed and approved by the UCSD Conflict of Interest committee. |
Database: | MEDLINE |