Schema Matching and Data Integration with Consistent Naming on Protein Crystallization Screens

Authors: Midusha Shrestha, Truong X. Tran, Ramazan S. Aygun, Bidhan Bhattarai, Marc L. Pusey
Year of publication: 2020
Subject:
Source: IEEE/ACM Trans Comput Biol Bioinform
ISSN: 2374-0043, 1545-5963
DOI: 10.1109/tcbb.2019.2913368
Abstract: The data representations and naming conventions that different companies use in their commercial screen files make automated analysis of crystallization experiments difficult and time-consuming. To reduce the human effort this requires, we present an approach that computationally matches the elements of two schemas using linguistic schema matching methods and then transforms the input screen format into another format that uses naming defined by the user. We tested the approach on a number of commercial screens from different companies; it achieved an overall schema-matching accuracy of 97 percent, significantly better than the two other matchers we tested. Our tool maps a screen file from one format to another format preferred by the expert, using the expert's preferred chemical names.
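The abstract does not spell out which linguistic matchers the authors used. As a rough illustration of what name-based schema matching followed by renaming can look like, here is a minimal Python sketch under stated assumptions: the abbreviation table, the equal weighting of token-set Jaccard overlap and difflib's character-level SequenceMatcher ratio, the 0.5 threshold, the greedy one-to-one assignment, and all function, column, and chemical names are hypothetical and are not taken from the paper.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table (an assumption, not from the paper); a real
# tool would rely on a curated vocabulary of crystallization-screen terms.
ABBREVIATIONS = {"conc": "concentration", "prec": "precipitant", "pos": "position"}


def tokens(name):
    """Lowercase a column name, split it on non-alphanumerics, expand abbreviations."""
    parts = re.findall(r"[a-z0-9]+", name.lower())
    return [ABBREVIATIONS.get(p, p) for p in parts]


def similarity(a, b):
    """Mean of token-set Jaccard overlap and character-level SequenceMatcher ratio."""
    ta, tb = tokens(a), tokens(b)
    sa, sb = set(ta), set(tb)
    union = sa | sb
    jaccard = len(sa & sb) / len(union) if union else 0.0
    chars = SequenceMatcher(None, " ".join(ta), " ".join(tb)).ratio()
    return (jaccard + chars) / 2


def match_schemas(source, target, threshold=0.5):
    """Greedy one-to-one matching: accept the highest-scoring unused pairs first."""
    scored = sorted(
        ((similarity(s, t), s, t) for s in source for t in target), reverse=True
    )
    mapping, used = {}, set()
    for score, s, t in scored:
        if score < threshold:
            break  # the list is sorted, so every remaining pair is below threshold
        if s not in mapping and t not in used:
            mapping[s] = t
            used.add(t)
    return mapping


def transform(rows, mapping, preferred_names):
    """Rename matched columns and replace string values with preferred chemical names.

    Columns with no match in the target schema are dropped.
    """
    result = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column not in mapping:
                continue
            if isinstance(value, str):
                value = preferred_names.get(value, value)
            new_row[mapping[column]] = value
        result.append(new_row)
    return result


if __name__ == "__main__":
    # Hypothetical vendor schema (source) and user-defined schema (target).
    source = ["WELL", "Salt_Conc", "Prec", "Buffer"]
    target = ["Well", "Salt Concentration", "Precipitant", "Buffer Name", "Notes"]
    mapping = match_schemas(source, target)
    print(mapping)
    rows = [{"WELL": "A1", "Salt_Conc": 0.2, "Prec": "PEG 3350", "Buffer": "Tris"}]
    names = {"PEG 3350": "Polyethylene glycol 3350",
             "Tris": "Tris(hydroxymethyl)aminomethane"}
    print(transform(rows, mapping, names))
```

Greedy assignment keeps the sketch short; an optimal one-to-one assignment such as the Hungarian algorithm would avoid cases where an early high-scoring pair blocks a better overall mapping.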
Database: OpenAIRE