Schema Matching and Data Integration with Consistent Naming on Protein Crystallization Screens
Authors: | Midusha Shrestha, Truong X. Tran, Ramazan S. Aygun, Bidhan Bhattarai, Marc L. Pusey |
---|---|
Year of publication: | 2020 |
Subject: | Matching (statistics); Information retrieval; Computer science; Applied Mathematics; Medical engineering; Computational Biology; Proteins; Engineering and technology; String searching algorithm; External Data Representation; Data structure; Schema matching; Terminology as Topic; Genetics; Data Mining; Crystallization; Databases, Protein; Bioinformatics; Biotechnology; Data integration |
Source: | IEEE/ACM Trans Comput Biol Bioinform |
ISSN: | 2374-0043 1545-5963 |
DOI: | 10.1109/tcbb.2019.2913368 |
Description: | Commercial crystallization screen files from different companies differ in both data representation and naming conventions, which makes automated analysis of crystallization experiments difficult and time-consuming. To reduce the human effort this requires, we present an approach that computationally matches the elements of two schemas using linguistic schema matching methods and then transforms the input screen format into another format that uses naming defined by the user. The approach was tested on a number of commercial screens from different companies and achieved an overall schema matching accuracy of 97 percent, significantly better than the other two matchers we tested. Our tool maps a screen file from one format to the format an expert prefers, using that expert's preferred chemical names. |
Database: | OpenAIRE |
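
The two steps named in the description, matching schema elements by name and then rewriting a record with user-preferred naming, can be illustrated with a minimal sketch. This is not the authors' matcher: the record does not specify their algorithm, so the sketch substitutes plain string similarity from Python's standard `difflib` module. The function names, the 0.6 threshold, and the sample column and chemical names are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and collapse punctuation/whitespace to single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a linguistic matcher."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source, target, threshold=0.6):
    """Map each source column to its most similar target column.

    Columns whose best score falls below the (assumed) threshold stay unmapped.
    """
    if not target:
        return {}
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            mapping[s] = best
    return mapping


def rename_record(record, mapping, preferred_names):
    """Rename a record's keys via the mapping and its string values via
    a user-supplied dictionary of preferred chemical names."""
    out = {}
    for key, value in record.items():
        new_key = mapping.get(key, key)
        if isinstance(value, str):
            value = preferred_names.get(value, value)
        out[new_key] = value
    return out


# Hypothetical vendor columns matched against a user's preferred schema.
mapping = match_schemas(["Salt_Name", "PH"], ["salt name", "pH", "Buffer"])
record = rename_record(
    {"Salt_Name": "AmSO4", "PH": 7.5},
    mapping,
    {"AmSO4": "ammonium sulfate"},
)
```

A real matcher would likely need chemistry-aware handling of synonyms and abbreviations, which character-level similarity alone does not capture.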