Text Mining and Machine Learning Protocol for Extracting Human-Related Protein Phosphorylation Information from PubMed.
Authors: | Arumugam K; Department of Management Studies, Coimbatore Institute of Engineering and Technology, Coimbatore, Tamilnadu, India. kmurthya@gmail.com., Shanker RR; International Business Unit, Alembic Pharmaceuticals Limited, Vadodara, Gujarat, India. |
---|---|
Language: | English |
Source: | Methods in molecular biology (Clifton, N.J.) [Methods Mol Biol] 2022; Vol. 2496, pp. 159-177. |
DOI: | 10.1007/978-1-0716-2305-3_9 |
Abstract: | In modern health care research, protein phosphorylation has gained enormous attention from researchers across the globe, and studying it requires automated approaches to process the huge volume of data on proteins and their modifications at the cellular level. The data generated at the cellular level is unique as well as arbitrary, and the accumulation of a massive volume of information is inevitable. Biological research has revealed that a huge array of cellular communication aided by protein phosphorylation and other similar mechanisms implies different and diverse meanings. This has led to the collection of a huge volume of data to understand the biological functions of human evolution, especially for combating diseases in a better way. Text mining, an automated approach to mining information from unstructured data, finds application in extracting protein phosphorylation information from biomedical literature databases such as PubMed. This chapter outlines a recent text mining protocol that applies natural language parsing (NLP) for named entity recognition and text processing, and support vector machines (SVM), a machine learning algorithm, for classifying the processed text related to human protein phosphorylation. We discuss evaluating the text mining system that results from the protocol on three corpora, namely, the human Protein Phosphorylation (hPP) corpus, the Integrated Protein Literature Information and Knowledge corpus (iProLink), and the Phosphorylation Literature corpus (PLC). We also present a basic understanding of the chemistry and biology that drive the protein phosphorylation process in the human body. We believe that this basic understanding will be useful for advancing existing text mining systems for extracting protein phosphorylation information from PubMed. (© 2022. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.) |
Database: | MEDLINE |
External link: |
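
The abstract describes a two-stage pipeline: entity recognition and text processing, followed by SVM classification of phosphorylation-related text. The following is a minimal sketch of that general idea, not the chapter's actual protocol. Everything in it is an assumption made for illustration: the regular expression standing in for named entity recognition, the bag-of-words features, the hand-written toy sentences (which do not come from the hPP, iProLink, or PLC corpora), and the small hinge-loss trainer used in place of a full SVM library.

```python
import re
from collections import defaultdict

# Crude stand-in for named entity recognition (assumption, not the chapter's parser):
# matches residue mentions such as Ser9, Thr185, or Tyr187.
SITE_RE = re.compile(r"\b(?:Ser|Thr|Tyr)\d+\b")

# Hypothetical toy training sentences; +1 = phosphorylation-related, -1 = unrelated.
TOY_EXAMPLES = [
    ("AKT1 phosphorylates GSK3B at Ser9", 1),
    ("TP53 binds DNA in the nucleus", -1),
    ("Phosphorylation of TP53 at Ser15 by ATM", 1),
    ("The cohort was followed for two years", -1),
]


def find_sites(sentence):
    """Return residue-position mentions found in the sentence."""
    return SITE_RE.findall(sentence)


def features(sentence):
    """Binary bag-of-words features; 'phosphorylat*' variants share one feature."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {"phosphorylat" if t.startswith("phosphorylat") else t for t in tokens}


def train_linear_svm(examples, epochs=10, lr=0.1, lam=0.01):
    """Train a linear SVM by subgradient descent on the L2-regularized hinge loss."""
    w = defaultdict(float)
    b = 0.0
    for _ in range(epochs):
        for sentence, y in examples:
            x = features(sentence)
            margin = y * (sum(w[f] for f in x) + b)
            # Regularization step: shrink all weights slightly.
            for f in list(w):
                w[f] *= 1.0 - lr * lam
            # Hinge-loss step: update only when the margin is violated.
            if margin < 1.0:
                for f in x:
                    w[f] += lr * y
                b += lr * y
    return dict(w), b


def predict(w, b, sentence):
    """Return +1 if the sentence scores as phosphorylation-related, else -1."""
    score = sum(w.get(f, 0.0) for f in features(sentence)) + b
    return 1 if score > 0 else -1


weights, bias = train_linear_svm(TOY_EXAMPLES)
```

With the trained `weights` and `bias`, `predict(weights, bias, sentence)` labels a new sentence and `find_sites(sentence)` lists any residue mentions in it. A real system would need a proper parser, a much larger annotated corpus, and evaluation against held-out data.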