What does that gene do? Gene function prediction by machine learning with applications to plants

Author: Makrodimitris, S.
Contributors: Reinders, M.J.T., van Ham, R.C.H.J., Delft University of Technology
Year of publication: 2021
Subject:
DOI: 10.4233/uuid:6744b23a-aec6-4852-837a-6ffd466ca24d
Description: Billions of people worldwide rely on plant-based food for their daily energy intake. As global warming and the spread of diseases (such as Panama disease in bananas) are substantially hindering plant cultivation, the need to develop temperature- and/or disease-resistant varieties is becoming increasingly pressing. Plant breeding has been revolutionized by molecular biology methods, such as DNA and RNA sequencing, which have substantially accelerated the discovery of genes that are likely to influence a trait of interest. The outcome of such experiments is typically a long list of candidate genes whose involvement in the trait must be validated experimentally. Prioritizing these experiments, i.e. testing the most promising genes first, can save considerable time, effort and money, but this is frequently hindered because the cellular roles (functions) of plant genes and their corresponding proteins are often unknown. Discovering gene functions experimentally is equally time-consuming and costly, so it is crucial to have computer algorithms that can automatically predict gene or protein functions with high accuracy. After decades of research in this field, considerable progress has been made, but we are still far from a widely accepted and accurate solution to the problem. This thesis explores different research directions to improve protein function prediction by developing new machine learning algorithms. These directions include new ways to represent proteins, exploiting semantic relationships among functions, and function-specific feature selection. The thesis also addresses the problem of missing protein interaction data for non-model species and quantifies its effect on protein function prediction. All in all, it provides novel insights into the problem that future work can build upon to achieve new breakthroughs.
Database: OpenAIRE