Popis: |
Studying metabolic networks is vital for many areas, such as the development of novel drugs and biofuels. For biologists, a key challenge is that many reactions are impractical or expensive to discover through experiments. Our task is to recover these missing reactions. By exploiting the problem structure, we model reaction recovery as a hyperlink prediction problem, where each reaction is regarded as a hyperlink connecting its participating vertices (metabolites). Unlike traditional link prediction, where a link joins exactly two nodes, a hyperlink can involve an arbitrary number of nodes. Because the cardinality of a hyperlink is variable, existing classifiers that expect a fixed number of input features cannot be applied directly. Traditional methods, such as common neighbors and the Katz index, are not applicable either, since they are restricted to pairwise similarities. In this paper, we propose a novel hyperlink prediction algorithm called Matrix Boosting (MATBoost). MATBoost conducts inference jointly in the incidence space and the adjacency space by performing an iterative completion-matching optimization. We carry out extensive experiments to show that MATBoost achieves state-of-the-art performance. For a metabolic network with 1805 metabolites and 2583 reactions, our algorithm recovers nearly 200 of 400 missing reactions. |
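To make the two representations named in the abstract concrete, the following minimal sketch builds an incidence matrix (metabolites by reactions) for a toy network and derives a pairwise adjacency matrix from it. It is an illustration only, not the MATBoost algorithm: the metabolite names, the three reactions, and the helper functions `incidence_matrix` and `adjacency_from_incidence` are all hypothetical and do not come from the paper.

```python
# Hypothetical toy network (not from the paper): 5 metabolites, 3 reactions.
metabolites = ["A", "B", "C", "D", "E"]
reactions = [("A", "B", "C"), ("B", "D"), ("A", "C", "D", "E")]


def incidence_matrix(nodes, hyperlinks):
    """Return S with S[i][j] = 1 if node i participates in hyperlink j."""
    index = {name: i for i, name in enumerate(nodes)}
    S = [[0] * len(hyperlinks) for _ in nodes]
    for j, members in enumerate(hyperlinks):
        for name in members:
            S[index[name]][j] = 1
    return S


def adjacency_from_incidence(S):
    """Return A with A[u][v] = number of hyperlinks shared by nodes u and v.

    This equals S * S^T with the diagonal set to zero.
    """
    n = len(S)
    A = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v:
                A[u][v] = sum(a * b for a, b in zip(S[u], S[v]))
    return A


S = incidence_matrix(metabolites, reactions)
A = adjacency_from_incidence(S)

# Column sums of S give each hyperlink's cardinality, which varies per reaction.
cardinalities = [sum(column) for column in zip(*S)]

print("cardinalities:", cardinalities)
for name, row in zip(metabolites, A):
    print(name, row)
```

Each column of `S` can have a different number of nonzero entries, which is why a classifier expecting a fixed-length input per candidate does not fit directly. The adjacency matrix `A` keeps only pairwise co-occurrence counts, so it cannot by itself tell which groups of metabolites formed a reaction.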