Experimental evaluation of train and test split strategies in link prediction
Autor: | Bruin, G.J. de, Veenman, C.J., Herik, H.J. van den, Takes, F.W., Benito, R.M., Cherifi, C., Cherifi, H., Moro, E., Rocha, L.M., Sales-Pardo, M. |
---|---|
Jazyk: | angličtina |
Rok vydání: | 2021 |
Předmět: | |
Zdroj: | Complex networks & their applications IX, 79-91. Cham: Springer STARTPAGE=79;ENDPAGE=91;TITLE=Complex networks & their applications IX |
Popis: | In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of these models in a supervised machine learning model, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other. Therefore, it is far from trivial to separate candidate links into these disjoint sets.Here we characterize and empirically investigate the two dominant approaches from the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits may result in too optimistic results, whereas a temporal split may give a more fair and realistic indication of performance. Results appear robust to the selection of temporal intervals. These findings will be of interest to researchers that employ link prediction or other machine learning tasks in networks. |
Databáze: | OpenAIRE |
Externí odkaz: |