Experimental evaluation of train and test split strategies in link prediction

Author:	Bruin, G.J. de, Veenman, C.J., Herik, H.J. van den, Takes, F.W., Benito, R.M., Cherifi, C., Cherifi, H., Moro, E., Rocha, L.M., Sales-Pardo, M.
Language:	English
Publication year:	2021
Subject:	Performance Estimation; Machine Learning; Link Prediction
Source:	Complex networks & their applications IX, 79-91. Cham: Springer
Description:	In link prediction, the goal is to predict which links will appear in the future of an evolving network. To estimate the performance of link prediction models in a supervised machine learning setting, disjoint and independent train and test sets are needed. However, objects in a real-world network are inherently related to each other. Therefore, it is far from trivial to separate candidate links into these disjoint sets. Here we characterize and empirically investigate the two dominant approaches from the literature for creating separate train and test sets in link prediction, referred to as random and temporal splits. Comparing the performance of these two approaches on several large temporal network datasets, we find evidence that random splits may result in overly optimistic results, whereas a temporal split may give a fairer and more realistic indication of performance. Results appear robust to the selection of temporal intervals. These findings will be of interest to researchers who employ link prediction or other machine learning tasks in networks.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::04f0b03566ab4b36cba65a675a879c66 https://hdl.handle.net/1887/3243035
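
The difference between the random and temporal splits named in the description can be illustrated with a minimal sketch. This is not the authors' procedure: the toy edge list, the function names `random_split` and `temporal_split`, and the `test_fraction` and `cutoff` parameters are all assumptions made for illustration, and the sketch splits observed edges rather than the candidate node pairs a full link prediction pipeline would use.

```python
import random

# Hypothetical toy temporal edge list: (source, target, timestamp).
edges = [
    ("a", "b", 1), ("b", "c", 2), ("a", "c", 3), ("c", "d", 4),
    ("b", "d", 5), ("d", "e", 6), ("a", "e", 7), ("c", "e", 8),
]


def random_split(edges, test_fraction=0.25, seed=0):
    """Shuffle edges regardless of time, then hold out a fraction as test."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(edges, cutoff):
    """Train on edges with timestamp <= cutoff; test on strictly later edges."""
    train = [e for e in edges if e[2] <= cutoff]
    test = [e for e in edges if e[2] > cutoff]
    return train, test


random_train, random_test = random_split(edges)
temporal_train, temporal_test = temporal_split(edges, cutoff=6)

# In the temporal split every test edge is newer than every train edge,
# so the model never sees information from the period it is evaluated on.
print("random test edges:  ", sorted(random_test, key=lambda e: e[2]))
print("temporal test edges:", temporal_test)
```

In the random split, test edges may be older than some training edges, which is one plausible way information from the evaluation period can leak into training; the temporal split rules this out by construction.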