Citation prediction using a time series approach: KDD Cup 2003 (Task 1)

Authors: Manjunatha, J. N.; Sivaramakrishnan, K. R.; Pandey, Raghavendra Kumar; Murthy, M Narasimha
Source: ACM SIGKDD Explorations Newsletter; December 2003, Vol. 5 Issue: 2 p152-153, 2p
Abstract: In this article we describe our experiences in building the winning system for KDD Cup 2003, Task 1. This year's competition was based on a very large archive of research papers that provides an unusually comprehensive snapshot of a particular social network in action; in addition to the full text of research papers, it includes both explicit citation structure and partial data on the downloading of papers by users. It provides a framework for testing general network and usage mining techniques, which can be explored via four varied and interesting tasks. Each task is a separate competition with its own specific goal. In Task 1 the goal is to predict the change in the number of citations to each paper in the archive over time. The contest was very challenging because the given data was not in a format suitable for conventional data mining techniques, so we had to do a considerable amount of data processing. There were also several different sources of data, such as TeX files, the citation graph, and the slac-data database, so we had to decide which sources to use and to what extent.
Database: Supplemental Index