Event Mining Through Clustering

Author: Umamaheswari E., Geetha T.V.
Language: English
Publication year: 2014
Subject:
Source: Journal of Intelligent Systems, Vol 23, Iss 1, Pp 59-73 (2014)
Document type: article
ISSN: 0334-1860; 2191-026X
DOI: 10.1515/jisys-2013-0025
Description: Traditional document clustering algorithms cluster documents using text-based features such as unique word count and concept count. Event mining, by contrast, is the extraction of specific events, their related sub-events, and the associated semantic relations from documents. This work discusses an approach to event mining through clustering. A Universal Networking Language (UNL)-based subgraph, a semantic representation of the document, is used as the input for clustering. Our research explores three different feature sets for event clustering and compares the resulting approaches for specific event mining. In our previous work, the clustering algorithm used UNL-based event semantics to represent event context. However, that approach clustered different events with similar semantics together. Hence, instead of considering only UNL event semantics, we assigned additional weight to the similarity between event contexts on event-related attributes such as time, place, and persons. Although this places each specific event in a single cluster, the sub-events related to it do not necessarily fall in the same cluster. Therefore, to improve our cluster efficiency, connective terms between two sentences and their representation as UNL subgraphs were also considered when determining similarity. By combining UNL semantics, event-specific argument similarity, and connective-term concepts between sentences, we were able to obtain clusters for specific events and their sub-events. We used 112,000 Tamil documents from the Forum for Information Retrieval Evaluation data corpus and achieved good results. We also compared our approach with the previous state-of-the-art approach on the Reuters RCV1 corpus and achieved a 30% improvement in precision.
Database: Directory of Open Access Journals
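
Illustrative note: the description combines three similarity signals (UNL event semantics, event-related attributes such as time, place, and persons, and connective-term concepts) into one score for clustering. The record does not give the authors' formula, so the following is only a minimal sketch under stated assumptions: each event context is reduced to three plain sets of labels, Jaccard overlap stands in for the UNL subgraph similarity, and the weights, the threshold, and the greedy single-pass clustering are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventContext:
    # All three fields are hypothetical simplifications of a UNL subgraph.
    semantics: frozenset    # UNL concept / relation labels of the event
    arguments: frozenset    # event-related attributes: time, place, persons
    connectives: frozenset  # concepts linked through connective terms


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets are treated as dissimilar."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(x, y, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the three component similarities (assumed weights)."""
    w_sem, w_arg, w_con = weights
    return (
        w_sem * jaccard(x.semantics, y.semantics)
        + w_arg * jaccard(x.arguments, y.arguments)
        + w_con * jaccard(x.connectives, y.connectives)
    )


def cluster(events, threshold=0.5):
    """Greedy single-pass clustering against each cluster's first member."""
    clusters = []
    for event in events:
        for members in clusters:
            if combined_similarity(event, members[0]) >= threshold:
                members.append(event)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([event])
    return clusters
```

Under these assumptions, two events with identical semantics but different time and place score below the threshold and stay in separate clusters, which mirrors the motivation given in the description for weighting event-related attributes.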