Author:
Zhang, Yating; Jatowt, Adam; Bhowmick, Sourav S.; Tanaka, Katsumi
Source:
IEEE Transactions on Knowledge & Data Engineering; Oct 2016, Vol. 28, Issue 10, pp. 2793-2807 (15 pp.)
Abstract:
Numerous archives and collections of past documents have become available recently thanks to mass-scale digitization and preservation efforts. Libraries, national archives, and other memory institutions have started opening up their collections to interested users. Yet searching within such collections usually requires knowledge of appropriate keywords, because the context and language of the past differ from those of today. Non-professional users may therefore have difficulty formulating suitable queries, as their knowledge of the past is typically limited. In this paper, we propose a novel approach to the temporal correspondence detection task, which requires finding the terms in the past that are semantically closest to a given input term from the present. The approach is based on a vector space transformation that maps the distributed word representation of the present onto that of the past. The key problem in this approach is obtaining a correct training set that can be used for a variety of diverse document collections and arbitrary time periods. To solve this problem, we propose an effective technique for automatically constructing seed pairs of terms to be used for finding the transformation. We test the performance of the proposed approaches over short as well as long time frames, such as 100 years. Our experiments demonstrate that, when the query has a different literal form from its temporal counterpart, the proposed methods outperform the best-performing baseline in MRR by 113 percent on average for the New York Times Annotated Corpus and by 28 percent on average for the Times Archive. [ABSTRACT FROM PUBLISHER]
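The abstract describes the transformation only at a high level. The sketch below shows one common way such a mapping between two embedding spaces can be realized, assuming a linear map fitted by least squares on (present term, past term) seed pairs; it is not the paper's exact method. The names `PRESENT`, `PAST`, `fit_mapping`, `rank_past_terms`, and `mrr`, the 2-D toy vectors, the terms `email`/`telegram`, and the hand-picked seed pairs (`city`, `king`) are all hypothetical. The paper's automatic seed-pair construction is not reproduced. The sketch maps a present-day query into the past space, ranks past terms by cosine similarity, and scores the ranking with mean reciprocal rank (MRR), the metric the abstract reports.

```python
import math

# Hypothetical toy embeddings (2-D for readability). Each "past" vector is the
# "present" vector rotated by 90 degrees, so an exact linear map exists.
PRESENT = {
    "city": [1.0, 0.0],
    "king": [0.0, 1.0],
    "email": [0.6, 0.8],  # query term with no literal match in the past
}
PAST = {
    "city": [0.0, 1.0],
    "king": [-1.0, 0.0],
    "telegram": [-0.8, 0.6],  # hypothetical temporal counterpart of "email"
    "horse": [0.8, -0.6],
}


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: use more or more diverse seed pairs")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mapping(seed_pairs, present, past):
    """Least-squares W minimizing sum ||x_i W - y_i||^2 over the seed pairs."""
    X = [present[p] for p, _ in seed_pairs]
    Y = [past[q] for _, q in seed_pairs]
    Xt = transpose(X)
    # Normal equations: (X^T X) W = X^T Y
    return solve(matmul(Xt, X), matmul(Xt, Y))


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def rank_past_terms(query, W, present, past):
    """Map the present-day query into the past space; rank past terms by cosine."""
    mapped = matmul([present[query]], W)[0]
    return sorted(past, key=lambda t: cosine(mapped, past[t]), reverse=True)


def mrr(rankings, gold):
    """Mean reciprocal rank; a query whose gold term is not ranked contributes 0."""
    total = 0.0
    for query, ranked in rankings.items():
        if gold[query] in ranked:
            total += 1.0 / (ranked.index(gold[query]) + 1)
    return total / len(rankings)


if __name__ == "__main__":
    W = fit_mapping([("city", "city"), ("king", "king")], PRESENT, PAST)
    ranked = rank_past_terms("email", W, PRESENT, PAST)
    print(ranked[:2])  # ['telegram', 'king']
    print(mrr({"email": ranked}, {"email": "telegram"}))  # 1.0
```

The sketch uses pure Python so that it stays self-contained. With real embeddings trained separately on a present and a past corpus, the same least-squares fit would more naturally use `numpy.linalg.lstsq` on the stacked seed-pair vectors.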
Database:
Complementary Index