Demo Paper: Ad Hoc Search On Statistical Data Based On Categorization And Metadata Augmentation

Authors: Hisashi Miyamori, Taku Okamoto
Year of publication: 2021
Subject:
Source: MIPR
Description: In this paper, we describe a system for ad hoc search on statistical data based on categorization and metadata augmentation. The documents covered by this paper consist of metadata extracted from governmental statistical data and the body of the corresponding statistical data. The metadata is characteristically short, and the main body of the statistical data consists almost entirely of numbers, except for titles, headers, and comments. We developed a categorical search that narrows the set of documents to be retrieved by category, in order to properly capture the scope of the problem domain intended by a given query. In addition, to compensate for the short length of the metadata, we implemented a method that extracts table header information from the main body of the statistical data to augment the documents to be searched. As the ranking model, we adopted BM25, which accounts for term frequency and document length and can be tuned with few parameters.
Database: OpenAIRE
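
Note: The abstract names BM25 as the ranking model but this record does not give the exact variant or parameter values used. As a rough illustration only, the sketch below scores tokenized documents with a common textbook form of Okapi BM25, where k1 controls term-frequency saturation and b controls document-length normalization. The function name bm25_scores, the defaults k1=1.2 and b=0.75, the smoothed IDF formula, and the toy documents are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Assumes `docs` is a non-empty list of non-empty token lists.
    k1 and b are conventional defaults, not values from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed IDF (always positive); one of several variants in use.
            idf = math.log(
                1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5)
            )
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical short metadata strings standing in for statistical-table titles.
docs = [
    "population census by prefecture".split(),
    "labour force survey monthly results by age".split(),
    "population estimates by age and sex".split(),
]
scores = bm25_scores("population age".split(), docs)
# Document indices ordered from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy setup the document containing both query terms ranks first, and of the two single-term matches the shorter one scores higher, which is the length effect that matters when metadata documents are short.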