Embedding and generalization of formula with context in the retrieval of mathematical information

Authors: Partha Pakray, Pankaj Dadure, Sivaji Bandyopadhyay
Year of publication: 2022
Source: Journal of King Saud University - Computer and Information Sciences. 34:6624-6634
ISSN: 1319-1578
DOI: 10.1016/j.jksuci.2021.05.014
Description: Retrieving mathematical information from scientific documents is a crucial task. Numerous Mathematical Information Retrieval (MIR) systems have been developed, mainly focusing on improving the indexing and searching mechanisms; however, the poor results these systems obtain on evaluation measures reveal major limitations. These limitations leave room for improvement and for new functionality that can address the challenges of MIR systems. To improve the performance of MIR systems, this paper proposes a formula embedding and generalization approach that incorporates the context of each formula, together with an innovative relevance measurement technique. In this approach, the document preprocessor module preprocesses the documents and extracts the formulas, in Presentation MathML format, along with their context. The formula embedding and formula generalization modules form binary vectors in which ‘1’ represents the presence and ‘0’ the absence of a particular entity in a formula; the vectors of formulas with their context are then indexed by the indexer. The relevance measurement technique ranks documents retrieved by both the formula embedding and the formula generalization modules ahead of documents retrieved by only one of them. The proposed approach has been tested on the Wikipedia MathTagArticles of NTCIR-12, and the results confirm the significance of formula context and of the dissimilarity factor in the retrieval of mathematical information.
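The description above does not say which entities are extracted, how a formula is generalized, or which similarity measure scores the vectors, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that the entities are the `<mi>`, `<mn>`, and `<mo>` leaf tokens of a Presentation MathML formula plus its context words, that generalization replaces identifiers and numbers with the placeholders `VAR` and `NUM`, and that Jaccard similarity compares the binary vectors. Only the final ordering (documents found by both modules first) follows the description directly; all function names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # Drop an XML namespace prefix, e.g. "{http://www.w3.org/1998/Math/MathML}mi" -> "mi".
    return tag.rsplit("}", 1)[-1]


def formula_entities(mathml, generalize=False):
    # Collect leaf tokens (<mi>, <mn>, <mo>) of a Presentation MathML formula.
    # Hypothetical generalization: identifiers -> VAR, numbers -> NUM; operators kept.
    entities = set()
    for node in ET.fromstring(mathml).iter():
        kind = _local(node.tag)
        text = (node.text or "").strip()
        if kind not in ("mi", "mn", "mo") or not text:
            continue
        if generalize and kind == "mi":
            text = "VAR"
        elif generalize and kind == "mn":
            text = "NUM"
        entities.add(f"{kind}:{text}")
    return entities


def entities_with_context(item, generalize):
    # Formula entities plus the (lower-cased) words of the formula's context.
    context = {"ctx:" + word.lower() for word in item["context"]}
    return formula_entities(item["formula"], generalize) | context


def binary_vector(entities, vocabulary):
    # 1 marks the presence, 0 the absence, of each vocabulary entity.
    return [1 if term in entities else 0 for term in vocabulary]


def jaccard(a, b):
    # Assumed similarity measure between two equal-length binary vectors.
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def search(query, docs, generalize):
    # Score every document against the query; keep only non-zero scores as "retrieved".
    query_ents = entities_with_context(query, generalize)
    doc_ents = {doc_id: entities_with_context(d, generalize) for doc_id, d in docs.items()}
    vocabulary = sorted(query_ents.union(*doc_ents.values()))
    query_vec = binary_vector(query_ents, vocabulary)
    hits = {}
    for doc_id, ents in doc_ents.items():
        score = jaccard(query_vec, binary_vector(ents, vocabulary))
        if score > 0:
            hits[doc_id] = score
    return hits


def combined_ranking(embedding_hits, generalization_hits):
    # Documents retrieved by both modules first; then by summed score, then by id.
    def key(doc_id):
        in_both = doc_id in embedding_hits and doc_id in generalization_hits
        total = embedding_hits.get(doc_id, 0.0) + generalization_hits.get(doc_id, 0.0)
        return (not in_both, -total, doc_id)
    return sorted(set(embedding_hits) | set(generalization_hits), key=key)


# Hypothetical toy collection and query.
docs = {
    "d1": {"formula": "<math><mi>x</mi><mo>+</mo><mn>1</mn></math>", "context": ["linear"]},
    "d2": {"formula": "<math><mi>y</mi><mo>-</mo><mn>2</mn></math>", "context": []},
    "d3": {"formula": "<math><mi>a</mi><mo>+</mo><mi>b</mi></math>", "context": []},
}
query = {"formula": "<math><mi>x</mi><mo>+</mo><mn>1</mn></math>", "context": ["linear"]}

embedding_hits = search(query, docs, generalize=False)
generalization_hits = search(query, docs, generalize=True)
print(combined_ranking(embedding_hits, generalization_hits))  # ['d1', 'd3', 'd2']
```

In this toy run, `d2` (`y - 2`) shares no exact token with the query `x + 1`, so only the generalized vectors retrieve it, and it is therefore ranked after `d1` and `d3`, which both modules retrieve. Using the summed score to break ties within each group is an assumption of this sketch.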
Database: OpenAIRE