An N-ary Tree-based Model for Similarity Evaluation on Mathematical Formulae

Authors: Liangyu Chen, Yifan Dai, Zihan Zhang
Year of publication: 2020
Subject:
Source: SMC
DOI: 10.1109/smc42975.2020.9283495
Description: Accurate and efficient measures of similarity between mathematical formulae play an important role in mathematical information retrieval. Most previous studies have focused on representing formulae in various forms to capture their features and on combining these representations with traditional structure-matching algorithms. This paper presents a new unsupervised model, the N-ary Tree-based Formula Embedding Model (NTFEM), for the task of mathematical similarity evaluation. Using an n-ary tree structure to represent the formula, we convert the formula into a linear sequence that can be treated as an input sentence, and then embed the formula with a word embedding model. Based on the characteristics of mathematical formulae, a weighting function is also applied to obtain the final weighted-average embedding vector. In experiments on the NTCIR-12 Wikipedia Formula Browsing Task, our model outperforms previous formula search engines on the Bpref metric. In addition, compared with traditional tree-based models, NTFEM not only improves retrieval effectiveness but also greatly reduces training time.
Database: OpenAIRE
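
The pipeline outlined in the description (n-ary tree, linear sequence, token embeddings, weighted average) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not specify the traversal order, the embedding model, or the weighting function, so the pre-order traversal, the hash-based `toy_embedding` lookup (a stand-in for a trained word-embedding model), and the depth-based `depth_weight` function are all assumptions, and every name below is hypothetical.

```python
from __future__ import annotations

import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One formula symbol with any number of ordered children."""
    label: str
    children: list[Node] = field(default_factory=list)


def linearize(root: Node) -> list[tuple[str, int]]:
    """Pre-order traversal returning (token, depth) pairs (assumed order)."""
    out = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        out.append((node.label, depth))
        # Push children right-to-left so they are visited left-to-right.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return out


def toy_embedding(token: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a trained word-embedding lookup."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:dim]]


def depth_weight(depth: int) -> float:
    """Assumed weighting: tokens nearer the root count more."""
    return 1.0 / (1.0 + depth)


def embed_formula(root: Node, dim: int = 8) -> list[float]:
    """Weighted average of the token vectors of the linearized tree."""
    total = [0.0] * dim
    weight_sum = 0.0
    for token, depth in linearize(root):
        w = depth_weight(depth)
        vec = toy_embedding(token, dim)
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# x^2 + y as an n-ary tree, compared with x^2 + z.
f1 = Node("+", [Node("^", [Node("x"), Node("2")]), Node("y")])
f2 = Node("+", [Node("^", [Node("x"), Node("2")]), Node("z")])
similarity = cosine(embed_formula(f1), embed_formula(f2))
```

Replacing `toy_embedding` with vectors trained on linearized formula sequences would bring the sketch closer to the described approach, since hash-derived vectors carry no learned information about which symbols behave alike.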