An N-ary Tree-based Model for Similarity Evaluation on Mathematical Formulae
Authors: | Liangyu Chen, Yifan Dai, Zihan Zhang |
---|---|
Year of publication: | 2020 |
Subject: |
K-ary tree; Word embedding; Computer science; 02 engineering and technology; Function (mathematics); 010501 environmental sciences; 01 natural sciences; Tree (data structure); Search engine; Tree structure; Similarity (network science); 0202 electrical engineering, electronic engineering, information engineering; Embedding; 020201 artificial intelligence & image processing; Algorithm; 0105 earth and related environmental sciences |
Source: | SMC |
DOI: | 10.1109/smc42975.2020.9283495 |
Description: | Accurate and efficient measures of similarity between mathematical formulae play an important role in mathematical information retrieval. Most previous studies have focused on representing formulae in different forms to capture their features and on combining these representations with traditional structure-matching algorithms. This paper presents a new unsupervised model, the N-ary Tree-based Formula Embedding Model (NTFEM), for the task of mathematical similarity evaluation. Using an n-ary tree structure to represent the formula, we convert the formula into a linear sequence that can be treated as an input sentence and then embed it with a word embedding model. Based on the characteristics of mathematical formulae, a weighting function is also used to obtain the final weighted-average embedding vector. In experiments on the NTCIR-12 Wikipedia Formula Browsing Task, our model can outperform previous formula search engines on the Bpref metric. In addition, compared with traditional tree-based models, NTFEM not only improves retrieval effectiveness but also greatly reduces training time. |
Database: | OpenAIRE |
External link: |
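
The pipeline outlined in the description (n-ary tree, linear sequence, token embeddings, weighted average) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not specify the traversal order, the embedding model, or the weighting function, so the pre-order traversal, the hash-based `toy_embedding` stand-in for a trained word embedding, and the depth-based weight `1 / (1 + depth)` are all assumptions chosen only for illustration. The names `Node`, `linearize`, `embed_formula`, and `cosine` are likewise hypothetical.

```python
import hashlib
import math


class Node:
    """One node of an n-ary formula tree (an operator or operand token)."""

    def __init__(self, token, children=None):
        self.token = token
        self.children = children or []


def linearize(root):
    """Pre-order traversal returning (token, depth) pairs.

    Assumption: the abstract only says the tree becomes a linear
    sequence; pre-order is one plausible choice.
    """
    out = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        out.append((node.token, depth))
        # Push children reversed so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, depth + 1))
    return out


def toy_embedding(token, dim=8):
    """Hypothetical stand-in for a trained word embedding.

    Derives a deterministic vector from a hash of the token, so the
    sketch runs without any training data or external library.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:dim]]


def embed_formula(root, dim=8):
    """Weighted average of token vectors.

    Assumption: weight 1 / (1 + depth), so tokens near the root count
    more. The weighting function used in the paper may differ.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for token, depth in linearize(root):
        weight = 1.0 / (1 + depth)
        vector = toy_embedding(token, dim)
        for i in range(dim):
            total[i] += weight * vector[i]
        weight_sum += weight
    return [x / weight_sum for x in total]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# a + b * c  versus  a + b * d
f1 = Node("+", [Node("a"), Node("*", [Node("b"), Node("c")])])
f2 = Node("+", [Node("a"), Node("*", [Node("b"), Node("d")])])
similarity = cosine(embed_formula(f1), embed_formula(f2))
```

In a realistic setting, `toy_embedding` would be replaced by vectors learned from a corpus of linearized formulae, and candidate formulae would be ranked by `cosine` against the query embedding.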