An appraisal of publication embedding techniques in the context of conventional bibliometric relatedness measures

Authors: Lamers, W.S.; Eck, N.J.P. van; Colavizza, G.
Language: English
Year of publication: 2021
Subject:
Source: Proceedings of the 18th International Conference of the International Society for Scientometrics and Informetrics, pp. 633-638
Description: Modern natural language processing has given rise to embedding techniques that represent documents based on their content or context, and several papers have operationalized these techniques for bibliometric tasks. The relationship between these embeddings and conventional citation-based or title-and-abstract-based mappings remains unclear. Unlike citation-based or term-based relatedness, embedding-based relatedness is not immediately interpretable. We consider four embedding-derived publication relatedness measures, based on: 1) word2vec embeddings of citation labels, 2) sentence embeddings using BERT, 3) sentence embeddings using SciBERT, and 4) title and abstract embeddings using SPECTER. We compare them with conventional bibliometric publication relatedness measures derived from citation relations and from title and abstract noun phrases. We show that these embedding-derived relatedness measures overlap more strongly with citation-based relatedness than with title and abstract noun phrase-based relatedness, and that they outperform conventional techniques when used to cluster publications cited with the same citation intent.
Database: OpenAIRE