BUAS: Joint Bottom-Up Article Selection for Quick Article Similarity Identification Based on NLP.

Authors: Syu-Jhih Jhang, Chih-Yung Chang, Shih-Jung Wu, Chia-Ling Ho
Subject:
Source: International Journal of Design, Analysis & Tools for Integrated Circuits & Systems; Dec 2022, Vol. 11, Issue 2, pp. 33-36 (4 pages)
Abstract: Article similarity identification is one of the most important issues in article comparison. In the literature, some studies have proposed similarity comparison mechanisms based on Word2Vec, N-grams, or BERT. However, a document usually contains a large number of words, and the goal of the comparison is to compare one source document with thousands of documents in a database. Because existing mechanisms can only compare two documents at a time, comparing the source document against every document in the database, as plagiarism comparison requires, is very time-consuming. This paper proposes a plagiarism comparison mechanism, called BUAS, that speeds up the similarity comparison: a bag-of-words scheme is first applied to transform each document into a document vector, and the most similar document is then selected as the candidate document. As a result, the source document only needs to be compared with the candidate document. Performance studies confirm that the similarity calculation by BUAS outperforms existing studies in terms of precision, recall, and F1 score. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
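
The candidate-selection step described in the abstract (bag-of-words document vectors, then picking the most similar document) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the abstract does not say which similarity measure or tokenizer BUAS uses, so cosine similarity and a simple lowercase alphanumeric tokenizer are assumptions here, and the function names bag_of_words, cosine_similarity, and select_candidate are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens (tokenizer is an assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def select_candidate(source, database):
    """Return (index, score) of the database document closest to source."""
    if not database:
        raise ValueError("database must contain at least one document")
    source_vec = bag_of_words(source)
    scores = [cosine_similarity(source_vec, bag_of_words(doc)) for doc in database]
    best = max(range(len(database)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "stock prices fell sharply today",
        "a cat sat on a mat",
    ]
    index, score = select_candidate("the cat sat on the mat quietly", docs)
    print(index, round(score, 3))
```

Only the document returned by select_candidate would then be passed to a more expensive pairwise comparison, which is where the speed-up claimed in the abstract would come from. In practice the database vectors would be computed once and cached rather than rebuilt for every query.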