On Bi-gram Graph Attributes

Authors: Konstantinovsky, Thomas; Mizrachi, Matan
Publication year: 2021
Subject:
Document type: Working Paper
DOI: 10.5539/cis.v14n3p78
Description: We propose a new approach to semantic text analysis and general corpus analysis based on what this article terms a "bi-gram graph" representation of a corpus. Attributes derived from graph theory are measured and analyzed, either as insights in their own right or in comparison with other corpus graphs. We observe that a wide range of tools and algorithms can be developed on top of the graph representation: creating such a graph proves to be computationally cheap, and much of the heavy lifting is achieved via basic graph calculations. Furthermore, we showcase different use cases for bi-gram graphs and how well they scale to large datasets.
Comment: 7 pages, 8 figures
Database: arXiv
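
The abstract above does not spell out how a bi-gram graph is built, so the following is only a minimal sketch under stated assumptions: nodes are lowercased whitespace-separated words, a directed edge runs from each word to the word that follows it, and the edge weight is the number of times that pair occurs. The function names (`build_bigram_graph`, `out_degree`, `in_degree`) and the sample sentence are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_bigram_graph(tokens):
    """Map each word to a Counter of the words that directly follow it.

    Assumption: one directed, weighted edge per distinct consecutive pair.
    """
    graph = defaultdict(Counter)
    for left, right in zip(tokens, tokens[1:]):
        graph[left][right] += 1
    return graph


def out_degree(graph, node):
    """Number of distinct words that follow `node`."""
    return len(graph.get(node, {}))


def in_degree(graph, node):
    """Number of distinct words that precede `node`."""
    return sum(1 for targets in graph.values() if node in targets)


# Hypothetical toy corpus; real corpora would need proper tokenization.
tokens = "the cat sat on the mat and the cat slept".lower().split()
graph = build_bigram_graph(tokens)

print(dict(graph["the"]))
print(out_degree(graph, "the"), in_degree(graph, "the"))
```

Building the graph takes a single pass over the tokens, which is consistent with the abstract's remark that construction is computationally cheap; degree counts like these are one example of the basic graph calculations it mentions.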