Identification of Literary Movements Using Complex Networks to Represent Texts

Authors: Amancio, Diego R.; Oliveira Jr., Osvaldo N.; Costa, Luciano da F.
Year of publication: 2013
Subject:
Source: New J. Phys. 14 043029 (2012)
Document type: Working Paper
DOI: 10.1088/1367-2630/14/4/043029
Description: The use of statistical methods to analyze large databases of text has been useful to unveil patterns of human behavior and establish historical links between cultures and languages. In this study, we identify literary movements by treating books published from 1590 to 1922 as complex networks, whose metrics were analyzed with multivariate techniques to generate six clusters of books. The latter correspond to time periods coinciding with relevant literary movements over the last five centuries. The most important factor contributing to the distinction between different literary styles was the average shortest path length (particularly, the asymmetry of the distribution). Furthermore, over time there has been a trend toward larger average shortest path lengths, which is correlated with increased syntactic complexity, and a more uniform use of words, reflected in a smaller power-law coefficient for the distribution of word frequency. Changes in literary style were also found to be driven by opposition to earlier writing styles, as revealed by the analysis performed with geometrical concepts. The approaches adopted here are generic and may be extended to analyze a number of features of languages and cultures.
Comment: The Supplementary Information (SI) is available from http://iopscience.iop.org/1367-2630/14/4/043029/media/njp043029suppdata.pdf
Database: arXiv
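
The description names the average shortest path length, and the asymmetry of its distribution, as the most discriminating network metric. The following is a minimal sketch of how such a metric could be computed for a text; it is not the authors' implementation. The record does not say how the networks are built, so the word-adjacency construction (a directed edge between consecutive words), the tokenizer, and all function names here are assumptions made for illustration, and skewness is used as one possible measure of asymmetry.

```python
from collections import deque
import re


def build_adjacency_network(text):
    """Build a directed network with an edge a -> b for each pair of
    consecutive words (an assumed construction, not taken from the record)."""
    words = re.findall(r"[a-z']+", text.lower())
    graph = {word: set() for word in words}
    for a, b in zip(words, words[1:]):
        if a != b:
            graph[a].add(b)
    return graph


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def node_average_path_lengths(graph):
    """Average shortest path length from each node to the nodes it can reach.
    Nodes that reach no other node are omitted."""
    averages = {}
    for node in graph:
        dist = shortest_path_lengths(graph, node)
        lengths = [d for target, d in dist.items() if target != node]
        if lengths:
            averages[node] = sum(lengths) / len(lengths)
    return averages


def skewness(values):
    """Population skewness, used here as a simple asymmetry measure."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


sample = "the cat saw the dog and the dog saw the bird"
per_node = node_average_path_lengths(build_adjacency_network(sample))
values = list(per_node.values())
print("mean shortest path length:", sum(values) / len(values))
print("skewness of the distribution:", skewness(values))
```

For whole books, the per-node values (or their mean and skewness) would form part of the feature vector passed to a multivariate clustering step; the breadth-first search from every node costs roughly O(V·E) per text, so a graph library may be preferable for large vocabularies.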