Irregularity Detection in Categorized Document Corpora

Authors: Sluban, B.; Pollak, Senja; Coesemans, R.; Lavrac, N.
Contributors: Brussels Institute for Journalism Studies, Applied Linguistics
Language: English
Publication year: 2012
Subject:
Source: Scopus-Elsevier
Description: The paper presents an approach to detecting irregularities in document corpora in which the documents originate from different sources and the analyst wants to find documents that are atypical for their given source. The main contribution of the paper is a voting-based approach to irregularity detection and its evaluation on a collection of newspaper articles from two sources: Western (UK and US) and local (Kenyan) media. Evaluation by a domain expert shows that the method is very effective in uncovering interesting irregularities in categorized document corpora.
Database: OpenAIRE