Summarization of Films and Documentaries Based on Subtitles and Scripts

Authors: Marta Aparício, Paulo Figueiredo, Ricardo Ribeiro, Luís Marujo, David Martins de Matos, Francisco Raposo
Year of publication: 2015
Subject:
FOS: Computer and information sciences
Computer Science - Artificial Intelligence
Computer science
Automatic text summarization
02 engineering and technology
computer.software_genre
Plot (graphics)
Summarization of films
GeneralLiterature_MISCELLANEOUS
Computer Science - Information Retrieval
030507 speech-language pathology & audiology
03 medical and health sciences
Artificial Intelligence
0202 electrical engineering, electronic engineering, information engineering
Computer Science - Computation and Language
Information retrieval
I.2.7
Natural Sciences::Computer and Information Sciences [Scientific Domain/Area]
Summarization of documentaries
Automatic summarization
Artificial Intelligence (cs.AI)
Ranking
Generic summarization
Scripting language
Signal Processing
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
020201 artificial intelligence & image processing
Computer Vision and Pattern Recognition
0305 other medical science
computer
Computation and Language (cs.CL)
Software
Information Retrieval (cs.IR)
Source: Repositório Científico de Acesso Aberto de Portugal (RCAAP)
instacron:RCAAP
DOI: 10.48550/arxiv.1506.01273
Description: We assess the performance of generic text summarization algorithms applied to films and documentaries, using the well-known behavior of summarization of news articles as reference. We use three datasets: (i) news articles, (ii) film scripts and subtitles, and (iii) documentary subtitles. Standard ROUGE metrics are used for comparing generated summaries against news abstracts, plot summaries, and synopses. We show that the best performing algorithms are LSA, for news articles and documentaries, and LexRank and Support Sets, for films. Despite the different nature of films and documentaries, their relative behavior is in accordance with that obtained for news articles.
Comment: 7 pages, 9 tables, 4 figures, submitted to Pattern Recognition Letters (Elsevier)
Database: OpenAIRE