Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Authors: Sofia Ares Oliveira, Frédéric Kaplan, Simon Clematide, Maud Ehrmann, Raphaël Barman
Contributors: University of Zurich
Language: English
Year of publication: 2021
Subject:
multimodal learning
historical newspapers
document layout analysis
image segmentation
deep learning
machine learning
natural language processing
information retrieval
artificial intelligence
categorization
digital humanities
Computer Vision and Pattern Recognition (cs.CV)
Machine Learning (cs.LG)
Computation and Language (cs.CL)
Information Retrieval (cs.IR)
FOS: Computer and information sciences
000 Computer science, knowledge & systems
410 Linguistics
02 engineering and technology
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
020207 software engineering
lcsh:AZ20-999 (History of scholarship and learning. The humanities)
lcsh:Z (Bibliography. Library science. Information resources)
10105 Institute of Computational Linguistics
Source: Journal of Data Mining and Digital Humanities, Vol HistoInformatics, Iss HistoInformatics (2021)
ISSN: 2416-5999
Description: The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts seeking to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as a first essential step. Although the identification and categorization of segments of interest in document images have seen significant progress in recent years thanks to deep learning techniques, many challenges remain, among them the use of more fine-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. We introduce a multimodal neural model for the semantic segmentation of historical newspapers that directly combines visual features at the pixel level with text embedding maps derived from potentially noisy OCR output. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show a consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to the wide variety of our material.
Database: OpenAIRE
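
The description above outlines the core technical idea: fusing visual features with text embedding maps at the pixel level before semantic segmentation. The following PyTorch snippet is a minimal sketch of one such fusion strategy, not the authors' published architecture; the module names, embedding dimension, class count, and the toy convolutional stack are all illustrative assumptions. It assumes the text embedding map has already been rasterized onto the pixel grid, e.g. by painting each OCR token's embedding over its bounding box.

import torch
import torch.nn as nn

class MultimodalSegmenter(nn.Module):
    """Fuse an RGB page image with an OCR-derived embedding map by
    channel concatenation, then predict per-pixel class logits."""

    def __init__(self, embedding_dim: int = 32, num_classes: int = 4):
        super().__init__()
        in_channels = 3 + embedding_dim  # RGB channels + text embedding channels
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, kernel_size=1),  # per-pixel logits
        )

    def forward(self, image: torch.Tensor, text_map: torch.Tensor) -> torch.Tensor:
        # image:    (B, 3, H, W) page facsimile
        # text_map: (B, E, H, W) word embeddings rasterized onto the pixel
        #           grid from (possibly noisy) OCR token bounding boxes
        fused = torch.cat([image, text_map], dim=1)  # (B, 3 + E, H, W)
        return self.net(fused)  # (B, num_classes, H, W)

# Usage on dummy data:
model = MultimodalSegmenter(embedding_dim=32, num_classes=4)
image = torch.randn(1, 3, 256, 256)
text_map = torch.randn(1, 32, 256, 256)
logits = model(image, text_map)
print(logits.shape)  # torch.Size([1, 4, 256, 256])

Channel concatenation is the simplest way to combine the two modalities at the pixel level; because the text channels enter alongside the visual ones, the downstream segmentation network can learn to weigh visual and textual evidence jointly at every pixel.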