Exploring interview collections with the help of named entity linking and topic classification

Authors: Egyed-Gergely, Júlia; Gárdos, Judit; Horváth, Anna; Meiszterics, Enikő; Vajda, Róza; Kovács, László; Micsik, András; Pataki, Balázs
Contributors: Martin, Dániel; Marx, Attila; Siket, Melinda; Tóth, Zoltán
Language: English
Publication year: 2023
Subject:
DOI: 10.5281/zenodo.8013354
Description: In this paper we present an approach that supports the processing of long, in-depth social science interviews and makes these texts available for secondary research. The problem area and the implemented solution generalize to many other tasks in which long texts have to be explored without reading them thoroughly and relevant passages have to be found based on complex criteria. In our approach, processing starts with digitization (needed only for analogue, heritage interviews); the texts are then cut into blocks of manageable size, such as pages, which serve as units for analysis and for reuse in other research. Keywords are automatically assigned to these text blocks and subsequently mapped to topics using NLP tools. The topic thesaurus, created specifically for this task, is a hierarchical structure of terms based on ELSST (the European Language Social Science Thesaurus by CESSDA) and covers all major sociological research areas and fields of inquiry. Furthermore, named entities are extracted from the texts and, where possible, linked to Wikidata (wikification). Through Wikidata we also collect links to other important registries such as GeoNames, ISNI and VIAF. Building on these preprocessing steps, we created an exploratory user interface for finding pages/blocks in the interview corpus that relate to selected topics. After selecting a page/block, the researcher sees the text with named entities, keywords and topics highlighted, which helps her decide whether to use the text in her research. We have also experimented with different methods of mapping the contents of the archives we work with. We have built various diagrams to characterize interviews and interview collections, and implemented an interface in which researchers can prepare such diagrams themselves, for their own use, without any programming. DARIAH2023 Submission 180
Database: OpenAIRE
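
Illustrative sketch: the block-level keyword-to-topic mapping outlined in the description could look roughly like the following. This is a minimal toy version and not the authors' implementation. The thesaurus entries, the keyword lookup, the fixed word-count block size and all function names are assumptions invented for illustration; none of them is taken from ELSST or from the paper, and a simple dictionary lookup stands in for the NLP tools the description mentions.

```python
from collections import Counter

# Toy hierarchical thesaurus: topic -> parent topic (None marks a top-level topic).
# These entries are invented for illustration; they are not taken from ELSST.
TOPIC_PARENT = {
    "SOCIETY": None,
    "LABOUR": "SOCIETY",
    "UNEMPLOYMENT": "LABOUR",
    "EDUCATION": "SOCIETY",
    "SCHOOLS": "EDUCATION",
}

# Toy keyword -> topic lookup (also invented for illustration).
KEYWORD_TOPIC = {
    "jobless": "UNEMPLOYMENT",
    "factory": "LABOUR",
    "teacher": "SCHOOLS",
}


def split_into_blocks(text, words_per_block=4):
    """Cut a long text into fixed-size word blocks (a stand-in for pages)."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_block])
        for i in range(0, len(words), words_per_block)
    ]


def topics_for_block(block):
    """Count matched topics for a block, crediting every ancestor topic too."""
    counts = Counter()
    for word in block.lower().split():
        topic = KEYWORD_TOPIC.get(word.strip(".,;:!?"))
        # Walk up the hierarchy so a match on a narrow term
        # also makes the block findable under its broader terms.
        while topic is not None:
            counts[topic] += 1
            topic = TOPIC_PARENT[topic]
    return counts


def search_blocks(blocks, topic):
    """Return the indices of blocks that relate to the given topic."""
    return [i for i, block in enumerate(blocks) if topics_for_block(block)[topic] > 0]
```

Crediting ancestor topics is one possible design choice: it lets a search for a broad topic also return blocks that matched only narrower terms, which is the kind of topic-based exploration the description refers to.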