From Text to Map: Combing Named Entity Recognition and Geographic Information Systems.

Authors: Harper, Charlie (AUTHOR), Gorham, R. Benjamin (AUTHOR)
Subject:
Source: Code4Lib Journal. 8/10/2020, Issue 49, pN.PAG-N.PAG. 1p.
Abstract: This tutorial shows readers how to combine named entity recognition (NER) and geographic information systems (GIS) to extract place names from text, geocode them, and create a public-facing map. The process is useful across disciplines: for example, it can generate maps from historical primary sources, works of literature set in the real world, and corpora of academic scholarship. To lead the reader through the process, the authors work with a 500-article sample of the COVID-19 Open Research Dataset Challenge (CORD-19) dataset. As of the date of writing, CORD-19 includes 45,000 full-text articles with metadata. Using this sample, the authors demonstrate how to extract locations from the full text with the spaCy library in Python, highlight methods for cleaning up the extracted data with the Pandas library, and finally teach the reader how to create an interactive map of the places using ArcGIS Online. The processes and code are described in a manner that is reusable for any corpus of text. [ABSTRACT FROM AUTHOR]
Database: Library, Information Science & Technology Abstracts
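
The extraction and cleanup steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `extract_places` and `clean_places`, the column names, the `en_core_web_sm` model, the choice of the `GPE` and `LOC` entity labels, and the sample rows are all hypothetical.

```python
import pandas as pd


def extract_places(texts, model_name="en_core_web_sm"):
    """Return one row per place-like entity spaCy finds in each text.

    Assumes the named spaCy model is installed; the model and the
    GPE/LOC label choice are assumptions, not taken from the tutorial.
    """
    import spacy  # imported lazily so the cleanup step runs without spaCy

    nlp = spacy.load(model_name)
    rows = []
    for doc_id, doc in enumerate(nlp.pipe(texts)):
        for ent in doc.ents:
            if ent.label_ in {"GPE", "LOC"}:
                rows.append({"doc_id": doc_id, "place": ent.text})
    return pd.DataFrame(rows, columns=["doc_id", "place"])


def clean_places(df):
    """Normalize place strings, then count mentions and distinct articles."""
    cleaned = df.copy()
    cleaned["place"] = (
        cleaned["place"]
        .str.strip()
        .str.replace(r"\s+", " ", regex=True)        # collapse inner whitespace
        .str.replace(r"(?i)^the\s+", "", regex=True)  # drop a leading "the"
    )
    # Discard one-character strings, which are unlikely to be real places.
    cleaned = cleaned[cleaned["place"].str.len() > 1]
    return (
        cleaned.groupby("place")
        .agg(mentions=("doc_id", "size"), articles=("doc_id", "nunique"))
        .reset_index()
        .sort_values(["mentions", "place"], ascending=[False, True])
        .reset_index(drop=True)
    )


# Hypothetical extraction output, standing in for extract_places(...) results.
sample = pd.DataFrame(
    {
        "doc_id": [0, 0, 1, 1, 2, 2],
        "place": ["Wuhan", " the United States", "Wuhan", "Wuhan",
                  "United  States", "U"],
    }
)
counts = clean_places(sample)
print(counts)
```

A table like `counts` could then be saved with `counts.to_csv(...)` and geocoded before mapping; the exact ArcGIS Online workflow is covered in the tutorial itself and is not reproduced here.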