Automatic Normalization of Temporal Expressions

Authors:	Ceri Binding, Douglas Tudhope
Language:	English
Year of publication:	2023
Subject:	temporal expressions; dating; time periods; semantic integration; software; multilingual; Archaeology (CC1-960); Electronic computers. Computer science (QA75.5-76.95)
Source:	Journal of Computer Applications in Archaeology, Vol 6, Iss 1 (2023)
Document type:	article
ISSN:	2514-8362
DOI:	10.5334/jcaa.105
Description:	Dates, periods and timespans are described in archaeological datasets using a number of different textual patterns for which myriad variations exist, rendering direct automated comparison difficult. The issue can occur even within records from the same dataset and is further compounded when attempting to integrate multilingual data, particularly where dates may be expressed in words rather than numbers. The same problem can be found in temporal metadata, whether manually entered or generated via Natural Language Processing (NLP) techniques from reports and grey literature. Resolving and normalizing dates and periods to internationally agreed standard formats enables efficient data integration, interchange, search, comparison and visualization. This paper reports on the design and implementation of a tool to normalize temporal expressions to a numerical time axis and reflects on key issues. Textual patterns for seven categories of temporal expression have been normalized: Ordinal named or numbered centuries; Year spans; Single year (with tolerance); Decades; Century spans; Single year with prefix; Named periods. The following languages are currently supported: Dutch, English, French, German, Italian, Norwegian, Spanish, Swedish, Welsh. Methods are described together with an open-source normalization tool developed in Python. Four applications of the method are discussed, together with limitations and future work. Results are presented from diverse data sets and languages. The input is a temporal text string and a language code (ISO 639-1). The output is a tab-delimited text file with start/end years (in ISO 8601 format), relative to Common Era (CE). The normalized outputs are provided as additional attributes along with the original text expression for consuming software to employ in end-user applications.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/36232ef6f1e845abb785d8a4653914c0
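
The input/output contract described in the abstract (a temporal text string plus an ISO 639-1 language code in, start/end years relative to CE out) can be illustrated with a minimal sketch. This is not the authors' tool: the function names, the regular expressions, and the century convention (15th century taken as 1401 to 1500) are all assumptions made for illustration. It covers only four of the seven categories, English only, and does not handle years before 1 CE.

```python
import re

# Hypothetical patterns for English ("en") only; not the published tool's rules.
CENTURY = re.compile(r"^(\d{1,2})(?:st|nd|rd|th)\s+century$", re.IGNORECASE)
YEAR_SPAN = re.compile(r"^(\d{1,4})\s*(?:-|–|to)\s*(\d{1,4})$", re.IGNORECASE)
DECADE = re.compile(r"^(\d{1,3}0)s$")
YEAR_TOLERANCE = re.compile(r"^(\d{1,4})\s*(?:±|\+/-)\s*(\d{1,3})$")


def normalize(text, lang="en"):
    """Return (start_year, end_year) in CE, or None if no pattern matches."""
    if lang != "en":
        return None  # other languages would need their own pattern sets
    s = text.strip()
    if m := CENTURY.match(s):
        n = int(m.group(1))
        # Assumed convention: the Nth century runs from (N-1)*100+1 to N*100.
        return (n - 1) * 100 + 1, n * 100
    if m := YEAR_SPAN.match(s):
        return int(m.group(1)), int(m.group(2))
    if m := DECADE.match(s):
        start = int(m.group(1))
        return start, start + 9
    if m := YEAR_TOLERANCE.match(s):
        year, tol = int(m.group(1)), int(m.group(2))
        return year - tol, year + tol
    return None


def to_tsv_row(text, lang="en"):
    """Build one tab-delimited row: original text, language, start, end."""
    span = normalize(text, lang)
    # Four-digit zero-padded years; empty fields when nothing matched.
    start, end = (f"{y:04d}" for y in span) if span else ("", "")
    return "\t".join([text, lang, start, end])


if __name__ == "__main__":
    for expr in ["15th century", "1450-1500", "1850s", "1500±50", "Bronze Age"]:
        print(to_tsv_row(expr))
```

Keeping the original expression in the first column mirrors the abstract's point that normalized values are supplied as additional attributes alongside the source text. Named periods such as "Bronze Age" fall through unmatched here because resolving them would require a period lookup table, which this sketch does not attempt.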