Authors: |
Gervers, Michael; Tilahun, Gelila; Khoshraftar, Shima; Mitchell, Roderick A. |
Source: |
Archives (00039535). 2018, Vol. 53 Issue 137, p1-33. 33p. |
Abstract: |
Approximately 95% of all English charters from the Conquest in 1066 to the beginning of the reign of Edward II in 1307 were issued without dates. One of the major objectives of the DEEDS Project (DEEDS is an acronym for Documents of Early England Data Set) at the University of Toronto has been to estimate the dates of these undated documents through automation. This paper describes a World Wide Web user-interface toolkit for dating undated English charters, as well as the two computationally intensive dating methodologies that underlie it: the Maximum Prevalence method and a distance-based method. The Maximum Prevalence method, the more accurate of the two, relies on analyzing changes in the pattern of word and phrase usage as derived from a carefully selected collection of thousands of dated documents, electronically transcribed and stored in the DEEDS corpus. Beyond the dating of documents, the toolkit, which has features for visualizing this pattern of change, is useful to historians, archivists and linguists alike. The distance-based method relies on computing weighted sums of the dates of the documents in the DEEDS collection. The weights are determined by the similarity between an undated document and each document in the dated collection: the higher the similarity, the higher the weight, and the lower the similarity, the lower the weight. The performance of each dating method is presented on a test set, where the average absolute errors for the Maximum Prevalence and the distance-based methods are found to be 7.6 and 12.5 years, respectively. A 'leave-one-out' cross-validation experiment performed on the more than 12,000 documents in the test set confirms the accuracy of the methodology. The strengths and weaknesses of each of the dating methods are discussed. In addition, a full description of the DEEDS corpus from England and continental Europe is provided, including the kinds of metadata that have been compiled from it. [ABSTRACT FROM AUTHOR] |
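The distance-based idea summarized in the abstract (a similarity-weighted sum of known dates, checked by leave-one-out cross-validation) can be illustrated with a minimal sketch. This is not the DEEDS implementation: the Jaccard word-overlap similarity, the normalization of the weights so that they sum to one, and the names `jaccard`, `estimate_date` and `leave_one_out_mae` are all assumptions made for illustration, and the sample charters and years are invented.

```python
from typing import List, Optional, Tuple

DatedDoc = Tuple[str, int]  # (transcribed text, known year); hypothetical layout


def jaccard(a: set, b: set) -> float:
    """Word-set overlap in [0, 1]; an assumed stand-in for the paper's similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_date(undated_text: str, dated_docs: List[DatedDoc]) -> Optional[float]:
    """Similarity-weighted average of the known dates (weights normalized to sum to 1)."""
    target = set(undated_text.lower().split())
    weights = [jaccard(target, set(text.lower().split())) for text, _ in dated_docs]
    total = sum(weights)
    if total == 0:
        return None  # no overlap with any dated document, so no estimate
    return sum(w * year for w, (_, year) in zip(weights, dated_docs)) / total


def leave_one_out_mae(dated_docs: List[DatedDoc]) -> Optional[float]:
    """Hold out each dated document in turn, date it from the rest, average |error|."""
    errors = []
    for i, (text, year) in enumerate(dated_docs):
        rest = dated_docs[:i] + dated_docs[i + 1:]
        guess = estimate_date(text, rest)
        if guess is not None:
            errors.append(abs(guess - year))
    return sum(errors) / len(errors) if errors else None


# Invented sample data, for illustration only.
corpus = [
    ("sciant presentes et futuri", 1200),
    ("noverint universi per presentes", 1280),
]
estimate = estimate_date("sciant presentes et futuri", corpus)
```

In this toy corpus the first document matches the query exactly (weight 1) and the second shares one of seven distinct words (weight 1/7), so the estimate is pulled only slightly toward 1280. A real system would need a far richer similarity measure than word-set overlap.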
Database: |
Library, Information Science & Technology Abstracts |