Prediction of Selection Decision of Document Using Bibliographic Data at the National Library of France (BnF)
Author: | Salah, A. B., Cron, G., Nicolas RAGOT, Paquet, T. |
---|---|
Contributors: | Equipe Apprentissage (DocApp - LITIS), Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université Le Havre Normandie (ULH), Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université Le Havre Normandie (ULH), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA), Bibliothèque nationale de France, Délégation à la Stratégie et à la recherche (BnF_DSG), Bibliothèque Nationale de France, Bibliothèque nationale de France (BnF), Laboratoire d'Informatique Fondamentale et Appliquée de Tours (LIFAT), Université de Tours (UT)-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Society for Imaging Science and Technology, Plan Triennal de recherche, Université de Tours-Institut National des Sciences Appliquées - Centre Val de Loire (INSA CVL), Ben Salah, Ahmed |
Language: | English |
Year of publication: | 2012 |
Subject: |
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]
[STAT.TH] Statistics [stat]/Statistics Theory [stat.TH]
[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
Correspondence Factor Analysis
Multiple Correspondence Analysis
Optical Character Recognition
Data analysis
General Engineering |
Source: | Archiving, Jun 2012, Copenhagen, Denmark. pp. 135-140 Scopus-Elsevier |
Description: | p. 135-140; International audience; Document selection is a critical step in mass digitization projects. This is especially true at the BnF, where whether digitization includes OCR depends on the expected OCR results. The selection task is therefore complex and time-consuming, owing to the number of documents to process and the diversity of selection criteria to consider. To simplify and improve this task through automation, we studied the relationship between bibliographic data and document selection decisions. We used two statistical analyses: correspondence factor analysis and multiple correspondence analysis. Our analysis shows, for example, that documents in format "4 or GR FOL" published "between 1961 and 1990" in Morocco are more likely to be "Selected", whereas documents in format "16 or 8" published "between 1871 and 1800" in English or Spanish are more likely to be "Not Selected". |
Database: | OpenAIRE |
External link: |