Terminology Extraction from Log Files
Authors: | Hassan Saneifar, Anne Laurent, Mathieu Roche, Stéphane Bonniol, Pascal Poncelet |
---|---|
Contributors: | Roche, Mathieu; Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM); Satin IP Technologies; Université Montpellier 2 - Sciences et Techniques (UM2); Fouille de données environnementales (TATOO), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM); Exploration et exploitation de données textuelles (TEXTE); Satin-IP (Satin-IP) |
Year of publication: | 2009 |
Subject: |
[SPI.OTHER] Engineering Sciences [physics]/Other; [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing; [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]
Keywords: Terminology extraction; Terminology; Log files; Information extraction; Information retrieval; Natural language processing; Natural language; Vocabulary; Context (language use); Web log analysis software; Management information systems; Artificial intelligence; Computer science; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 020204 information systems |
Source: | DEXA 2009: 20th International Conference on Database and Expert Systems Applications, Aug 2009, Linz, Austria. Lecture Notes in Computer Science (ISBN 9783642035722), pp. 769-776, ⟨10.1007/978-3-642-03573-9_65⟩. HAL RR-09010, 2009, pp. 16 |
DOI: | 10.1007/978-3-642-03573-9_65 |
Description: | In many domains, the log files generated by digital systems contain important information about the conditions and configurations of those systems. Extracting information from these log files is an essential step for the information systems that manage the production line. In integrated circuit (IC) design, however, the log files generated by design tools are not exhaustively exploited. Although these log files are written in English, they usually do not follow the grammar and structure of natural language. Moreover, such logs have a heterogeneous and evolving structure. Given these characteristics, classical information extraction methods are difficult to apply to this kind of textual data, particularly for terminology extraction. In this paper, we therefore introduce Exterlog, our approach for extracting terminology from such log files. We also investigate whether part-of-speech (POS) tagging of such log files is a relevant approach to terminology extraction. |
Database: | OpenAIRE |