Terminology Extraction from Log Files

Authors: Hassan Saneifar, Anne Laurent, Mathieu Roche, Stéphane Bonniol, Pascal Poncelet
Contributors: Roche, Mathieu, Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Satin IP Technologies, Université Montpellier 2 - Sciences et Techniques (UM2), Fouille de données environnementales (TATOO), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Exploration et exploitation de données textuelles (TEXTE), Satin-IP (Satin-IP)
Year of publication: 2009
Subject:
[SPI.OTHER] Engineering Sciences [physics]/Other
Vocabulary
Information extraction
Computer science
[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing
Context (language use)
02 engineering and technology
Terminology
020204 information systems
Terminology extraction
0202 electrical engineering, electronic engineering, information engineering
Log files
[INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]
Information retrieval
Natural language processing
Management information systems
[INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]
020201 artificial intelligence & image processing
Web log analysis software
Artificial intelligence
Natural language
Source: Lecture Notes in Computer Science ISBN: 9783642035722
DEXA
20th International Conference on Database and Expert Systems Applications
DEXA: Database and Expert Systems Applications, Aug 2009, Linz, Austria. pp.769-776, ⟨10.1007/978-3-642-03573-9_65⟩
HAL
RR-09010, 2009, pp.16
DOI: 10.1007/978-3-642-03573-9_65
Description: In many domains, the log files generated by digital systems contain important information on the conditions and configurations of those systems. Information extraction from these log files is an essential phase in the information systems that manage the production line. In the case of integrated circuit design, the log files generated by design tools are not exhaustively exploited. Although these log files are written in English, they usually do not respect the grammar and structures of natural language. Moreover, such logs have a heterogeneous and evolving structure. Given these features of the textual data, applying classical information extraction methods is not an easy task, particularly for terminology extraction. In this paper, we therefore introduce our approach, Exterlog, to extract terminology from such log files. We also investigate whether POS tagging of such log files is a relevant approach for terminology extraction.
Database: OpenAIRE