Deployment of Batch Processing for Log File Analysis

Authors: Timo Hämäläinen, Esa Heikkinen
Year of publication: 2020
Subject:
Source: ICPS
DOI: 10.1109/icps48405.2020.9274712
Description: We have used log file analysis to mine expected behavior in intelligent transportation systems involving spatial and temporal data. The challenge is to extract complex behavior from multiple traces, for which linear log analysis that proceeds row by row does not suffice. Complex Event Processing (CEP) is close to our need, but general-purpose frameworks are surprisingly difficult to set up and deploy for this purpose. This paper originates from the need to compare our custom LOGDIG tool with Apache Flink. It focuses on the deployment effort of the two; we therefore consider setting up the development and run-time environments, selecting the proper analysis approach, and evaluating the difficulty in five different aspects. While LOGDIG is written solely in Python, Flink is a combination of many languages, libraries, packages and tools. Our comparison includes Flink in batch and stream processing modes using external and internal preprocessing. We borrow the Degree of Difficulty (DoD) measure from sports to assess the deployment effort. Flink needs significant setup effort to deploy the same functionality as LOGDIG. Flink is under continuous development, while LOGDIG is more focused and stable and can be used more easily off the shelf.
Database: OpenAIRE