Real-time regex matching with Apache Spark
Author: | Zhaozhong Zhu, Sean Deaton, Suzanne J. Matthews, Leonard Kosta, David Brownfield |
---|---|
Year of publication: | 2017 |
Subject: |
Matching (statistics)
Operations research; Database; Computer science; Big data; Other engineering and technologies; Parallel algorithm; Software engineering; Engineering and technology; Network monitoring; Spark (mathematics); Electrical engineering, electronic engineering, information engineering; Operating system; Anomaly detection; Regular expression; Pattern matching |
Source: | HPEC |
DOI: | 10.1109/hpec.2017.8091063 |
Description: | Network Monitoring Systems (NMS) are an important part of protecting Army and enterprise networks. As governments and corporations grow, the amount of traffic data collected by NMS grows proportionally. To protect users against emerging threats, it is common practice for organizations to maintain a series of custom regular expression (regex) patterns to run on NMS data. However, the growth of network traffic makes it increasingly difficult for network administrators to perform this process quickly. In this paper, we describe a novel algorithm that leverages Apache Spark to perform regex matching in parallel. We test our approach on a dataset of 31 million Bro HTTP log events and 569 regular expressions provided by the Army Engineer Research & Development Center (ERDC). Our results indicate that we are able to process 1,250 events in 1.047 seconds, meeting the desired definition of real-time. |
Database: | OpenAIRE |
External link: |