A De-Compositional Approach to Regular Expression Matching for Network Security
Autor: | Eric Norige, Alex X. Liu |
---|---|
Rok vydání: | 2019 |
Předmět: |
Matching (statistics)
Finite-state machine Computer Networks and Communications Computer science String (computer science) 020206 networking & telecommunications 02 engineering and technology Computer Science Applications Automaton Set (abstract data type) 0202 electrical engineering electronic engineering information engineering Regular expression Pattern matching Electrical and Electronic Engineering Algorithm Software |
Zdroj: | IEEE/ACM Transactions on Networking. 27:2179-2191 |
ISSN: | 1558-2566 1063-6692 |
DOI: | 10.1109/tnet.2019.2941920 |
Popis: | Regular Expression (RegEx) matching is the industry standard for Deep Packet Inspection (DPI) because RegExes are significantly more expressive than strings. To achieve high matching speed, we need to convert the RegExes to Deterministic Finite State Automata (DFA). However, DFA has the state explosion problem, that is, the number of DFA states and transitions can be exponential with the number of RegExes. Much work has addressed the DFA state explosion problem; however, none has met all the requirements of fast and automated construction, small memory image, and high matching speed. In this paper, we propose a decompositional approach, with fast and automated construction, small memory image, and high matching speed, to DFA state explosion. The first key idea is to decompose a complex RegEx that cause exponential state increases into a set of simpler RegExes that do not cause exponential state increases, where any character string that matches the complex RegEx also matches all the RegExes in the set of simpler RegExes; that is, the set of strings that match the complex RegEx is a subset of strings that match the set of simpler RegExes. The second key idea is to use a stateful post-processing engine to filter the matches that are actually the matches of the complex RegEx. Given an input string for matching, instead of using the large DFA constructed from the original complex RegEx to perform the matching, we first use the small DFA constructed from the set of simpler RegExes to perform the matching, and then, if the small DFA reports a match, we use the post-processing engine to determine whether it is a true match to the original complex RegEx. Because the pre-processing is simple, automaton construction can be automated and fast, and because most on-line processing is done by a DFA, its matching speed is close to that of a DFA alone. Our experimental results show that our decompositional approach achieves orders of magnitude faster DFA construction (in terms of seconds instead of minutes), 30 times smaller memory image, and 43% faster matching speeds, than state-of-the-art software based RegEx matching algorithms. |
Databáze: | OpenAIRE |
Externí odkaz: |