Popis: |
Reverse engineering of unknown network protocols based on recorded traffic traces enables security analyses and debugging of undocumented network services. For binary protocols in particular, existing approaches (1) lack comprehensive methods to classify or determine the data type of a discovered segment in a message, e.g., a number, timestamp, or network address, that would allow for a semantic interpretation and (2) rely on strong assumptions that prevent the analysis of lower-layer protocols often found in IoT or mobile systems. In this paper, we propose the first generic method for analyzing unknown messages from binary protocols to reveal the data types in message fields. To this end, we split messages into segments of bytes and use their vector interpretation to calculate similarities. These similarities can be used to create clusters of segments with the same type and, moreover, to recognize specific data types based on the clusters' characteristics. Our extensive evaluation shows that our method provides precise classification in most cases and a data-type-recognition precision of up to 100% at reasonable recall, improving on the state of the art by a factor of 1.3 to 3.7 in realistic scenarios. We open-source our implementation to facilitate follow-up work. |
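
The pipeline outlined in the abstract (byte segments interpreted as vectors, pairwise similarities, clusters of same-type segments) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract names neither the dissimilarity measure nor the clustering algorithm, so the normalized Canberra-style distance, the restriction to equal-length segments, the threshold-based single-linkage grouping, and all function names and sample bytes below are assumptions chosen for illustration.

```python
from typing import List


def canberra_dissimilarity(a: bytes, b: bytes) -> float:
    """Normalized Canberra-style dissimilarity in [0, 1] between two
    equal-length byte segments, each read as a vector of byte values.
    (Assumed measure; the abstract does not specify one.)"""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length segments")
    if not a:
        return 0.0
    total = 0.0
    for x, y in zip(a, b):
        denom = x + y
        if denom:  # a 0/0 term counts as zero distance
            total += abs(x - y) / denom
    return total / len(a)


def cluster_segments(segments: List[bytes], threshold: float) -> List[List[int]]:
    """Group segment indices into connected components, linking two
    segments whenever their dissimilarity is at most `threshold`.
    (Hypothetical stand-in for a real clustering algorithm.)"""
    unvisited = set(range(len(segments)))
    clusters: List[List[int]] = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        component = [seed]
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            neighbours = [
                i for i in unvisited
                if canberra_dissimilarity(segments[current], segments[i]) <= threshold
            ]
            for i in neighbours:
                unvisited.remove(i)
            component.extend(neighbours)
            frontier.extend(neighbours)
        clusters.append(sorted(component))
    return clusters


# Made-up 4-byte segments: three timestamp-like and two address-like values.
segments = [
    b"\x5e\x1a\x00\x10",
    b"\xc0\xa8\x01\x0a",
    b"\x5e\x1a\x00\x20",
    b"\xc0\xa8\x01\x14",
    b"\x5e\x1a\x00\x30",
]
print(cluster_segments(segments, threshold=0.2))
```

With these sample values, segments that share their high-order bytes end up in the same group, which mirrors the idea that segments of one data type lie close together in vector space. Recognizing which data type a cluster represents, as the abstract describes, would require inspecting each cluster's characteristics and is not covered by this sketch.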