Description: |
The ability to automatically classify files based on their low-level, short-range structures is of particular importance in computer forensics. We report a study on learning file classification automatically using byte sub-stream kernels that capture these low-level structures. We discover byte-level patterns in a file by extracting a feature map of its byte sequences, and we use a suffix trie to store and manipulate that feature map efficiently. From the feature map we compute the spectrum kernel, which, combined with a support vector machine classifier, lets us efficiently categorize a variety of system and application file types. In our experiments, the approach achieved good file classification performance. |
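
As an illustration of the spectrum kernel mentioned in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes the feature map is a table of counts of contiguous k-byte substrings, and it stores that table in a plain hash map (`collections.Counter`) instead of the suffix trie used in the study. The function names and the default `k = 3` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def kgram_counts(data: bytes, k: int) -> Counter:
    """Feature map: count every contiguous k-byte substring of `data`.

    Assumption: a hash map stands in for the suffix trie named in the abstract.
    Inputs shorter than k yield an empty feature map.
    """
    return Counter(data[i:i + k] for i in range(len(data) - k + 1))


def spectrum_kernel(a: bytes, b: bytes, k: int = 3) -> int:
    """k-spectrum kernel: inner product of the two k-gram count vectors."""
    counts_a, counts_b = kgram_counts(a, k), kgram_counts(b, k)
    # Iterate over the smaller map; k-grams missing from the other count as 0.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(n * counts_b[gram] for gram, n in counts_a.items())


def normalized_kernel(a: bytes, b: bytes, k: int = 3) -> float:
    """Cosine-normalized kernel, so that file length does not dominate."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def gram_matrix(files: list[bytes], k: int = 3) -> list[list[float]]:
    """Pairwise normalized kernel values for a list of file contents."""
    return [[normalized_kernel(x, y, k) for y in files] for x in files]
```

A matrix produced by `gram_matrix` could then be handed to any support vector machine implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. That pairing is an assumption of this sketch; the abstract does not name a library.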