Human-Machine Interaction for Improved Cybersecurity Named Entity Recognition Considering Semantic Similarity

Authors: Jana Shakarian, Kazuaki Kashihara, Chitta Baral
Year of publication: 2020
Subject:
Source: Advances in Intelligent Systems and Computing ISBN: 9783030551865
IntelliSys (2)
Description: The automated and timely conversion or extraction of cybersecurity information from unstructured text in online sources is required by many applications. Named Entity Recognition (NER) is used to detect relevant domain entities such as products, attack names, malware names, and hacker group names. Training a new NER model for cybersecurity traditionally requires a corpus annotated with cybersecurity entities, and state-of-the-art methods require time-consuming, labor-intensive feature engineering. We propose a Human-Machine Interaction method for semi-automatic labeling and corpus generation for cybersecurity entities. Our method evaluates the learned NER model on the sentences collected during the training process, and the user selects only the correct pairs of named entity and category for the next training iteration. Each iteration therefore trains the NER model on a better corpus. Some entities are ambiguous because the word or phrase has multiple meanings. We introduce a new semantic similarity measure and determine which category a word belongs to based on the semantic similarity of the entire sentence. The experimental evaluation shows that our method outperforms existing methods at finding undiscovered keywords of given categories.
Database: OpenAIRE
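
The description says an ambiguous entity is assigned a category based on the semantic similarity of the entire sentence, but it does not define the measure. The sketch below illustrates only the general idea, and it substitutes plain bag-of-words cosine similarity for the authors' new measure. The function names, the example sentences, and the EXAMPLES dictionary are all hypothetical and stand in for sentences a user has already confirmed.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase the sentence and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_category(sentence, examples_by_category):
    """Return the category whose confirmed sentences best match `sentence`.

    NOTE: bag-of-words cosine is an assumption made for illustration;
    it is not the similarity measure introduced in the paper.
    """
    query = bag_of_words(sentence)
    scores = {
        category: max(cosine(query, bag_of_words(ex)) for ex in examples)
        for category, examples in examples_by_category.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical user-confirmed sentences, one per category.
EXAMPLES = {
    "malware": ["The Zeus trojan steals banking credentials from infected machines."],
    "product": ["Apple released a security update for Safari on macOS."],
}

# "Apple" is ambiguous here; the rest of the sentence decides the category.
category, scores = pick_category(
    "A new trojan named Apple steals credentials from infected hosts.", EXAMPLES
)
print(category, scores)
```

In a human-machine loop of the kind described, the user would confirm or reject the suggested pair of entity and category, and confirmed sentences would be added to the examples used in the next iteration.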