Popis: |
Cyber threat intelligence contains a wealth of knowledge of threat behavior.Timely analysis and process of threat intelligence can promote the transformation of defense from passive to active.Nowadays, most threat intelligence that exists in the form of natural language texts contains a large amount of unstructured data, which needs to be converted into structured data for subsequent processing using entity extraction methods.However, since threat intelligence contains numerous terminology such as vulnerability names, malware and APT organizations, and the distribution of entities are extremely unbalanced, the performance of extraction methods in general field are severely limited when applied to threat intelligence.Therefore, an entity extraction model integrated with Focal Loss was proposed, which improved the cross-entropy loss function and balanced sample distribution by introducing balance factor and modulation coefficient.In addition, for the problem that threat intelligence had a complex structure and a wide range of sources, and contained a large number of professional words, token and character features were added to the model, which effectively improved OOV (out of vocabulary) problem in threat intelligence.Experiment results show that compared with existing mainstream model BiLSTM and BiLSTM-CRF, the F1 scores of the proposed model is increased by 7.07% and 4.79% respectively, which verifies the effectiveness of introducing Focal Loss and character features. |