Description: |
Suicide has become a major public health and social concern worldwide. Suicidal ideation cause extraction (SICE) from social texts can support suicide prevention. This article first summarizes the research on suicidal ideation causes (SICs) through psychological and sociological analysis. A social text-based SIC dataset is then constructed, and its features are analyzed statistically. A CRF model is presented along with a Char-BiLSTM-CRF model, which takes the concatenation of word embeddings and character embeddings as its word representation input. The effects of the word (W), part-of-speech (POS), dependency relation (DP), suicidal psychology (PCS), emotion (ET), and language (LG) features on the task are then explored in the CRF model. The experiments show that the word features work best, and that the POS and DP features are to some extent covered by the word features. The PCS, ET, and LG features can improve SICE performance. The experiments also show that Char-BiLSTM-CRF is better than CRF in general, although CRF still has an advantage in precision. Adding character embeddings and a CRF layer significantly improves extraction with Char-BiLSTM-CRF. The experiments also compare three word embeddings: Word2vec, ELMo, and BERT. Compared with Word2vec, ELMo increases the F-value by approximately 5% on average; compared with ELMo, BERT increases C_F and E_F by 3.5% and 2.3%, respectively. Finally, the challenges of SICE are discussed based on the experimental results of Char-BiLSTM-CRF. |