A supervised contrast learning text classification model based on two-layer data augmentation

Authors: Liang Wu, Fangfang Zhang, Shinan Song, Chao Cheng
Year of publication: 2023
DOI: 10.21203/rs.3.rs-2744398/v1
Description: Supervised contrastive learning uses category information to pull the spatial feature representations of same-class samples close to each other. However, current supervised contrastive learning relies heavily on the spatial feature representation of the original data, which lacks sufficient intra-class variation to achieve good results when training data is scarce. A supervised contrastive learning text classification model based on two-layer data augmentation (TDACL) is presented to improve intra-class diversity and obtain a better feature representation for the text classification task. TDACL first increases intra-class diversity using keyword-based data augmentation at the input layer and interpolation-based data augmentation at the hidden layer. Afterward, a compact intra-class feature space is learned with a contrastive loss. Experiments in low-resource settings show that the approach is more effective at improving model performance when training data is scarce. Experiments on four datasets, SST-2, CR, TREC, and PC, achieve 92.10%, 92.54%, 97.61%, and 95.27% accuracy, respectively, outperforming strong baseline approaches.
Database: OpenAIRE
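
The description mentions two ingredients: interpolation-based augmentation applied to hidden representations to increase intra-class variation, and a supervised contrastive loss that pulls same-class features together. The PyTorch sketch below illustrates one plausible reading of those two pieces; it is not the authors' implementation, and the function names, the within-class mixing strategy, and the hyperparameters (`alpha`, `temperature`) are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def interpolate_within_class(h, labels, alpha=0.2):
    """Hidden-layer augmentation (assumed form): mix each hidden vector with
    another randomly chosen vector of the same class, mixup-style."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    partner = torch.arange(h.size(0), device=h.device)
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        partner[idx] = idx[torch.randperm(idx.numel(), device=h.device)]
    return lam * h + (1.0 - lam) * h[partner]

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss: for each anchor, positives are all other
    samples in the batch that share its label."""
    z = F.normalize(features, dim=1)                       # (B, D) unit vectors
    sim = z @ z.t() / temperature                          # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim.masked_fill_(self_mask, float('-inf'))             # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

# Toy usage: 8 sentence embeddings with binary labels (e.g., SST-2-style polarity)
h = torch.randn(8, 128)
y = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
h_aug = interpolate_within_class(h, y)                     # extra intra-class variation
loss = supervised_contrastive_loss(torch.cat([h, h_aug]), torch.cat([y, y]))
```

Mixing only within a class keeps the augmented vectors' labels unambiguous, which is what lets them be appended to the batch and treated as additional positives in the contrastive loss.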