Popis: |
Data incompleteness has become an intractable problem for network representation learning (NRL) methods, causing existing NRL algorithms to fall short of the expected results. Although numerous efforts have been made to address the issue, most previous methods focus on the lack of label information and rarely consider data imbalance, especially the completely imbalanced case in which the labels of certain classes are entirely missing. Learning algorithms for such problems are still being explored; for example, some neighborhood feature aggregation processes concentrate on network structure information while disregarding the relationships between attribute features and semantic features, whose use may improve the representation results. To address these problems, this paper proposes SECT, a semantic information enhanced network embedding method for completely imbalanced labels that combines attribute features and structural features. First, SECT introduces an attention mechanism into supervised learning to obtain the semantic information vector, taking into account the relationship between the attribute space and the semantic space. Second, a variational autoencoder is applied in an unsupervised mode to extract structural features and enhance the robustness of the algorithm. Finally, the semantic and structural information are integrated in the embedding space. Compared with two state-of-the-art algorithms, node classification results on the public data sets Cora and Citeseer indicate that the network vectors obtained by SECT outperform the others, with Micro-F1 increasing by 0.86%~1.97%. The node visualization results also show that, compared with other algorithms, the vector distances between clusters of different classes obtained by SECT are larger, clusters of the same class are more compact, and class boundaries are more distinct. These experimental results demonstrate the effectiveness of SECT, which mainly benefits from a better fusion of semantic information in the low-dimensional embedding space and thus substantially improves node classification performance under completely imbalanced labels. |