Abstract: |
Video summarization aims to select key frames or key shots that summarize a video for fast retrieval, compression, and efficient browsing. Graph neural networks efficiently capture information about graph nodes and their neighbors, but they ignore the dynamic dependencies between nodes. To address this challenge, we propose the Adaptive Graph Convolutional Adjacency Matrix Network (TAMGCN), which leverages an attention mechanism to dynamically adjust the dependencies between graph nodes. Specifically, we first segment the video into shots and extract the features of each frame, then compute a representative feature for each shot. Next, we use the attention mechanism to dynamically adjust the adjacency matrix of the graph convolutional network so that it better captures the dynamic dependencies between graph nodes. Finally, we fuse the temporal features extracted by a Bi-directional Long Short-Term Memory (BiLSTM) network with the structural features extracted by the graph convolutional network to generate high-quality summaries. Extensive experiments on two benchmark datasets, TVSum and SumMe, yield F1-scores of 60.8% and 53.2%, respectively, demonstrating that our method outperforms most state-of-the-art video summarization techniques.
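Below is a minimal PyTorch sketch of the core idea the abstract describes: scaled dot-product attention produces a data-dependent adjacency matrix for a graph convolution over shot features, and the resulting structural features are fused with BiLSTM temporal features to score each shot. All class names, layer sizes, the concatenation-based fusion, and the 1024-dimensional input features are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAdjacencyGCN(nn.Module):
    """One GCN layer whose adjacency matrix is produced by scaled
    dot-product attention over the shot features, so node dependencies
    are recomputed for every input rather than fixed in advance."""

    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.weight = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (num_shots, dim)
        q, k = self.query(x), self.key(x)
        # Row-normalized attention scores act as a dynamic adjacency matrix.
        adj = F.softmax(q @ k.T / x.size(-1) ** 0.5, dim=-1)
        return F.relu(adj @ self.weight(x))     # graph convolution: A X W


class SummarizerSketch(nn.Module):
    """Fuses BiLSTM temporal features with GCN structural features and
    predicts an importance score per shot (hypothetical module, not the
    paper's TAMGCN implementation)."""

    def __init__(self, dim=1024, hidden=256):
        super().__init__()
        self.gcn = AttentionAdjacencyGCN(dim)
        self.bilstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.score = nn.Linear(dim + 2 * hidden, 1)

    def forward(self, shots):                   # shots: (num_shots, dim)
        structural = self.gcn(shots)
        temporal, _ = self.bilstm(shots.unsqueeze(0))
        fused = torch.cat([structural, temporal.squeeze(0)], dim=-1)
        return torch.sigmoid(self.score(fused)).squeeze(-1)  # per-shot scores


# Example: score 20 shots represented by assumed 1024-dim frame features.
scores = SummarizerSketch()(torch.randn(20, 1024))
```

The fusion here is a simple concatenation followed by a linear scoring head; the abstract does not specify the fusion operator, so this is just one plausible choice for combining the temporal and structural branches.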