Robust frame-wise embeddings are essential to perform video analysis and understanding tasks. We present a self-supervised method for representation learning based on aligning temporal video sequences. Our framework uses a transformer-based encoder t
External link:
http://arxiv.org/abs/2409.04607