Showing 1 - 1 of 1 for search: '"Woo, Jongbhin"'
Video Temporal Grounding (VTG) aims to identify visual frames in a video clip that match text queries. Recent studies in VTG employ cross-attention to correlate visual frames and text queries as individual token sequences. However, these approaches o
External link:
http://arxiv.org/abs/2410.13598