Showing 1 - 2 of 2 for search: '"Ullegaddi, Prashant"'
Recent advances in multimodal LLMs have led to several video-text models being proposed for critical video-related tasks. However, most previous works support visual input only, essentially muting the audio signal in the video. Few models tha
External link:
http://arxiv.org/abs/2407.15046
Published in:
Proceedings of the 20th ACM International Conference: Information & Knowledge Management; Oct2011, p2065-2068, 4p