Leveraging Text Representation and Face-head Tracking for Long-form Multimodal Semantic Relation Understanding

Authors: Raksha Ramesh, Vishal Anand, Zifan Chen, Yifei Dong, Yun Chen, Ching-Yung Lin
Publication year: 2022
Source: Proceedings of the 30th ACM International Conference on Multimedia.
DOI: 10.1145/3503161.3551610
Database: OpenAIRE