Showing 1 - 1 of 1 for search: '"Xian, Xingyuan"'
With the rise of multimodal large language models, accurately extracting and understanding textual information from video content, referred to as video-based optical character recognition (Video OCR), has become a crucial capability. This paper intro
External link:
http://arxiv.org/abs/2412.20613