Showing 1 - 4 of 4 for search: '"Rimchala, Joy"'
Author:
Xu, Zhiyang, Liu, Minqian, Shen, Ying, Rimchala, Joy, Zhang, Jiaxin, Wang, Qifan, Cheng, Yu, Huang, Lifu
Recent advancements in Vision-Language Models (VLMs) have led to the development of Vision-Language Generalists (VLGs) capable of understanding and generating interleaved images and text. Despite these advances, VLGs still struggle to follow user ins
External link:
http://arxiv.org/abs/2407.03604
Author:
Liu, Minqian, Xu, Zhiyang, Lin, Zihao, Ashby, Trevor, Rimchala, Joy, Zhang, Jiaxin, Huang, Lifu
Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in
External link:
http://arxiv.org/abs/2406.14643
The performance of optical character recognition (OCR) heavily relies on document image quality, which is crucial for automatic document processing and document intelligence. However, most existing document enhancement methods require supervised data
External link:
http://arxiv.org/abs/2311.09625
Current research in form understanding predominantly relies on large pre-trained language models, necessitating extensive data for pre-training. However, the importance of layout structure (i.e., the spatial relationship between the entity blocks in
External link:
http://arxiv.org/abs/2305.14590