Showing 1 - 2 of 2 results for search: '"Avraham, Elad Ben"'
Authors:
Ganz, Roy; Kittenplon, Yair; Aberdam, Aviad; Avraham, Elad Ben; Nuriel, Oren; Mazor, Shai; Litman, Ron
Vision-Language (VL) models have gained significant research focus, enabling remarkable advances in multimodal reasoning. These architectures typically comprise a vision encoder, a Large Language Model (LLM), and a projection module that aligns visual …
External link:
http://arxiv.org/abs/2402.05472
Authors:
Blau, Tsachi; Fogel, Sharon; Ronen, Roi; Golts, Alona; Ganz, Roy; Avraham, Elad Ben; Aberdam, Aviad; Tsiper, Shahar; Litman, Ron
The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds …
External link:
http://arxiv.org/abs/2401.03411