Showing 1 - 3 of 3 for search: '"Penamakuri, Abhirama Subramanyam"'
We revisit knowledge-aware text-based visual question answering, also known as Text-KVQA, in the light of modern advancements in large multimodal models (LMMs), and make the following contributions: (i) We propose VisTEL - a principled approach to …
External link:
http://arxiv.org/abs/2410.19144
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question …
External link:
http://arxiv.org/abs/2306.16713
Author:
Gatti, Prajwal, Penamakuri, Abhirama Subramanyam, Teotia, Revant, Mishra, Anand, Sengupta, Shubhashis, Ramnani, Roshni
One characteristic that makes humans superior to modern artificially intelligent models is the ability to interpret images beyond what is visually apparent. Consider the following two natural language search queries - (i) "a queue of customers …
External link:
http://arxiv.org/abs/2210.08554