Showing 1 - 2 of 2 for search: '"Agrawal, Tejas"'
Large vision-language models (VLMs) have been shown to learn rich joint image-text representations, enabling high performance in relevant downstream tasks. However, they fail to showcase their quantitative understanding of objects, and they lack good count
External link:
http://arxiv.org/abs/2406.03586
Author:
Mollah, Mohammad Y.A., Pathak, Saurabh R., Patil, Prashanth K., Vayuvegula, Madhavi, Agrawal, Tejas S., Gomes, Jewel A.G., Kesmez, Mehmet, Cocke, David L.
Published in:
Journal of Hazardous Materials 2004 109(1):165-171