Showing 1 - 5 of 5 for search: '"Voutharoja, Bhanu Prakash"'
Image-to-recipe retrieval is a challenging vision-to-language task of significant practical value. The main challenge of the task lies in the ultra-high redundancy in the long recipe and the large variation reflected in both food item combination and
External link:
http://arxiv.org/abs/2305.11327
Automatic radiology report generation is challenging because medical images and reports are usually similar to each other due to the common content of anatomy. This makes it hard for a model to capture the uniqueness of individual images and makes it prone to producing
External link:
http://arxiv.org/abs/2305.07176
Recent works on form understanding mostly employ multimodal transformers or large-scale pre-trained language models. These models need ample data for pre-training. In contrast, humans can usually identify key-value pairings from a form only by lookin
External link:
http://arxiv.org/abs/2305.04460
In this paper, we propose a novel pipeline that leverages language foundation models for temporal sequential pattern mining, such as for human mobility forecasting tasks. For example, in the task of predicting Place-of-Interest (POI) customer flows,
External link:
http://arxiv.org/abs/2209.05479
This six-volume set of LNCS 14187, 14188, 14189, 14190, 14191 and 14192 constitutes the refereed proceedings of the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, held in San José, CA, USA, in August 2023. The 53 f