Showing 1 - 10 of 121 results for search: '"BALLAN, LAMBERTO"'
Referring Expression Comprehension (REC) aims to identify a particular object in a scene by a natural language expression, and is an important topic in visual language understanding. State-of-the-art methods for this task are based on deep learning,
External link:
http://arxiv.org/abs/2411.14807
Published in:
IEEE Transactions on Intelligent Transportation Systems (Early Access) 2024
High-quality spatiotemporal traffic data is crucial for intelligent transportation systems (ITS) and their data-driven applications. Inevitably, the issue of missing data caused by various disturbances threatens the reliability of data acquisition. R
External link:
http://arxiv.org/abs/2410.15248
Authors:
Scofano, Luca, Sampieri, Alessio, Campari, Tommaso, Sacco, Valentino, Spinelli, Indro, Ballan, Lamberto, Galasso, Fabio
The success of collaboration between humans and robots in shared environments relies on the robot's real-time adaptation to human motion. Specifically, in Social Navigation, the agent should be close enough to assist but ready to back up to let the h
External link:
http://arxiv.org/abs/2404.11327
Using only image-sentence pairs, weakly-supervised visual-textual grounding aims to learn region-phrase correspondences of the respective entity mentions. Compared to the supervised approach, learning is more difficult since bounding boxes and textua
External link:
http://arxiv.org/abs/2305.10913
Long-term trajectory forecasting is an important and challenging problem in the fields of computer vision, machine learning, and robotics. One fundamental difficulty stands in the evolution of the trajectory that becomes more and more uncertain and u
External link:
http://arxiv.org/abs/2305.08553
Learning how to navigate among humans in an occluded and spatially constrained indoor environment is a key ability required for embodied agents to be integrated into our society. In this paper, we propose an end-to-end architecture that exploits Proxi
External link:
http://arxiv.org/abs/2212.00767
Human intention prediction is a growing area of research where an activity in a video has to be anticipated by a vision-based system. To this end, the model creates a representation of the past, and subsequently, it produces future hypotheses about u
External link:
http://arxiv.org/abs/2210.14714
Vision Transformers (ViTs) enabled the use of the transformer architecture on vision tasks showing impressive performances when trained on big datasets. However, on relatively small datasets, ViTs are less accurate given their lack of inductive bias.
External link:
http://arxiv.org/abs/2206.00481
Authors:
Chiara, Luigi Filippo, Coscia, Pasquale, Das, Sourav, Calderara, Simone, Cucchiara, Rita, Ballan, Lamberto
Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and advanced video-surveillance applications. This challenging task typically requires knowledge about past motion, the environment and likely destination are
External link:
http://arxiv.org/abs/2204.11561
Multi-modal transformer with language modality distillation for early pedestrian action anticipation
Published in:
Computer Vision and Image Understanding, vol. 249, December 2024