Showing 1 - 10 of 875 results for search: "SAITO, HIDEO"
We propose a method for dense depth estimation from an event stream generated when sweeping the focal plane of the driving lens attached to an event camera. In this method, a depth map is inferred from an "event focal stack" composed of the event s
External link:
http://arxiv.org/abs/2412.08120
This paper jointly addresses three key limitations in conventional pedestrian trajectory forecasting: pedestrian perception errors, real-world data collection costs, and person ID annotation costs. We propose a novel framework, RealTraj, that enhance
External link:
http://arxiv.org/abs/2411.17376
Tactile perception is vital, especially when distinguishing visually similar objects. We propose an approach to incorporate tactile data into a Vision-Language Model (VLM) for visuo-tactile zero-shot object recognition. Our approach leverages the zer
External link:
http://arxiv.org/abs/2409.09276
A crowd density forecasting task aims to predict how the crowd density map will change in the future from observed past crowd density maps. However, the past crowd density maps are often incomplete due to the missed detection of pedestrians, and it is
External link:
http://arxiv.org/abs/2407.14725
Event cameras, known for their high dynamic range, absence of motion blur, and low energy usage, have recently found a wide range of applications thanks to these attributes. In the past few years, the field of event-based 3D reconstruction saw remark
External link:
http://arxiv.org/abs/2406.14978
Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures,
External link:
http://arxiv.org/abs/2406.03095
Predicting future human behavior from egocentric videos is a challenging but critical task for human intention understanding. Existing methods for forecasting 2D hand positions rely on visual representations and mainly focus on hand-object interactio
External link:
http://arxiv.org/abs/2405.20030
We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input and unlabeled target data for egocentric action recognition. This paper simultaneously tackles two critical challenges associated with egocentric action recognition
External link:
http://arxiv.org/abs/2405.19917
Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical pha
External link:
http://arxiv.org/abs/2405.19644
Surgical tool detection is essential for analyzing and evaluating minimally invasive surgery videos. Current approaches are mostly based on supervised methods that require large, fully instance-level labels (i.e., bounding boxes). However, large imag
External link:
http://arxiv.org/abs/2401.02791