Showing 1 - 10 of 916 for the search: '"SATO, Yoichi"'
We present a contrastive learning framework based on in-the-wild hand images tailored for pre-training 3D hand pose estimators, dubbed HandCLR. Pre-training on large-scale images achieves promising results in various tasks, but prior 3D hand pose pre…
External link:
http://arxiv.org/abs/2409.09714
Author:
Kong, Quan, Kawana, Yuki, Saini, Rajat, Kumar, Ashutosh, Pan, Jingjing, Gu, Ta, Ozao, Yohei, Opra, Balazs, Anastasiu, David C., Sato, Yoichi, Kobori, Norimasa
In this paper, we address the challenge of fine-grained video event understanding in traffic scenarios, vital for autonomous driving and safety. Traditional datasets focus on driver or vehicle behavior, often neglecting pedestrian perspectives. To fi…
External link:
http://arxiv.org/abs/2407.15350
Delving into the realm of egocentric vision, the advancement of referring video object segmentation (RVOS) stands as pivotal in understanding human activities. However, the existing RVOS task primarily relies on static attributes such as object names to…
External link:
http://arxiv.org/abs/2407.07402
Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being robust to lighting variation and occlusion. While these characteristics are intuitively valuable to help egocentric…
External link:
http://arxiv.org/abs/2407.06628
Temporally localizing the presence of object states in videos is crucial in understanding human activities beyond actions and objects. This task has suffered from a lack of training data due to object states' inherent ambiguity and variety. To avoid…
External link:
http://arxiv.org/abs/2405.01090
Author:
Fan, Zicong, Ohkawa, Takehiko, Yang, Linlin, Lin, Nie, Zhou, Zhishan, Zhou, Shihao, Liang, Jiajun, Gao, Zhong, Zhang, Xuanyang, Zhang, Xue, Li, Fei, Liu, Zheng, Lu, Feng, Zeid, Karim Abou, Leibe, Bastian, On, Jeongwan, Baek, Seungryul, Prakash, Aditya, Gupta, Saurabh, He, Kun, Sato, Yoichi, Hilliges, Otmar, Chang, Hyung Jin, Yao, Angela
We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
External link:
http://arxiv.org/abs/2403.16428
The pursuit of accurate 3D hand pose estimation stands as a keystone for understanding human activity in the realm of egocentric vision. The majority of existing estimation methods still rely on single-view images as input, leading to potential limit…
External link:
http://arxiv.org/abs/2403.04381
Author:
Yagi, Takuma, Ohashi, Misaki, Huang, Yifei, Furuta, Ryosuke, Adachi, Shungo, Mitsuyama, Toutai, Sato, Yoichi
In the development of science, accurate and reproducible documentation of the experimental process is crucial. Automatic recognition of the actions in experiments from videos would help experimenters by complementing the recording of experiments. Tow…
External link:
http://arxiv.org/abs/2402.00293
Author:
Grauman, Kristen, Westbury, Andrew, Torresani, Lorenzo, Kitani, Kris, Malik, Jitendra, Afouras, Triantafyllos, Ashutosh, Kumar, Baiyya, Vijay, Bansal, Siddhant, Boote, Bikram, Byrne, Eugene, Chavis, Zach, Chen, Joya, Cheng, Feng, Chu, Fu-Jen, Crane, Sean, Dasgupta, Avijit, Dong, Jing, Escobar, Maria, Forigua, Cristhian, Gebreselasie, Abrham, Haresh, Sanjay, Huang, Jing, Islam, Md Mohaiminul, Jain, Suyog, Khirodkar, Rawal, Kukreja, Devansh, Liang, Kevin J, Liu, Jia-Wei, Majumder, Sagnik, Mao, Yongsen, Martin, Miguel, Mavroudi, Effrosyni, Nagarajan, Tushar, Ragusa, Francesco, Ramakrishnan, Santhosh Kumar, Seminara, Luigi, Somayazulu, Arjun, Song, Yale, Su, Shan, Xue, Zihui, Zhang, Edward, Zhang, Jinxu, Castillo, Angela, Chen, Changan, Fu, Xinzhu, Furuta, Ryosuke, Gonzalez, Cristina, Gupta, Prince, Hu, Jiabo, Huang, Yifei, Huang, Yiming, Khoo, Weslie, Kumar, Anush, Kuo, Robert, Lakhavani, Sach, Liu, Miao, Luo, Mi, Luo, Zhengyi, Meredith, Brighid, Miller, Austin, Oguntola, Oluwatumininu, Pan, Xiaqing, Peng, Penny, Pramanick, Shraman, Ramazanova, Merey, Ryan, Fiona, Shan, Wei, Somasundaram, Kiran, Song, Chenan, Southerland, Audrey, Tateno, Masatoshi, Wang, Huiyu, Wang, Yuchen, Yagi, Takuma, Yan, Mingfei, Yang, Xitong, Yu, Zecheng, Zha, Shengxin Cindy, Zhao, Chen, Zhao, Ziwei, Zhu, Zhifan, Zhuo, Jeff, Arbelaez, Pablo, Bertasius, Gedas, Crandall, David, Damen, Dima, Engel, Jakob, Farinella, Giovanni Maria, Furnari, Antonino, Ghanem, Bernard, Hoffman, Judy, Jawahar, C. V., Newcombe, Richard, Park, Hyun Soo, Rehg, James M., Sato, Yoichi, Savva, Manolis, Shi, Jianbo, Shou, Mike Zheng, Wray, Michael
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike re…
External link:
http://arxiv.org/abs/2311.18259
Author:
Wen, Yilin, Pan, Hao, Ohkawa, Takehiko, Yang, Lei, Pan, Jia, Sato, Yoichi, Komura, Taku, Wang, Wenping
We present a novel unified framework that concurrently tackles recognition and future prediction for human hand pose and action modeling. Previous works generally provide isolated solutions for either recognition or prediction, which not only increas…
External link:
http://arxiv.org/abs/2311.17366