Showing 1 - 10 of 101 for search: '"Graf, Hans Peter"'
Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning based networks often require significant computation, and may captur
External link:
http://arxiv.org/abs/2305.09539
Authors:
Zhou, Honglu, Kadav, Asim, Shamsian, Aviv, Geng, Shijie, Lai, Farley, Zhao, Long, Liu, Ting, Kapadia, Mubbasir, Graf, Hans Peter
Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects. We approach the task by modeling the video as tokens that represent the multi-scale semantic co
External link:
http://arxiv.org/abs/2112.05892
Authors:
Zhou, Honglu, Kadav, Asim, Lai, Farley, Niculescu-Mizil, Alexandru, Min, Martin Renqiang, Kapadia, Mubbasir, Graf, Hans Peter
This paper considers the problem of spatiotemporal object-centric reasoning in videos. Central to our approach is the notion of object permanence, i.e., the ability to reason about the location of objects as they move through the video while being oc
External link:
http://arxiv.org/abs/2103.10574
Authors:
Cosatto, Eric, Gerard, Kyle, Graf, Hans-Peter, Ogura, Maki, Kiyuna, Tomoharu, Hatanaka, Kanako C., Matsuno, Yoshihiro, Hatanaka, Yutaka
We propose a method to accurately obtain the ratio of tumor cells over an entire histological slide. We use deep fully convolutional neural network models trained to detect and classify cells on images of H&E-stained tissue sections. Pathologists' la
External link:
http://arxiv.org/abs/2101.11731
We propose a sequential variational autoencoder to learn disentangled representations of sequential data (e.g., videos and audios) under self-supervision. Specifically, we exploit the benefits of some readily accessible supervisory signals from input
External link:
http://arxiv.org/abs/2005.11437
Pose tracking is an important problem that requires identifying unique human pose-instances and matching them temporally across different frames of a video. However, existing pose tracking methods are unable to accurately model temporal relationships
External link:
http://arxiv.org/abs/1912.02323
Localizing moments in untrimmed videos via language queries is a new and interesting task that requires the ability to accurately ground language into video. Previous works have approached this task by processing the entire video, often more than onc
External link:
http://arxiv.org/abs/1904.09936
Authors:
Tokuyama, Naoto, Saito, Akira, Muraoka, Ryu, Matsubara, Shuya, Hashimoto, Takeshi, Satake, Naoya, Matsubayashi, Jun, Nagao, Toshitaka, Mirza, Aashiq H., Graf, Hans-Peter, Cosatto, Eric, Wu, Chin-Lee, Kuroda, Masahiko, Ohno, Yoshio
Published in:
Modern Pathology, April 2022, 35(4):533-538
We address the problem of video captioning by grounding language generation on object interactions in the video. Existing work mostly focuses on overall scene understanding with often limited or no emphasis on object interactions to address the probl
External link:
http://arxiv.org/abs/1711.06354
Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation or pairwi
External link:
http://arxiv.org/abs/1711.06330