Showing 1 - 10 of 47 results for the search: '"Kadav, Asim"'
The extraction of lung lesion information from clinical and medical imaging reports is crucial for research on and clinical care of lung-related diseases. Large language models (LLMs) can be effective at interpreting unstructured text in reports, but …
External link:
http://arxiv.org/abs/2406.18027
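As a rough illustration of the kind of LLM-based extraction this abstract describes, the Python sketch below prompts a model to return structured lesion fields as JSON; the prompt wording, the field schema, and the call_llm() helper are hypothetical stand-ins, not the paper's pipeline.

import json

PROMPT_TEMPLATE = (
    "Extract every lung lesion mentioned in the report below. "
    "Return a JSON list of objects with keys "
    "'location', 'size_mm', and 'characteristics'.\n\nReport:\n{report}"
)

def extract_lesions(report: str, call_llm) -> list[dict]:
    # call_llm is any function mapping a prompt string to the model's text reply.
    reply = call_llm(PROMPT_TEMPLATE.format(report=report))
    try:
        return json.loads(reply)   # expect a JSON list of lesion records
    except json.JSONDecodeError:
        return []                  # fall back to an empty extraction on malformed output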
Action recognition is an important problem that requires identifying actions in video by learning complex interactions across scene actors and objects. However, modern deep-learning based networks often require significant computation, and may capture …
External link:
http://arxiv.org/abs/2305.09539
Self-supervised video representation learning has been shown to effectively improve downstream tasks such as video retrieval and action recognition. In this paper, we present the Cascade Positive Retrieval (CPR) that successively mines positive examples …
External link:
http://arxiv.org/abs/2201.07989
The recent success of deep learning applications has coincided with the wide availability of powerful computational resources for training sophisticated machine learning models on huge datasets. Nonetheless, training large models such as convolutional …
External link:
http://arxiv.org/abs/2112.15317
Author:
Zhou, Honglu, Kadav, Asim, Shamsian, Aviv, Geng, Shijie, Lai, Farley, Zhao, Long, Liu, Ting, Kapadia, Mubbasir, Graf, Hans Peter
Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects. We approach the task by modeling the video as tokens that represent the multi-scale semantic …
External link:
http://arxiv.org/abs/2112.05892
Author:
Han, Ligong, Min, Martin Renqiang, Stathopoulos, Anastasis, Tian, Yu, Gao, Ruijiang, Kadav, Asim, Metaxas, Dimitris
Conditional Generative Adversarial Networks (cGANs) extend the standard unconditional GAN framework to learning joint data-label distributions from samples, and have been established as powerful generative models capable of generating high-fidelity images …
External link:
http://arxiv.org/abs/2108.09016
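For readers unfamiliar with the cGAN setup this abstract refers to, here is a minimal label-conditioned GAN sketch in PyTorch; the layer sizes, the label embedding, and the module names are illustrative assumptions, not the architecture used in the paper.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, img_dim=784):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)   # label embedding
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, y):
        # Condition on the label by concatenating its embedding with the noise vector.
        return self.net(torch.cat([z, self.embed(y)], dim=1))

class Discriminator(nn.Module):
    def __init__(self, n_classes=10, img_dim=784):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(img_dim + n_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),   # real/fake score for the (image, label) pair
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, self.embed(y)], dim=1))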
Author:
Zhou, Honglu, Kadav, Asim, Lai, Farley, Niculescu-Mizil, Alexandru, Min, Martin Renqiang, Kapadia, Mubbasir, Graf, Hans Peter
This paper considers the problem of spatiotemporal object-centric reasoning in videos. Central to our approach is the notion of object permanence, i.e., the ability to reason about the location of objects as they move through the video while being occluded …
External link:
http://arxiv.org/abs/2103.10574
We propose a sequential variational autoencoder to learn disentangled representations of sequential data (e.g., videos and audios) under self-supervision. Specifically, we exploit the benefits of some readily accessible supervisory signals from input …
External link:
http://arxiv.org/abs/2005.11437
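A minimal sketch of the general idea behind a sequential VAE with a disentangled latent, assuming PyTorch: a time-invariant code f is inferred once per clip and a time-varying code z_t per frame. All dimensions and module names are illustrative, not the paper's model.

import torch
import torch.nn as nn

class SeqVAE(nn.Module):
    def __init__(self, x_dim=64, h_dim=128, f_dim=16, z_dim=16):
        super().__init__()
        self.rnn = nn.LSTM(x_dim, h_dim, batch_first=True)
        self.to_f = nn.Linear(h_dim, 2 * f_dim)      # static factor (mu, logvar)
        self.to_z = nn.Linear(h_dim, 2 * z_dim)      # per-frame dynamic factor
        self.dec = nn.Linear(f_dim + z_dim, x_dim)   # frame decoder

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

    def forward(self, x):                  # x: (batch, time, x_dim)
        h, _ = self.rnn(x)
        f, f_mu, f_lv = self.sample(self.to_f(h[:, -1]))   # one static code per clip
        z, z_mu, z_lv = self.sample(self.to_z(h))          # one dynamic code per frame
        f_rep = f.unsqueeze(1).expand(-1, x.size(1), -1)
        recon = self.dec(torch.cat([f_rep, z], dim=-1))
        return recon, (f_mu, f_lv), (z_mu, z_lv)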
Pose tracking is an important problem that requires identifying unique human pose-instances and matching them temporally across different frames of a video. However, existing pose tracking methods are unable to accurately model temporal relationships …
External link:
http://arxiv.org/abs/1912.02323
In this paper, we introduce a contextual grounding approach that captures the context in corresponding text entities and image regions to improve the grounding accuracy. Specifically, the proposed architecture accepts pre-trained text token embeddings …
External link:
http://arxiv.org/abs/1911.02133
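As a rough PyTorch sketch of grounding text tokens to image regions via cross-attention: the single attention layer, the feature dimensions, and the module names are illustrative assumptions, not the paper's full architecture.

import torch
import torch.nn as nn

class GroundingHead(nn.Module):
    def __init__(self, txt_dim=768, img_dim=2048, d=256):
        super().__init__()
        self.q = nn.Linear(txt_dim, d)   # queries from pre-trained text token embeddings
        self.k = nn.Linear(img_dim, d)   # keys from detected image-region features
        self.scale = d ** -0.5

    def forward(self, tokens, regions):
        # tokens: (batch, n_tokens, txt_dim), regions: (batch, n_regions, img_dim)
        scores = self.q(tokens) @ self.k(regions).transpose(1, 2) * self.scale
        # Each row is a token's attention over regions; its argmax is the grounded region.
        return scores.softmax(dim=-1)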