Showing 1 - 10 of 43 for the search: '"Vijayanarasimhan, Sudheendra"'
If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a reference) image caption. Unfortunately, doing so encourages captions…
External link:
http://arxiv.org/abs/2302.01328
Author:
Rathod, Vivek, Seybold, Bryan, Vijayanarasimhan, Sudheendra, Myers, Austin, Gu, Xiuye, Birodkar, Vighnesh, Ross, David A.
Detecting actions in untrimmed videos should not be limited to a small, closed set of classes. We present a simple, yet effective strategy for open-vocabulary temporal action detection utilizing pretrained image-text co-embeddings. Despite being trained…
External link:
http://arxiv.org/abs/2212.10596
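To make the idea in the entry above concrete, here is a minimal sketch (not the paper's actual method) of open-vocabulary temporal detection with a pretrained image-text co-embedding: per-frame visual embeddings are scored against text embeddings of arbitrary class descriptions by cosine similarity, and contiguous runs of high-scoring frames become detections. The embeddings, class names, and threshold are hypothetical inputs that any CLIP-style co-embedding model could supply.

    import numpy as np

    def l2_normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    def open_vocab_temporal_detection(frame_emb, text_emb, class_names, thresh=0.3):
        """frame_emb: (T, D) per-frame visual embeddings; text_emb: (C, D) embeddings
        of free-form class descriptions. Returns (class, start, end) frame spans."""
        sims = l2_normalize(frame_emb) @ l2_normalize(text_emb).T  # (T, C) cosine sims
        detections = []
        for c, name in enumerate(class_names):
            active = sims[:, c] >= thresh  # which frames look like this class
            t = 0
            while t < len(active):
                if active[t]:
                    start = t
                    while t < len(active) and active[t]:
                        t += 1
                    detections.append((name, start, t))  # half-open span [start, t)
                else:
                    t += 1
        return detections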
Author:
Chan, David M, Ni, Yiming, Ross, David A, Vijayanarasimhan, Sudheendra, Myers, Austin, Canny, John
Traditional automated metrics for evaluating conditional natural language generation use pairwise comparisons between a single generated text and the best-matching gold-standard ground truth text. When multiple ground truths are available, scores are…
External link:
http://arxiv.org/abs/2209.07518
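The "best-matching ground truth" scheme described in the abstract above can be illustrated with a toy sketch: the candidate is compared against every reference with some pairwise similarity and only the maximum is kept. The token-level F1 below is a deliberately simple stand-in for real pairwise metrics such as BLEU or CIDEr, not the approach advocated in the paper.

    def token_f1(candidate, reference):
        """Toy pairwise similarity (unique-token F1); stands in for BLEU/CIDEr/etc."""
        cand, ref = set(candidate.split()), set(reference.split())
        common = cand & ref
        if not common:
            return 0.0
        precision, recall = len(common) / len(cand), len(common) / len(ref)
        return 2 * precision * recall / (precision + recall)

    def best_match_score(candidate, references):
        """Traditional scheme: score against each ground truth, keep the best match."""
        return max(token_f1(candidate, r) for r in references)

    refs = ["a dog runs across the park", "a brown dog is playing outside"]
    print(best_match_score("a dog is playing in the park", refs))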
Author:
Chan, David M., Myers, Austin, Vijayanarasimhan, Sudheendra, Ross, David A., Seybold, Bryan, Canny, John F.
While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world. Most visual description…
External link:
http://arxiv.org/abs/2205.06253
Automatic video captioning aims to train models to generate text descriptions for all segments in a video; however, the most effective approaches require large amounts of manual annotation, which is slow and expensive. Active learning is a promising way…
External link:
http://arxiv.org/abs/2007.13913
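As a generic illustration of the active-learning idea (not the selection strategy from this paper), the sketch below ranks unlabeled video segments by the captioning model's uncertainty and spends the annotation budget on the most uncertain ones. The segment ids, uncertainty scores, and budget are made-up placeholders.

    import numpy as np

    def select_for_annotation(segment_ids, uncertainty_scores, budget=10):
        """Uncertainty sampling: pick the segments the current model is least sure
        about and send only those to (slow, expensive) human annotators."""
        order = np.argsort(-np.asarray(uncertainty_scores))  # most uncertain first
        return [segment_ids[i] for i in order[:budget]]

    # toy round with three candidate segments and a budget of one annotation
    picked = select_for_annotation(["vid1[0-5s]", "vid2[10-15s]", "vid3[3-8s]"],
                                   [0.2, 0.9, 0.5], budget=1)
    print(picked)  # ['vid2[10-15s]']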
Author:
Chao, Yu-Wei, Vijayanarasimhan, Sudheendra, Seybold, Bryan, Ross, David A., Deng, Jia, Sukthankar, Rahul
We propose TAL-Net, an improved approach to temporal action localization in video that is inspired by the Faster R-CNN object detection framework. TAL-Net addresses three key shortcomings of existing approaches: (1) we improve receptive field alignment…
External link:
http://arxiv.org/abs/1804.07667
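For readers unfamiliar with the Faster R-CNN analogy mentioned above, the sketch below generates multi-scale 1D segment proposals ("temporal anchors") over a temporal feature map, the rough counterpart of 2D box anchors in object detection. It is only a schematic of a proposal stage, not TAL-Net itself; the positions, stride, and scales are arbitrary example values.

    def temporal_anchors(num_positions, stride, scales=(1, 2, 4, 8)):
        """1D analogue of Faster R-CNN anchors: at each position of a temporal
        feature map, propose candidate segments of several durations (in frames)."""
        anchors = []
        for i in range(num_positions):
            center = (i + 0.5) * stride
            for s in scales:
                half = 0.5 * s * stride
                anchors.append((max(0.0, center - half), center + half))
        return anchors

    print(temporal_anchors(num_positions=2, stride=8))
    # e.g. (0.0, 8.0), (0.0, 12.0), ... segments centered at feature-map positions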
We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns…
External link:
http://arxiv.org/abs/1707.01932
Author:
Gu, Chunhui, Sun, Chen, Ross, David A., Vondrick, Carl, Pantofaru, Caroline, Li, Yeqing, Vijayanarasimhan, Sudheendra, Toderici, George, Ricco, Susanna, Sukthankar, Rahul, Schmid, Cordelia, Malik, Jitendra
This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.5…
External link:
http://arxiv.org/abs/1705.08421
Author:
Kay, Will, Carreira, Joao, Simonyan, Karen, Zhang, Brian, Hillier, Chloe, Vijayanarasimhan, Sudheendra, Viola, Fabio, Green, Tim, Back, Trevor, Natsev, Paul, Suleyman, Mustafa, Zisserman, Andrew
We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human…
External link:
http://arxiv.org/abs/1705.06950
Author:
Fragkiadaki, Katerina, Huang, Jonathan, Alemi, Alex, Vijayanarasimhan, Sudheendra, Ricco, Susanna, Sukthankar, Rahul
Given a visual history, multiple future outcomes for a video scene are equally probable; in other words, the distribution of future outcomes has multiple modes. Multimodality is notoriously hard to handle by standard regressors or classifiers: the former…
External link:
http://arxiv.org/abs/1705.02082
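A tiny numeric sketch of the problem raised in the abstract above: a single L2 regressor trained against two equally likely futures collapses to their mean, while a multiple-hypothesis ("winner-takes-all") loss lets separate predictions cover separate modes. This is a generic illustration of the issue, not the method proposed in the paper.

    import numpy as np

    def mse(pred, target):
        return float(np.mean((pred - target) ** 2))

    def winner_takes_all(hypotheses, target):
        """Only the closest of K predictions is penalized, so different
        hypotheses are free to specialize on different future modes."""
        return min(mse(h, target) for h in hypotheses)

    futures = [np.array([-1.0]), np.array([1.0])]       # two equally likely outcomes
    mean_pred = np.array([0.0])                         # what an L2 regressor converges to
    hypotheses = [np.array([-1.0]), np.array([1.0])]    # two heads covering both modes
    print(np.mean([mse(mean_pred, f) for f in futures]))                # 1.0
    print(np.mean([winner_takes_all(hypotheses, f) for f in futures]))  # 0.0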