Showing 1 - 10 of 28 for search: '"Weyand, Tobias"'
Author:
Gundavarapu, Nitesh Bharadwaj, Friedman, Luke, Goyal, Raghav, Hegde, Chaitra, Agustsson, Eirikur, Waghmare, Sagar M., Sirotenko, Mikhail, Yang, Ming-Hsuan, Weyand, Tobias, Gong, Boqing, Sigal, Leonid
Video understanding has witnessed significant progress with recent video foundation models demonstrating strong performance owing to self-supervised pre-training objectives, with Masked Autoencoders (MAE) being the design of choice. Nevertheless, the majority…
External link:
http://arxiv.org/abs/2411.13683
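The entry above credits the strong performance of recent video foundation models to Masked Autoencoder (MAE) pre-training. As a rough illustration of that objective (not the paper's actual model), the sketch below randomly masks most of the video patch tokens and trains a small encoder-decoder to reconstruct them; the mask ratio, tokenization, and layer sizes are illustrative assumptions.

# Minimal MAE-style masked pre-training sketch over video patch tokens.
# Illustrative assumptions: the video is already tokenized into patch vectors,
# the mask ratio is 90%, and the encoder/decoder are tiny Transformers.
import torch
import torch.nn as nn

class TinyVideoMAE(nn.Module):
    def __init__(self, patch_dim=768, dim=256, mask_ratio=0.9):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=1)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, patch_dim)          # reconstruct raw patch content

    def forward(self, patches):                        # patches: (B, N, patch_dim)
        B, N, _ = patches.shape
        n_keep = int(N * (1 - self.mask_ratio))
        ids_keep = torch.rand(B, N, device=patches.device).argsort(1)[:, :n_keep]
        batch_idx = torch.arange(B, device=patches.device).unsqueeze(1)
        visible = self.embed(patches[batch_idx, ids_keep])    # encode visible tokens only
        latent = self.encoder(visible)
        # put encoded tokens back in place; masked slots get a learned mask token
        full = self.mask_token.expand(B, N, -1).clone()
        full[batch_idx, ids_keep] = latent
        recon = self.head(self.decoder(full))                 # (B, N, patch_dim)
        masked = torch.ones(B, N, dtype=torch.bool, device=patches.device)
        masked[batch_idx, ids_keep] = False
        return ((recon - patches) ** 2).mean(-1)[masked].mean()  # loss on masked tokens only

# toy usage: 8 frames, 14x14 patches per frame, 768-dim patch vectors
loss = TinyVideoMAE()(torch.randn(2, 8 * 14 * 14, 768))
loss.backward()

Only the reconstruction loss on masked tokens drives pre-training; the encoder is what gets reused afterwards as a video representation.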
Author:
Zhao, Long, Gundavarapu, Nitesh B., Yuan, Liangzhe, Zhou, Hao, Yan, Shen, Sun, Jennifer J., Friedman, Luke, Qian, Rui, Weyand, Tobias, Zhao, Yue, Hornung, Rachel, Schroff, Florian, Yang, Ming-Hsuan, Ross, David A., Wang, Huisheng, Adam, Hartwig, Sirotenko, Mikhail, Liu, Ting, Gong, Boqing
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips…
External link:
http://arxiv.org/abs/2402.13217
Author:
Yuan, Liangzhe, Gundavarapu, Nitesh Bharadwaj, Zhao, Long, Zhou, Hao, Cui, Yin, Jiang, Lu, Yang, Xuan, Jia, Menglin, Weyand, Tobias, Friedman, Luke, Sirotenko, Mikhail, Wang, Huisheng, Schroff, Florian, Adam, Hartwig, Yang, Ming-Hsuan, Liu, Ting, Gong, Boqing
We evaluate the video understanding capabilities of existing foundation models (FMs) using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight…
External link:
http://arxiv.org/abs/2307.03166
Author:
Kim, Zu, Araujo, André, Cao, Bingyi, Askew, Cam, Sim, Jack, Green, Mike, Yilla, N'Mah Fodiatu, Weyand, Tobias
There has been increasing awareness of ethical issues in machine learning, and fairness has become an important research topic. Most fairness efforts in computer vision have been focused on human sensing applications and preventing discrimination by…
External link:
http://arxiv.org/abs/2206.01326
Author:
Kim, Zu, Araujo, André, Cao, Bingyi, Askew, Cam, Sim, Jack, Green, Mike, Yilla, N'Mah Fodiatu, Weyand, Tobias
We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead…
External link:
http://arxiv.org/abs/2108.08874
Author:
Thames, Quin, Karpur, Arjun, Norris, Wade, Xia, Fangting, Panait, Liviu, Weyand, Tobias, Sim, Jack
Understanding the nutritional content of food from visual data is a challenging computer vision problem, with the potential to have a positive and widespread impact on public health. Studies in this area are limited to existing datasets in the field…
External link:
http://arxiv.org/abs/2103.03375
While image retrieval and instance recognition techniques are progressing rapidly, there is a need for challenging datasets to accurately measure their performance -- while posing novel challenges that are relevant for practical applications. We introduce…
External link:
http://arxiv.org/abs/2004.01804
Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only a few, possibly ambiguous cues to their geolocation. Recent work has…
External link:
http://arxiv.org/abs/1808.02130
Author:
Howard, Andrew G., Zhu, Menglong, Chen, Bo, Kalenichenko, Dmitry, Wang, Weijun, Weyand, Tobias, Andreetto, Marco, Adam, Hartwig
We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks. We introduce…
External link:
http://arxiv.org/abs/1704.04861
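The MobileNets entry above attributes the models' efficiency to depth-wise separable convolutions, which factor a standard convolution into a per-channel spatial filter followed by a 1x1 channel-mixing filter. Below is a minimal sketch of that building block; the channel counts, stride, and BatchNorm/ReLU layout are illustrative assumptions rather than the paper's exact configuration.

# Depth-wise separable convolution: a per-channel 3x3 spatial convolution
# followed by a 1x1 pointwise convolution that mixes channels.
# Illustrative sketch; channel counts and the BatchNorm/ReLU layout are assumptions.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))     # spatial filtering, one filter per channel
        return self.act(self.bn2(self.pointwise(x)))  # cheap cross-channel mixing

# A standard 3x3 conv from 64 to 128 channels uses 3*3*64*128 = 73,728 weights;
# the separable version uses 3*3*64 + 64*128 = 8,768, roughly an 8-9x reduction.
block = DepthwiseSeparableConv(64, 128, stride=2)
out = block(torch.randn(1, 64, 56, 56))   # -> (1, 128, 28, 28)

The factorization is where the savings come from: the 3x3 work scales with the number of channels rather than with the product of input and output channels.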
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark…
External link:
http://arxiv.org/abs/1612.06321
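The DELF entry above describes local features selected by an attention mechanism trained with image-level annotations only. The sketch below illustrates that general idea, not the released DELF model: a small attention head scores every spatial location of a CNN feature map, the scores weight a global pooling of the features, and the pooled vector feeds an image-level classifier, so no keypoint-level labels are required. The backbone, head sizes, and number of classes are assumptions.

# Rough sketch of attention-weighted pooling over CNN features, trained with
# image-level labels only (in the spirit of the DELF entry above; the backbone,
# attention head, and classifier sizes are illustrative assumptions).
import torch
import torch.nn as nn

class AttentiveLocalFeatures(nn.Module):
    def __init__(self, feat_dim=512, num_classes=1000):
        super().__init__()
        self.backbone = nn.Sequential(                     # stand-in for a CNN backbone
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3), nn.ReLU())
        self.attention = nn.Sequential(                    # per-location relevance score
            nn.Conv2d(feat_dim, 128, kernel_size=1), nn.ReLU(),
            nn.Conv2d(128, 1, kernel_size=1), nn.Softplus())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, images):
        fmap = self.backbone(images)                       # (B, C, H, W) local descriptors
        scores = self.attention(fmap)                      # (B, 1, H, W) attention weights
        pooled = (fmap * scores).sum(dim=(2, 3)) / scores.sum(dim=(2, 3)).clamp(min=1e-6)
        return self.classifier(pooled), scores             # logits for image-level labels

# training uses only image-level labels; at retrieval time the highest-scoring
# locations of fmap would be kept as local descriptors
model = AttentiveLocalFeatures()
logits, scores = model(torch.randn(2, 3, 224, 224))
loss = nn.functional.cross_entropy(logits, torch.tensor([3, 7]))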