Showing 1 - 10 of 486 for the search: '"Girshick, Ross"'
Author:
Deitke, Matt, Clark, Christopher, Lee, Sangho, Tripathi, Rohun, Yang, Yue, Park, Jae Sung, Salehi, Mohammadreza, Muennighoff, Niklas, Lo, Kyle, Soldaini, Luca, Lu, Jiasen, Anderson, Taira, Bransom, Erin, Ehsani, Kiana, Ngo, Huong, Chen, YenSung, Patel, Ajay, Yatskar, Mark, Callison-Burch, Chris, Head, Andrew, Hendrix, Rose, Bastani, Favyen, VanderBilt, Eli, Lambert, Nathan, Chou, Yvonne, Chheda, Arnavi, Sparks, Jenna, Skjonsberg, Sam, Schmitz, Michael, Sarnat, Aaron, Bischoff, Byron, Walsh, Pete, Newell, Chris, Wolters, Piper, Gupta, Tanmay, Zeng, Kuo-Hao, Borchardt, Jon, Groeneveld, Dirk, Dumas, Jen, Nam, Crystal, Lebrecht, Sophie, Wittlif, Caitlin, Schoenick, Carissa, Michel, Oscar, Krishna, Ranjay, Weihs, Luca, Smith, Noah A., Hajishirzi, Hannaneh, Girshick, Ross, Farhadi, Ali, Kembhavi, Aniruddha
Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed models into open ones. As a result, the…
External link:
http://arxiv.org/abs/2409.17146
Author:
Ravi, Nikhila, Gabeur, Valentin, Hu, Yuan-Ting, Hu, Ronghang, Ryali, Chaitanya, Ma, Tengyu, Khedr, Haitham, Rädle, Roman, Rolland, Chloe, Gustafson, Laura, Mintun, Eric, Pan, Junting, Alwala, Kalyan Vasudev, Carion, Nicolas, Wu, Chao-Yuan, Girshick, Ross, Dollár, Piotr, Feichtenhofer, Christoph
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation…
External link:
http://arxiv.org/abs/2408.00714
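A minimal sketch of the promptable image interface described above, assuming the sam2 package from the public facebookresearch/sam2 repository; the config name, checkpoint path, image file, and point prompt are placeholders, not values from the paper.

import numpy as np
from PIL import Image

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Build the model and wrap it in the image predictor (paths are placeholders).
model = build_sam2("configs/sam2/sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt")
predictor = SAM2ImagePredictor(model)

# Embed the image once, then prompt with a single foreground click (label 1).
image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 384]]),
    point_labels=np.array([1]),
)
print(masks.shape, scores)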
Author:
Zeng, Kuo-Hao, Zhang, Zichen, Ehsani, Kiana, Hendrix, Rose, Salvador, Jordi, Herrasti, Alvaro, Girshick, Ross, Kembhavi, Aniruddha, Weihs, Luca
We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses…
External link:
http://arxiv.org/abs/2406.20083
Author:
Kirillov, Alexander, Mintun, Eric, Ravi, Nikhila, Mao, Hanzi, Rolland, Chloe, Gustafson, Laura, Xiao, Tete, Whitehead, Spencer, Berg, Alexander C., Lo, Wan-Yen, Dollár, Piotr, Girshick, Ross
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed…
External link:
http://arxiv.org/abs/2304.02643
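The promptable interface referenced above is shipped in the released segment_anything package; the sketch below follows its documented predictor API, with the checkpoint filename, image, and click coordinates as placeholder assumptions.

import numpy as np
from PIL import Image

from segment_anything import SamPredictor, sam_model_registry

# Load a ViT-H SAM checkpoint (filename is a placeholder) and build a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

# Compute the image embedding once, then segment from a single foreground point.
image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
print(masks.shape, scores)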
Author:
Singh, Mannat, Duval, Quentin, Alwala, Kalyan Vasudev, Fan, Haoqi, Aggarwal, Vaibhav, Adcock, Aaron, Joulin, Armand, Dollár, Piotr, Feichtenhofer, Christoph, Girshick, Ross, Girdhar, Rohit, Misra, Ishan
This paper revisits the standard pretrain-then-finetune paradigm used in computer vision for visual recognition tasks. Typically, state-of-the-art foundation models are pretrained using large scale (weakly) supervised datasets with billions of images…
External link:
http://arxiv.org/abs/2303.13496
We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection. This design enables the original ViT architecture to be fine-tuned for object detection without needing to redesign a hierarchical backbone for…
External link:
http://arxiv.org/abs/2203.16527
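One ingredient of that design is building a multi-scale pyramid for the detector head from only the ViT's last, single-scale feature map; the sketch below illustrates the idea under assumed dimensions and is not the paper's implementation.

import torch
import torch.nn as nn

class SimpleFeaturePyramid(nn.Module):
    # Make stride {4, 8, 16, 32} maps from a single stride-16 ViT feature map.
    def __init__(self, dim=768, out_dim=256):
        super().__init__()
        self.p4 = nn.Sequential(                                   # 4x upsample
            nn.ConvTranspose2d(dim, dim // 2, 2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(dim // 2, out_dim, 2, stride=2),
        )
        self.p8 = nn.ConvTranspose2d(dim, out_dim, 2, stride=2)    # 2x upsample
        self.p16 = nn.Conv2d(dim, out_dim, 1)                      # keep scale
        self.p32 = nn.Sequential(nn.MaxPool2d(2), nn.Conv2d(dim, out_dim, 1))  # 2x downsample

    def forward(self, x):
        # x: (B, dim, H/16, W/16), the last feature map of a plain ViT backbone
        return {"p4": self.p4(x), "p8": self.p8(x), "p16": self.p16(x), "p32": self.p32(x)}

feat = torch.randn(1, 768, 64, 64)   # e.g. a 1024x1024 image at patch stride 16
pyramid = SimpleFeaturePyramid()(feat)
print({k: tuple(v.shape) for k, v in pyramid.items()})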
Author:
Singh, Mannat, Gustafson, Laura, Adcock, Aaron, Reis, Vinicius de Freitas, Gedik, Bugra, Kosaraju, Raj Prateek, Mahajan, Dhruv, Girshick, Ross, Dollár, Piotr, van der Maaten, Laurens
Model pre-training is a cornerstone of modern visual recognition systems. Although fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent studies suggest that large-scale weakly supervised pre-training can outperform…
External link:
http://arxiv.org/abs/2201.08371
Object detection is a central downstream task used to test if pre-trained network parameters confer benefits, such as improved accuracy or training speed. The complexity of object detection methods can make this benchmarking non-trivial when new architectures…
External link:
http://arxiv.org/abs/2111.11429
Author:
Fan, Haoqi, Murrell, Tullie, Wang, Heng, Alwala, Kalyan Vasudev, Li, Yanghao, Li, Yilei, Xiong, Bo, Ravi, Nikhila, Li, Meng, Yang, Haichuan, Malik, Jitendra, Girshick, Ross, Feiszli, Matt, Adcock, Aaron, Lo, Wan-Yen, Feichtenhofer, Christoph
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and…
External link:
http://arxiv.org/abs/2111.09887
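The library's pretrained models are also exposed through Torch Hub; a minimal classification sketch is below, assuming the slow_r50 hub entry point and its (B, C, T, H, W) input layout, with a random tensor standing in for a real clip.

import torch

# Load a pretrained video classifier from the pytorchvideo Torch Hub
# (the entry point name is assumed from the public model zoo).
model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model.eval()

# Dummy 8-frame RGB clip, layout (batch, channels, time, height, width).
clip = torch.randn(1, 3, 8, 256, 256)
with torch.no_grad():
    logits = model(clip)
print(logits.shape)   # class logits, e.g. 400 Kinetics classes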
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First…
External link:
http://arxiv.org/abs/2111.06377
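The random-masking step described above (encode only a small visible subset of patch tokens, reconstruct the rest) can be sketched in a few lines; the function below is a simplified illustration with assumed shapes, not the authors' implementation.

import torch

def random_masking(tokens, mask_ratio=0.75):
    # tokens: (B, N, D) patch embeddings; keep a random (1 - mask_ratio) subset
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                   # per-sample random scores
    ids_keep = noise.argsort(dim=1)[:, :n_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)                    # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0)              # loss is computed where mask == 1
    return visible, mask, ids_keep

x = torch.randn(2, 196, 768)                   # 14x14 patches of a 224px image
visible, mask, _ = random_masking(x)
print(visible.shape, mask.sum(dim=1))          # (2, 49, 768); 147 masked per sample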