Showing 1 - 10 of 53 for search: '"Gritsenko, Alexey"'
Author:
Beyer, Lucas, Steiner, Andreas, Pinto, André Susano, Kolesnikov, Alexander, Wang, Xiao, Salz, Daniel, Neumann, Maxim, Alabdulmohsin, Ibrahim, Tschannen, Michael, Bugliarello, Emanuele, Unterthiner, Thomas, Keysers, Daniel, Koppula, Skanda, Liu, Fangyu, Grycner, Adam, Gritsenko, Alexey, Houlsby, Neil, Kumar, Manoj, Rong, Keran, Eisenschlos, Julian, Kabra, Rishabh, Bauer, Matthias, Bošnjak, Matko, Chen, Xi, Minderer, Matthias, Voigtlaender, Paul, Bica, Ioana, Balazevic, Ivana, Puigcerver, Joan, Papalampidi, Pinelopi, Henaff, Olivier, Xiong, Xi, Soricut, Radu, Harmsen, Jeremiah, Zhai, Xiaohua
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong …
External link:
http://arxiv.org/abs/2407.07726
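The entry above names the two components PaliGemma combines (the SigLIP-So400m vision encoder and the Gemma-2B language model). As a minimal, illustrative sketch only: the snippet below assumes the publicly released Hugging Face transformers checkpoint (model id google/paligemma-3b-pt-224) and a "caption en" prompt; neither detail comes from the catalogue record itself.

    # Illustrative sketch, not the authors' reference code: loads an assumed
    # public PaliGemma checkpoint and captions one image.
    import requests
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model_id = "google/paligemma-3b-pt-224"   # assumed released checkpoint
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
    inputs = processor(text="caption en", images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(processor.decode(out[0], skip_special_tokens=True))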
As foundation models become more popular, there is a growing need to efficiently finetune them for downstream tasks. Although numerous adaptation methods have been proposed, they are designed to be efficient only in terms of how many parameters are trained …
External link:
http://arxiv.org/abs/2402.02887
Author:
Heigold, Georg, Minderer, Matthias, Gritsenko, Alexey, Bewley, Alex, Keysers, Daniel, Lučić, Mario, Yu, Fisher, Kipf, Thomas
We present an architecture and a training recipe that adapts pre-trained open-world image models to localization in videos. Understanding the open visual world (without being constrained by fixed label spaces) is crucial for many real-world vision tasks …
External link:
http://arxiv.org/abs/2308.11093
Author:
Dehghani, Mostafa, Mustafa, Basil, Djolonga, Josip, Heek, Jonathan, Minderer, Matthias, Caron, Mathilde, Steiner, Andreas, Puigcerver, Joan, Geirhos, Robert, Alabdulmohsin, Ibrahim, Oliver, Avital, Padlewski, Piotr, Gritsenko, Alexey, Lučić, Mario, Houlsby, Neil
The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling …
External link:
http://arxiv.org/abs/2307.06304
Open-vocabulary object detection has benefited greatly from pretrained vision-language models, but is still limited by the amount of available detection training data. While detection training data can be expanded by using Web image-text pairs as weak supervision …
External link:
http://arxiv.org/abs/2306.09683
Author:
Gritsenko, Alexey, Xiong, Xuehan, Djolonga, Josip, Dehghani, Mostafa, Sun, Chen, Lučić, Mario, Schmid, Cordelia, Arnab, Anurag
The most performant spatio-temporal action localisation models use external person proposals and complex external memory banks. We propose a fully end-to-end, purely-transformer based model that directly ingests an input video, and outputs tubelets …
External link:
http://arxiv.org/abs/2304.12160
Author:
Dehghani, Mostafa, Djolonga, Josip, Mustafa, Basil, Padlewski, Piotr, Heek, Jonathan, Gilmer, Justin, Steiner, Andreas, Caron, Mathilde, Geirhos, Robert, Alabdulmohsin, Ibrahim, Jenatton, Rodolphe, Beyer, Lucas, Tschannen, Michael, Arnab, Anurag, Wang, Xiao, Riquelme, Carlos, Minderer, Matthias, Puigcerver, Joan, Evci, Utku, Kumar, Manoj, van Steenkiste, Sjoerd, Elsayed, Gamaleldin F., Mahendran, Aravindh, Yu, Fisher, Oliver, Avital, Huot, Fantine, Bastings, Jasmijn, Collier, Mark Patrick, Gritsenko, Alexey, Birodkar, Vighnesh, Vasconcelos, Cristina, Tay, Yi, Mensink, Thomas, Kolesnikov, Alexander, Pavetić, Filip, Tran, Dustin, Kipf, Thomas, Lučić, Mario, Zhai, Xiaohua, Keysers, Daniel, Harmsen, Jeremiah, Houlsby, Neil
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling …
External link:
http://arxiv.org/abs/2302.05442
Author:
Ho, Jonathan, Chan, William, Saharia, Chitwan, Whang, Jay, Gao, Ruiqi, Gritsenko, Alexey, Kingma, Diederik P., Poole, Ben, Norouzi, Mohammad, Fleet, David J., Salimans, Tim
We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models …
External link:
http://arxiv.org/abs/2210.02303
Author:
Arnab, Anurag, Xiong, Xuehan, Gritsenko, Alexey, Romijnders, Rob, Djolonga, Josip, Dehghani, Mostafa, Sun, Chen, Lučić, Mario, Schmid, Cordelia
Transfer learning is the predominant paradigm for training deep networks on small target datasets. Models are typically pretrained on large ``upstream'' datasets for classification, as such labels are easy to collect, and then finetuned on ``downstream'' tasks …
External link:
http://arxiv.org/abs/2207.03807
Author:
Minderer, Matthias, Gritsenko, Alexey, Stone, Austin, Neumann, Maxim, Weissenborn, Dirk, Dosovitskiy, Alexey, Mahendran, Aravindh, Arnab, Anurag, Dehghani, Mostafa, Shen, Zhuoran, Wang, Xiao, Zhai, Xiaohua, Kipf, Thomas, Houlsby, Neil
Combining simple architectures with large-scale pre-training has led to massive improvements in image classification. For object detection, pre-training and scaling approaches are less well established, especially in the long-tailed and open-vocabulary setting …
External link:
http://arxiv.org/abs/2205.06230
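Since the entry above describes the OWL-ViT open-vocabulary detector, a minimal usage sketch may help; it assumes the public Hugging Face transformers checkpoint google/owlvit-base-patch32 and a free-text query list, neither of which is part of the catalogue record.

    # Illustrative sketch only: zero-shot, text-conditioned detection with an
    # assumed public OWL-ViT checkpoint.
    import torch
    import requests
    from PIL import Image
    from transformers import OwlViTProcessor, OwlViTForObjectDetection

    checkpoint = "google/owlvit-base-patch32"   # assumed released checkpoint
    processor = OwlViTProcessor.from_pretrained(checkpoint)
    model = OwlViTForObjectDetection.from_pretrained(checkpoint)

    image = Image.open(requests.get("https://example.com/street.jpg", stream=True).raw)
    queries = [["a photo of a person", "a photo of a bicycle"]]  # open-vocabulary label space
    inputs = processor(text=queries, images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Turn raw logits and boxes into thresholded detections in pixel coordinates.
    target_sizes = torch.tensor([image.size[::-1]])
    detections = processor.post_process_object_detection(
        outputs, threshold=0.3, target_sizes=target_sizes
    )[0]
    print(detections["scores"], detections["labels"], detections["boxes"])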