Showing 1 - 10 of 69 for search: '"Veit, Andreas"'
Author:
Ji, Ziwei, Jain, Himanshu, Veit, Andreas, Reddi, Sashank J., Jayasumana, Sadeep, Rawat, Ankit Singh, Menon, Aditya Krishna, Yu, Felix, Kumar, Sanjiv
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings … (a minimal scoring sketch follows the link below).
External link:
http://arxiv.org/abs/2406.17968
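The contrast between the two scoring schemes can be made concrete with a toy example. This is a minimal sketch under illustrative assumptions, not the paper's implementation; the `encode` function below is a hypothetical stand-in for a learned encoder.

```python
# Minimal sketch: how a Dual-Encoder (DE) and a Cross-Encoder (CE) produce
# a relevance score. `encode` is a hypothetical placeholder, not a real model.
import numpy as np

D = 64  # embedding dimension

def encode(text: str) -> np.ndarray:
    """Hypothetical encoder: maps text to a D-dim vector (random placeholder)."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(D)

def de_score(query: str, doc: str) -> float:
    # DE: query and document are embedded independently (factorized),
    # so document embeddings can be precomputed and indexed.
    return float(encode(query) @ encode(doc))

def ce_score(query: str, doc: str) -> float:
    # CE: the query-document pair is encoded jointly, so the score cannot
    # be decomposed into two precomputed vectors.
    return float(encode(query + " [SEP] " + doc).sum())

print(de_score("neural retrieval", "a paper about dual encoders"))
print(ce_score("neural retrieval", "a paper about dual encoders"))
```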
Author:
Jayasumana, Sadeep, Ramalingam, Srikumar, Veit, Andreas, Glasner, Daniel, Chakrabarti, Ayan, Kumar, Sanjiv
As with many machine learning problems, the progress of image generation methods hinges on good evaluation metrics. One of the most popular is the Frechet Inception Distance (FID). FID estimates the distance between a distribution of Inception-v3 features … (a sketch of the standard FID computation follows the link below).
External link:
http://arxiv.org/abs/2401.09603
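For reference, the standard FID the snippet mentions is the Frechet distance between two Gaussians fitted to Inception-v3 feature sets. A minimal sketch, assuming the features are already extracted (random placeholder arrays here, not real features):

```python
# Minimal sketch of the standard FID formula:
#   ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * (C1 C2)^{1/2})
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical noise
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))

rng = np.random.default_rng(0)
print(fid(rng.standard_normal((500, 16)), rng.standard_normal((500, 16)) + 0.1))
```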
Author:
Jayasumana, Sadeep, Glasner, Daniel, Ramalingam, Srikumar, Veit, Andreas, Chakrabarti, Ayan, Kumar, Sanjiv
Modern text-to-image generation models produce high-quality images that are both photorealistic and faithful to the text prompts. However, this quality comes at significant computational cost: nearly all of these models are iterative and require running …
External link:
http://arxiv.org/abs/2308.10997
Author:
Li, Daliang, Rawat, Ankit Singh, Zaheer, Manzil, Wang, Xin, Lukasik, Michal, Veit, Andreas, Yu, Felix, Kumar, Sanjiv
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world knowledge …
External link:
http://arxiv.org/abs/2211.05110
Author:
Chaudhry, Arslan, Menon, Aditya Krishna, Veit, Andreas, Jayasumana, Sadeep, Ramalingam, Srikumar, Kumar, Sanjiv
Published in:
NeurIPS 2022 (First Workshop on Interpolation and Beyond)
Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of semi-supervised learning … (a minimal mixup sketch follows the link below).
External link:
http://arxiv.org/abs/2210.16413
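The convex-combination idea is short enough to write out directly. A minimal mixup sketch, assuming one-hot (or soft) labels and the usual Beta(alpha, alpha) mixing coefficient; the values below are purely illustrative:

```python
# Minimal sketch of mixup: new samples are convex combinations of pairs of
# training points and their labels.
import numpy as np

def mixup_batch(x: np.ndarray, y: np.ndarray, alpha: float = 0.2, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)             # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))           # random partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]  # labels assumed one-hot / soft
    return x_mix, y_mix

x = np.random.default_rng(0).standard_normal((8, 32))  # toy features
y = np.eye(4)[np.arange(8) % 4]                        # toy one-hot labels
x_mix, y_mix = mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng(1))
print(x_mix.shape, y_mix.shape)
```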
Author:
Zaheer, Manzil, Rawat, Ankit Singh, Kim, Seungyeon, You, Chong, Jain, Himanshu, Veit, Andreas, Fergus, Rob, Kumar, Sanjiv
The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates … (a generic distillation-loss sketch follows the link below).
External link:
http://arxiv.org/abs/2208.06825
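As background for the snippet above, a minimal sketch of a standard knowledge-distillation objective (softened cross-entropy between teacher and student logits); this is generic distillation, not the paper's specific training recipe:

```python
# Minimal sketch of a standard distillation loss on softened logits.
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits: np.ndarray, teacher_logits: np.ndarray,
                 temperature: float = 2.0) -> float:
    # Cross-entropy of the student's softened distribution against the
    # teacher's softened distribution, averaged over the batch.
    p_teacher = softmax(teacher_logits / temperature)
    log_p_student = np.log(softmax(student_logits / temperature) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

rng = np.random.default_rng(0)
print(distill_loss(rng.standard_normal((4, 10)), rng.standard_normal((4, 10))))
```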
Author:
Bhojanapalli, Srinadh, Chakrabarti, Ayan, Veit, Andreas, Lukasik, Michal, Jain, Himanshu, Liu, Frederick, Chang, Yin-Wen, Kumar, Sanjiv
Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision. However, a typical Transformer model computes … (a minimal attention sketch follows the link below).
External link:
http://arxiv.org/abs/2110.06821
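A minimal sketch of the pairwise dot-product attention referred to above; the n x n score matrix makes the quadratic cost in sequence length explicit. This is the generic scaled dot-product formulation, not the paper's specific variant:

```python
# Minimal sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)   # shape (n, n): one score per token pair
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
n, d = 6, 8                          # toy sequence length and head dimension
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(dot_product_attention(q, k, v).shape)   # (n, d)
```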
Author:
Bhojanapalli, Srinadh, Chakrabarti, Ayan, Jain, Himanshu, Kumar, Sanjiv, Lukasik, Michal, Veit, Andreas
State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length. In this paper, we investigate the global structure of attention scores computed using this …
External link:
http://arxiv.org/abs/2106.08823
Author:
Bhojanapalli, Srinadh, Chakrabarti, Ayan, Glasner, Daniel, Li, Daliang, Unterthiner, Thomas, Veit, Andreas
Deep Convolutional Neural Networks (CNNs) have long been the architecture of choice for computer vision tasks. Recently, Transformer-based architectures like Vision Transformer (ViT) have matched or even surpassed ResNets for image classification. However, …
External link:
http://arxiv.org/abs/2103.14586
Author:
Bhojanapalli, Srinadh, Wilber, Kimberly, Veit, Andreas, Rawat, Ankit Singh, Kim, Seungyeon, Menon, Aditya, Kumar, Sanjiv
Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness …
External link:
http://arxiv.org/abs/2102.03349