Showing 1 - 10 of 69 for search: '"Veit, Andreas"'
Author:
Ji, Ziwei, Jain, Himanshu, Veit, Andreas, Reddi, Sashank J., Jayasumana, Sadeep, Rawat, Ankit Singh, Menon, Aditya Krishna, Yu, Felix, Kumar, Sanjiv
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings … (a minimal scoring sketch follows the link below).
External link:
http://arxiv.org/abs/2406.17968
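The contrast between the two scoring schemes can be made concrete with a toy example. This is a minimal sketch under illustrative assumptions, not the paper's implementation; the `encode` function below is a hypothetical stand-in for a learned encoder.

```python
# Minimal sketch: how a Dual-Encoder (DE) and a Cross-Encoder (CE) produce
# a relevance score. `encode` is a hypothetical placeholder, not a real model.
import numpy as np

D = 64  # embedding dimension

def encode(text: str) -> np.ndarray:
    """Hypothetical encoder: maps text to a D-dim vector (random placeholder)."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(D)

def de_score(query: str, doc: str) -> float:
    # DE: query and document are embedded independently (factorized),
    # so document embeddings can be precomputed and indexed.
    return float(encode(query) @ encode(doc))

def ce_score(query: str, doc: str) -> float:
    # CE: the query-document pair is encoded jointly, so the score cannot
    # be decomposed into two precomputed vectors.
    return float(encode(query + " [SEP] " + doc).sum())

print(de_score("neural retrieval", "a paper about dual encoders"))
print(ce_score("neural retrieval", "a paper about dual encoders"))
```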
Author:
Jayasumana, Sadeep, Ramalingam, Srikumar, Veit, Andreas, Glasner, Daniel, Chakrabarti, Ayan, Kumar, Sanjiv
As with many machine learning problems, the progress of image generation methods hinges on good evaluation metrics. One of the most popular is the Frechet Inception Distance (FID). FID estimates the distance between a distribution of Inception-v3 features … (a sketch of the standard FID computation follows the link below).
External link:
http://arxiv.org/abs/2401.09603
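For reference, the standard FID the snippet mentions is the Frechet distance between two Gaussians fitted to Inception-v3 feature sets. A minimal sketch, assuming the features are already extracted (random placeholder arrays here, not real features):

```python
# Minimal sketch of the standard FID formula:
#   ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * (C1 C2)^{1/2})
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical noise
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))

rng = np.random.default_rng(0)
print(fid(rng.standard_normal((500, 16)), rng.standard_normal((500, 16)) + 0.1))
```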
Author:
Jayasumana, Sadeep, Glasner, Daniel, Ramalingam, Srikumar, Veit, Andreas, Chakrabarti, Ayan, Kumar, Sanjiv
Modern text-to-image generation models produce high-quality images that are both photorealistic and faithful to the text prompts. However, this quality comes at significant computational cost: nearly all of these models are iterative and require running …
External link:
http://arxiv.org/abs/2308.10997
Author:
Li, Daliang, Rawat, Ankit Singh, Zaheer, Manzil, Wang, Xin, Lukasik, Michal, Veit, Andreas, Yu, Felix, Kumar, Sanjiv
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world knowledge …
External link:
http://arxiv.org/abs/2211.05110
Author:
Chaudhry, Arslan, Menon, Aditya Krishna, Veit, Andreas, Jayasumana, Sadeep, Ramalingam, Srikumar, Kumar, Sanjiv
Published in:
NeurIPS 2022 (First Workshop on Interpolation and Beyond)
Mixup is a regularization technique that artificially produces new samples using convex combinations of original training points. This simple technique has shown strong empirical performance, and has been heavily used as part of semi-supervised learning … (a minimal mixup sketch follows the link below).
External link:
http://arxiv.org/abs/2210.16413
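The convex-combination idea is short enough to write out directly. A minimal mixup sketch, assuming one-hot (or soft) labels and the usual Beta(alpha, alpha) mixing coefficient; the values below are purely illustrative:

```python
# Minimal sketch of mixup: new samples are convex combinations of pairs of
# training points and their labels.
import numpy as np

def mixup_batch(x: np.ndarray, y: np.ndarray, alpha: float = 0.2, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)             # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))           # random partner for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]  # labels assumed one-hot / soft
    return x_mix, y_mix

x = np.random.default_rng(0).standard_normal((8, 32))  # toy features
y = np.eye(4)[np.arange(8) % 4]                        # toy one-hot labels
x_mix, y_mix = mixup_batch(x, y, alpha=0.2, rng=np.random.default_rng(1))
print(x_mix.shape, y_mix.shape)
```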
Author:
Zaheer, Manzil, Rawat, Ankit Singh, Kim, Seungyeon, You, Chong, Jain, Himanshu, Veit, Andreas, Fergus, Rob, Kumar, Sanjiv
The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates … (a generic distillation-loss sketch follows the link below).
External link:
http://arxiv.org/abs/2208.06825
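As background for the snippet above, a minimal sketch of a standard knowledge-distillation objective (softened cross-entropy between teacher and student logits); this is generic distillation, not the paper's specific training recipe:

```python
# Minimal sketch of a standard distillation loss on softened logits.
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits: np.ndarray, teacher_logits: np.ndarray,
                 temperature: float = 2.0) -> float:
    # Cross-entropy of the student's softened distribution against the
    # teacher's softened distribution, averaged over the batch.
    p_teacher = softmax(teacher_logits / temperature)
    log_p_student = np.log(softmax(student_logits / temperature) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

rng = np.random.default_rng(0)
print(distill_loss(rng.standard_normal((4, 10)), rng.standard_normal((4, 10))))
```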
Author:
Bhojanapalli, Srinadh, Chakrabarti, Ayan, Veit, Andreas, Lukasik, Michal, Jain, Himanshu, Liu, Frederick, Chang, Yin-Wen, Kumar, Sanjiv
Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision. However, a typical Transformer model computes … (a minimal attention sketch follows the link below).
External link:
http://arxiv.org/abs/2110.06821
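A minimal sketch of the pairwise dot-product attention referred to above; the n x n score matrix makes the quadratic cost in sequence length explicit. This is the generic scaled dot-product formulation, not the paper's specific variant:

```python
# Minimal sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)   # shape (n, n): one score per token pair
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
n, d = 6, 8                          # toy sequence length and head dimension
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(dot_product_attention(q, k, v).shape)   # (n, d)
```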
Author:
Bhojanapalli, Srinadh, Chakrabarti, Ayan, Jain, Himanshu, Kumar, Sanjiv, Lukasik, Michal, Veit, Andreas
State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length. In this paper, we investigate the global structure of attention scores computed using this …
External link:
http://arxiv.org/abs/2106.08823
Author:
Bhojanapalli, Srinadh, Chakrabarti, Ayan, Glasner, Daniel, Li, Daliang, Unterthiner, Thomas, Veit, Andreas
Deep Convolutional Neural Networks (CNNs) have long been the architecture of choice for computer vision tasks. Recently, Transformer-based architectures like Vision Transformer (ViT) have matched or even surpassed ResNets for image classification. However, …
External link:
http://arxiv.org/abs/2103.14586
Author:
Bhojanapalli, Srinadh, Wilber, Kimberly, Veit, Andreas, Rawat, Ankit Singh, Kim, Seungyeon, Menon, Aditya, Kumar, Sanjiv
Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness …
External link:
http://arxiv.org/abs/2102.03349