Showing 1 - 10 of 396 for the search: '"Kolter, J."'
We introduce a novel, training-free method for sampling differentiable representations (diffreps) using pretrained diffusion models. Rather than merely mode-seeking, our method achieves sampling by "pulling back" the dynamics of the reverse-time process … (a rough illustrative sketch of such a pulled-back update follows the link below)
External link:
http://arxiv.org/abs/2412.06981
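The snippet above only hints at the mechanism, so here is a rough, illustrative sketch of what "pulling back" an image-space reverse-diffusion target onto the parameters of a differentiable representation could look like. The names `eps_model` (a pretrained noise predictor), `render` (a differentiable representation decoder), and the single gradient step are assumptions made for illustration, not the paper's actual algorithm.

```python
import torch

# Hypothetical components, not taken from the paper:
#   eps_model(x_t, t) -> predicted noise from a pretrained diffusion model
#   render(theta)     -> image produced by a differentiable representation
def pulled_back_step(theta, eps_model, render, t, alpha_bar, lr=1e-2):
    """One illustrative update: build a denoised image-space target with the
    pretrained model, then pull it back onto the representation parameters
    theta by differentiating through the renderer."""
    theta = theta.clone().requires_grad_(True)
    x = render(theta)
    noise = torch.randn_like(x)
    x_t = alpha_bar[t].sqrt() * x + (1 - alpha_bar[t]).sqrt() * noise
    with torch.no_grad():
        eps_hat = eps_model(x_t, t)                     # pretrained noise estimate
        x0_hat = (x_t - (1 - alpha_bar[t]).sqrt() * eps_hat) / alpha_bar[t].sqrt()
    loss = ((render(theta) - x0_hat) ** 2).mean()       # image-space target pulled back to theta
    loss.backward()
    with torch.no_grad():
        theta -= lr * theta.grad
    return theta.detach()
```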
Author:
Dontas, Michail, He, Yutong, Murata, Naoki, Mitsufuji, Yuki, Kolter, J. Zico, Salakhutdinov, Ruslan
Blind inverse problems, where both the target data and forward operator are unknown, are crucial to many computer vision applications. Existing methods often depend on restrictive assumptions such as additional training, operator linearity, or narrow …
External link:
http://arxiv.org/abs/2412.00557
Vision Language Models (VLMs) have demonstrated strong capabilities across various visual understanding and reasoning tasks. However, their real-world deployment is often constrained by high latency during inference due to substantial compute requirements …
External link:
http://arxiv.org/abs/2411.03312
Published in:
NeurIPS 2024
Despite their strong performance on many generative tasks, diffusion models require a large number of sampling steps in order to generate realistic samples. This has motivated the community to develop effective methods to distill pre-trained diffusion models … (a plain many-step sampling loop, for contrast, is sketched after the link below)
External link:
http://arxiv.org/abs/2410.16794
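For context on why distillation matters here, below is a minimal, generic DDIM-style sampling loop (not the distillation method itself): every step costs one forward pass through the denoiser, so naive sampling with hundreds or thousands of steps is slow. `eps_model` and the noise schedule `alpha_bar` are assumed inputs.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, shape, alpha_bar, steps=1000):
    """Plain deterministic DDIM-style sampler: one eps_model evaluation per
    step, which is the cost that step-distillation methods try to remove."""
    x = torch.randn(shape)
    ts = torch.linspace(len(alpha_bar) - 1, 0, steps).long()
    for i in range(len(ts) - 1):
        t, t_prev = ts[i], ts[i + 1]
        eps = eps_model(x, t)
        x0 = (x - (1 - alpha_bar[t]).sqrt() * eps) / alpha_bar[t].sqrt()   # predicted clean sample
        x = alpha_bar[t_prev].sqrt() * x0 + (1 - alpha_bar[t_prev]).sqrt() * eps
    return x
```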
Counterfactual explanations have been a popular method of post-hoc explainability for a variety of settings in Machine Learning. Such methods focus on explaining classifiers by generating new data points that are similar to a given reference, while r… (a generic gradient-based counterfactual search is sketched after the link below)
External link:
http://arxiv.org/abs/2410.14522
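As a point of reference for what counterfactual explanation methods typically compute, here is a generic gradient-based search, an illustrative assumption rather than this paper's approach: find a point close to the reference that the classifier `clf` assigns to a chosen target class.

```python
import torch
import torch.nn.functional as F

def counterfactual(clf, x_ref, target_class, dist_weight=0.1, steps=200, lr=0.05):
    """Generic counterfactual search: minimize classification loss toward the
    target class plus a penalty for straying far from the reference point.
    x_ref is expected to have a leading batch dimension, e.g. shape (1, d)."""
    x = x_ref.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(clf(x), target) + dist_weight * (x - x_ref).pow(2).sum()
        loss.backward()
        opt.step()
    return x.detach()
```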
The composition of pretraining data is a key determinant of foundation models' performance, but there is no standard guideline for allocating a limited computational budget across different data sources. Most current approaches either rely on extensive …
External link:
http://arxiv.org/abs/2410.11820
Recent work has shown that state space models such as Mamba are significantly worse than Transformers on recall-based tasks because their state size is constant with respect to their input sequence length. But in practice, state space models … (a toy fixed-state recurrence is sketched after the link below)
External link:
http://arxiv.org/abs/2410.11135
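The constant-state limitation mentioned above is easiest to see in a toy linear state-space recurrence: however long the input is, everything the model can recall must be squeezed into a fixed-size state vector, whereas attention keeps a cache that grows with sequence length. The matrices below are illustrative placeholders, not Mamba's actual parameterization.

```python
import numpy as np

def ssm_scan(A, B, C, inputs):
    """Toy linear SSM: the hidden state h has fixed size d_state no matter how
    many inputs arrive, so memory of the sequence is lossy by construction."""
    d_state = A.shape[0]
    h = np.zeros(d_state)
    outputs = []
    for u in inputs:            # u: scalar input at one time step
        h = A @ h + B * u       # all history is compressed into d_state numbers
        outputs.append(C @ h)   # readout from the compressed state
    return np.array(outputs)
```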
A standard practice when using large language models is for users to supplement their instruction with an input context containing new information for the model to process. However, models struggle to reliably follow the input context, especially when …
External link:
http://arxiv.org/abs/2410.10796
Vision-language models (VLMs) such as CLIP are trained via contrastive learning between text and image pairs, resulting in aligned image and text embeddings that are useful for many downstream tasks. A notable drawback of CLIP, however, is that the r… (the standard CLIP-style contrastive loss is sketched after the link below)
External link:
http://arxiv.org/abs/2409.09721
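For readers unfamiliar with the training objective referenced above, this is the standard symmetric contrastive (InfoNCE) loss used in CLIP-style training: matched image/text pairs share an index and are pulled together while mismatched pairs are pushed apart. It illustrates the general setup only, not this paper's contribution.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over a batch of N matched pairs."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(len(logits))               # image i matches text i
    loss_i2t = F.cross_entropy(logits, targets)       # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)   # text -> matching image
    return (loss_i2t + loss_t2i) / 2
```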
Transformer architectures have become a dominant paradigm for domains like language modeling but suffer in many inference settings due to their quadratic-time self-attention. Recently proposed subquadratic architectures, such as Mamba, have shown promise … (a single-head attention sketch illustrating the quadratic cost follows the link below)
External link:
http://arxiv.org/abs/2408.10189
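The quadratic cost mentioned above comes from the pairwise score matrix in self-attention, sketched below for a single head; the (n, n) matrix is exactly what subquadratic architectures avoid materializing. Weight shapes and naming are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention over a length-n sequence x of shape (n, d):
    the (n, n) score matrix makes time and memory scale quadratically in n."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (n, n) pairwise scores
    return F.softmax(scores, dim=-1) @ v
```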