Zobrazeno 1 - 10
of 272
pro vyhledávání: '"Perez, Patrick"'
Domain adaptation has been extensively investigated in computer vision but still requires access to target data at the training time, which might be difficult to obtain in some uncommon conditions. In this paper, we present a new framework for domain
Externí odkaz:
http://arxiv.org/abs/2410.21361
We consider the problem of adapting a contrastively pretrained vision-language model like CLIP (Radford et al., 2021) for few-shot classification. The existing literature addresses this problem by learning a linear classifier of the frozen visual fea
Externí odkaz:
http://arxiv.org/abs/2410.05270
Autor:
Défossez, Alexandre, Mazaré, Laurent, Orsini, Manu, Royer, Amélie, Pérez, Patrick, Jégou, Hervé, Grave, Edouard, Zeghidour, Neil
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detection, speech recognition, textual dialogue and t
Externí odkaz:
http://arxiv.org/abs/2410.00037
Autor:
Letzelter, Victor, Perera, David, Rommel, Cédric, Fontaine, Mathieu, Essid, Slim, Richard, Gael, Pérez, Patrick
Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing
Externí odkaz:
http://arxiv.org/abs/2406.04706
Autor:
Sirko-Galouchenko, Sophia, Boulch, Alexandre, Gidaris, Spyros, Bursuc, Andrei, Vobecky, Antonin, Pérez, Patrick, Marlet, Renaud
We introduce a self-supervised pretraining method, called OccFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction prov
Externí odkaz:
http://arxiv.org/abs/2404.14027
Autor:
Messaoud, Kaouther, Grosse, Kathrin, Chen, Mickael, Cord, Matthieu, Pérez, Patrick, Alahi, Alexandre
Autonomous vehicles ought to predict the surrounding agents' trajectories to allow safe maneuvers in uncertain and complex traffic situations. As companies increasingly apply trajectory prediction in the real world, security becomes a relevant concer
Externí odkaz:
http://arxiv.org/abs/2312.13863
Autor:
Wysoczańska, Monika, Siméoni, Oriane, Ramamonjisoa, Michaël, Bursuc, Andrei, Trzciński, Tomasz, Pérez, Patrick
The popular CLIP model displays impressive zero-shot capabilities thanks to its seamless interaction with arbitrary text prompts. However, its lack of spatial awareness makes it unsuitable for dense computer vision tasks, e.g., semantic segmentation,
Externí odkaz:
http://arxiv.org/abs/2312.12359
Assessing the robustness of perception models to covariate shifts and their ability to detect out-of-distribution (OOD) inputs is crucial for safety-critical applications such as autonomous vehicles. By nature of such applications, however, the relev
Externí odkaz:
http://arxiv.org/abs/2312.09231
Learning without supervision how to predict 3D scene flows from point clouds is essential to many perception systems. We propose a novel learning framework for this task which improves the necessary regularization. Relying on the assumption that scen
Externí odkaz:
http://arxiv.org/abs/2312.08879
Autor:
Rommel, Cédric, Letzelter, Victor, Samet, Nermin, Marlet, Renaud, Cord, Matthieu, Pérez, Patrick, Valle, Eduardo
Monocular 3D human pose estimation (3D-HPE) is an inherently ambiguous task, as a 2D pose in an image might originate from different possible 3D poses. Yet, most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inpu
Externí odkaz:
http://arxiv.org/abs/2312.06386