Showing 1 - 10 of 11,482 for search: '"Koppula"'
Author:
Carreira, João, Gokay, Dilara, King, Michael, Zhang, Chuhan, Rocco, Ignacio, Mahendran, Aravindh, Keck, Thomas Albert, Heyward, Joseph, Koppula, Skanda, Pot, Etienne, Erdogan, Goker, Hasson, Yana, Yang, Yi, Greff, Klaus, Moing, Guillaume Le, van Steenkiste, Sjoerd, Zoran, Daniel, Hudson, Drew A., Vélez, Pedro, Polanía, Luisa, Friedman, Luke, Duvarney, Chris, Goroshin, Ross, Allen, Kelsey, Walker, Jacob, Kabra, Rishabh, Aboussouan, Eric, Sun, Jennifer, Kipf, Thomas, Doersch, Carl, Pătrăucean, Viorica, Damen, Dima, Luc, Pauline, Sajjadi, Mehdi S. M., Zisserman, Andrew
Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks – action classification, ImageNet classification, etc. In this pape
External link:
http://arxiv.org/abs/2412.15212
Author:
Beyer, Lucas, Steiner, Andreas, Pinto, André Susano, Kolesnikov, Alexander, Wang, Xiao, Salz, Daniel, Neumann, Maxim, Alabdulmohsin, Ibrahim, Tschannen, Michael, Bugliarello, Emanuele, Unterthiner, Thomas, Keysers, Daniel, Koppula, Skanda, Liu, Fangyu, Grycner, Adam, Gritsenko, Alexey, Houlsby, Neil, Kumar, Manoj, Rong, Keran, Eisenschlos, Julian, Kabra, Rishabh, Bauer, Matthias, Bošnjak, Matko, Chen, Xi, Minderer, Matthias, Voigtlaender, Paul, Bica, Ioana, Balazevic, Ivana, Puigcerver, Joan, Papalampidi, Pinelopi, Henaff, Olivier, Xiong, Xi, Soricut, Radu, Harmsen, Jeremiah, Zhai, Xiaohua
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong
External link:
http://arxiv.org/abs/2407.07726
Author:
Koppula, Skanda, Rocco, Ignacio, Yang, Yi, Heyward, Joe, Carreira, João, Zisserman, Andrew, Brostow, Gabriel, Doersch, Carl
We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three
External link:
http://arxiv.org/abs/2407.05921
Author:
Balažević, Ivana, Shi, Yuge, Papalampidi, Pinelopi, Chaabouni, Rahma, Koppula, Skanda, Hénaff, Olivier J.
Most transformer-based video encoders are limited to short temporal contexts due to their quadratic complexity. While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complex
External link:
http://arxiv.org/abs/2402.05861
Author:
Doersch, Carl, Luc, Pauline, Yang, Yi, Gokay, Dilara, Koppula, Skanda, Gupta, Ankush, Heyward, Joseph, Rocco, Ignacio, Goroshin, Ross, Carreira, João, Zisserman, Andrew
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any
External link:
http://arxiv.org/abs/2402.00847
Published in:
49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)
The Polynomial-Time Hierarchy ($\mathsf{PH}$) is a staple of classical complexity theory, with applications spanning randomized computation to circuit lower bounds to "quantum advantage" analyses for near-term quantum computers. Quantumly, however,
External link:
http://arxiv.org/abs/2401.01633
Author:
Papalampidi, Pinelopi, Koppula, Skanda, Pathak, Shreya, Chiu, Justin, Heyward, Joe, Patraucean, Viorica, Shen, Jiajun, Miech, Antoine, Zisserman, Andrew, Nematzadeh, Aida
Understanding long, real-world videos requires modeling of long-range visual dependencies. To this end, we explore video-first architectures, building on the common paradigm of transferring large-scale, image-text models to video via shallow tempora
External link:
http://arxiv.org/abs/2312.07395
Author:
Anitha Mandava, Veeraiah Koppula, Meghana Kandati, Arvind K. Reddy, Senthil J. Rajappa, T. S. Rao
Published in:
Indian Journal of Radiology and Imaging, Vol 35, Iss 01, Pp 148-158 (2025)
Choriocarcinoma is an uncommon, highly invasive malignancy arising from the placental trophoblastic tissue. Though staging is clinical, imaging has significant role in the diagnosis, staging, prognostic risk scoring, and management of choriocarcinoma
External link:
https://doaj.org/article/42d9387898774cd1bdcdaf4ec60d11df
Author:
Mandava, Anitha (AUTHOR) kanisri@gmail.com, Koppula, Veeraiah (AUTHOR), Kandati, Meghana (AUTHOR), Reddy, Arvind K. (AUTHOR), Rajappa, Senthil J. (AUTHOR), Rao, T. S. (AUTHOR)
Published in:
Indian Journal of Radiology & Imaging. Jan 2025, Vol. 35, Issue 1, pp. 148-158. 11 pp.
Author:
Su, Hsuan, Hu, Ting-Yao, Koppula, Hema Swetha, Vemulapalli, Raviteja, Chang, Jen-Hao Rick, Yang, Karren, Mantena, Gautam Varma, Tuzel, Oncel
While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readil
External link:
http://arxiv.org/abs/2309.10707