Showing 1 - 10 of 204 results for search: '"Xu, Felix"'
Author:
Zohar, Orr, Wang, Xiaohan, Dubois, Yann, Mehta, Nikhil, Xiao, Tong, Hansen-Estruch, Philippe, Yu, Licheng, Wang, Xiaofang, Juefei-Xu, Felix, Zhang, Ning, Yeung-Levy, Serena, Xia, Xide
Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made with…
External link:
http://arxiv.org/abs/2412.10360
Author:
Wang, Hongjie, Ma, Chih-Yao, Liu, Yen-Cheng, Hou, Ji, Xu, Tao, Wang, Jialiang, Juefei-Xu, Felix, Luo, Yaqiao, Zhang, Peizhao, Hou, Tingbo, Vajda, Peter, Jha, Niraj K., Dai, Xiaoliang
Text-to-video generation enhances content creation but is highly computationally intensive: the computational cost of Diffusion Transformers (DiTs) scales quadratically in the number of pixels. This makes minute-length video generation extremely expensive… (see the illustrative cost sketch after this entry)
External link:
http://arxiv.org/abs/2412.09856
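As a rough illustration of the quadratic cost mentioned in the abstract above: in a video DiT, each frame is split into patch tokens, and self-attention over those tokens costs on the order of N^2 operations, so longer or higher-resolution clips get disproportionately expensive. The sketch below is a back-of-envelope Python estimate; the patch size, resolution, frame counts, and hidden dimension are assumptions chosen for illustration, not values from the paper.

# Back-of-envelope estimate of self-attention cost in a video DiT.
# All sizes below are illustrative assumptions, not values from the paper.

def token_count(frames, height, width, patch=2):
    """Spatio-temporal tokens for a latent video of the given size."""
    return frames * (height // patch) * (width // patch)

def attention_flops(tokens, dim=1024):
    """Rough cost of one self-attention layer: O(tokens^2 * dim)."""
    return tokens * tokens * dim

short_clip = token_count(frames=16, height=64, width=64)
long_clip = token_count(frames=64, height=64, width=64)

# Quadrupling the frame count multiplies the attention cost by ~16x,
# which is the quadratic scaling the abstract refers to.
print(attention_flops(long_clip) / attention_flops(short_clip))  # 16.0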
Author:
Lai, Bolin, Juefei-Xu, Felix, Liu, Miao, Dai, Xiaoliang, Mehta, Nikhil, Zhu, Chenguang, Huang, Zeyi, Rehg, James M., Lee, Sangmin, Zhang, Ning, Xiao, Tong
Text-guided image manipulation has experienced notable advancement in recent years. In order to mitigate linguistic ambiguity, few-shot learning with visual examples has been applied for instructions that are underrepresented in the training set, or…
External link:
http://arxiv.org/abs/2412.01027
Author:
Zhao, Shiyu, Wang, Zhenting, Juefei-Xu, Felix, Xia, Xide, Liu, Miao, Wang, Xiaofang, Liang, Mingfu, Zhang, Ning, Metaxas, Dimitris N., Yu, Licheng
Prevailing Multimodal Large Language Models (MLLMs) encode the input image(s) as vision tokens and feed them into the language backbone, similar to how Large Language Models (LLMs) process text tokens. However, the number of vision tokens increases… (see the token-count sketch after this entry)
External link:
http://arxiv.org/abs/2412.00556
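To make the token-count concern above concrete: with a ViT-style encoder, each image contributes (H / patch) x (W / patch) tokens, so high-resolution or multi-image inputs quickly dominate the language backbone's context. The patch size and resolutions in this sketch are assumed defaults for illustration only, not numbers taken from the paper.

# Illustrative count of vision tokens fed into an MLLM's language backbone.
# Patch size and image resolutions are assumed values.

def vision_tokens(height, width, patch=14):
    """ViT-style token count for a single image."""
    return (height // patch) * (width // patch)

single_image = vision_tokens(336, 336)       # 576 tokens
high_res = vision_tokens(672, 672)           # 2304 tokens, 4x the base image
eight_frames = 8 * vision_tokens(336, 336)   # 4608 tokens for an 8-image input

# The backbone attends over text tokens plus all of these vision tokens,
# so the vision side grows much faster than a typical text prompt.
print(single_image, high_res, eight_frames)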
Author:
Polyak, Adam, Zohar, Amit, Brown, Andrew, Tjandra, Andros, Sinha, Animesh, Lee, Ann, Vyas, Apoorv, Shi, Bowen, Ma, Chih-Yao, Chuang, Ching-Yao, Yan, David, Choudhary, Dhruv, Wang, Dingkang, Sethi, Geet, Pang, Guan, Ma, Haoyu, Misra, Ishan, Hou, Ji, Wang, Jialiang, Jagadeesh, Kiran, Li, Kunpeng, Zhang, Luxin, Singh, Mannat, Williamson, Mary, Le, Matt, Yu, Matthew, Singh, Mitesh Kumar, Zhang, Peizhao, Vajda, Peter, Duval, Quentin, Girdhar, Rohit, Sumbaly, Roshan, Rambhatla, Sai Saketh, Tsai, Sam, Azadi, Samaneh, Datta, Samyak, Chen, Sanyuan, Bell, Sean, Ramaswamy, Sharadh, Sheynin, Shelly, Bhattacharya, Siddharth, Motwani, Simran, Xu, Tao, Li, Tianhe, Hou, Tingbo, Hsu, Wei-Ning, Yin, Xi, Dai, Xiaoliang, Taigman, Yaniv, Luo, Yaqiao, Liu, Yen-Cheng, Wu, Yi-Chiao, Zhao, Yue, Kirstain, Yuval, He, Zecheng, He, Zijian, Pumarola, Albert, Thabet, Ali, Sanakoyeu, Artsiom, Mallya, Arun, Guo, Baishan, Araya, Boris, Kerr, Breena, Wood, Carleigh, Liu, Ce, Peng, Cen, Vengertsev, Dimitry, Schonfeld, Edgar, Blanchard, Elliot, Juefei-Xu, Felix, Nord, Fraylie, Liang, Jeff, Hoffman, John, Kohler, Jonas, Fire, Kaolin, Sivakumar, Karthik, Chen, Lawrence, Yu, Licheng, Gao, Luya, Georgopoulos, Markos, Moritz, Rashel, Sampson, Sara K., Li, Shikai, Parmeggiani, Simone, Fine, Steve, Fowler, Tara, Petrovic, Vladan, Du, Yuming
We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of…
External link:
http://arxiv.org/abs/2410.13720
Author:
He, Xiaoxiao, Han, Ligong, Dao, Quan, Wen, Song, Bai, Minhao, Liu, Di, Zhang, Han, Min, Martin Renqiang, Juefei-Xu, Felix, Tan, Chaowei, Liu, Bo, Li, Kang, Li, Hongdong, Huang, Junzhou, Ahmed, Faez, Srivastava, Akash, Metaxas, Dimitris
Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable…
External link:
http://arxiv.org/abs/2410.08207
Author:
Zhang, Christina, Motwani, Simran, Yu, Matthew, Hou, Ji, Juefei-Xu, Felix, Tsai, Sam, Vajda, Peter, He, Zijian, Wang, Jialiang
Latent diffusion models (LDMs) have made significant advancements in the field of image generation in recent years. One major advantage of LDMs is their ability to operate in a compressed latent space, allowing for more efficient training and deployment… (see the size-comparison sketch after this entry)
External link:
http://arxiv.org/abs/2409.17565
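As a rough illustration of the latent-space advantage mentioned above: a typical LDM autoencoder downsamples each spatial dimension by a factor of 8 and keeps a handful of latent channels, so the diffusion model works on far fewer values than the raw pixel grid. The downsampling factor and channel counts below are common defaults assumed for illustration, not the paper's configuration.

# Illustrative comparison of pixel-space vs. latent-space sizes for an LDM.
# The 8x downsampling factor and 4 latent channels are assumed defaults.

def pixel_elements(height, width, channels=3):
    return height * width * channels

def latent_elements(height, width, factor=8, latent_channels=4):
    return (height // factor) * (width // factor) * latent_channels

h, w = 1024, 1024
print(pixel_elements(h, w))   # 3,145,728 values in pixel space
print(latent_elements(h, w))  # 65,536 values in latent space, roughly 48x fewer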
Author:
He, Zecheng, Sun, Bo, Juefei-Xu, Felix, Ma, Haoyu, Ramchandani, Ankit, Cheung, Vincent, Shah, Siddharth, Kalia, Anmol, Subramanyam, Harihar, Zareian, Alireza, Chen, Li, Jain, Ankit, Zhang, Ning, Zhang, Peizhao, Sumbaly, Roshan, Vajda, Peter, Sinha, Animesh
Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based personalization…
External link:
http://arxiv.org/abs/2409.13346
Large Language Models (LLMs) are widely used in many different domains, but because of their limited interpretability, there are questions about how trustworthy they are from various perspectives, e.g., truthfulness and toxicity. Recent research has st…
External link:
http://arxiv.org/abs/2408.10474
Performance evaluation plays a crucial role in the development life cycle of large language models (LLMs). It estimates the model's capability, elucidates behavior characteristics, and facilitates the identification of potential issues and limitations…
External link:
http://arxiv.org/abs/2408.03573