Search results - "Zhang., Ning"

Report

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Authors: Zohar, Orr, Wang, Xiaohan, Dubois, Yann, Mehta, Nikhil, Xiao, Tong, Hansen-Estruch, Philippe, Yu, Licheng, Wang, Xiaofang, Juefei-Xu, Felix, Zhang, Ning, Yeung-Levy, Serena, Xia, Xide

Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made with

External link: http://arxiv.org/abs/2412.10360

Report

Enhancing low-temperature quantum thermometry via sequential measurements

Authors: Zhang, Ning, Chen, Chong, Wang, Ping

We propose a sequential measurement protocol for accurate low-temperature estimation. The resulting correlated outputs significantly enhance the low temperature precision compared to that of the independent measurement scheme. This enhancement manife

External link: http://arxiv.org/abs/2412.04878

Report

Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation

Authors: Lai, Bolin, Juefei-Xu, Felix, Liu, Miao, Dai, Xiaoliang, Mehta, Nikhil, Zhu, Chenguang, Huang, Zeyi, Rehg, James M., Lee, Sangmin, Zhang, Ning, Xiao, Tong

Text-guided image manipulation has experienced notable advancement in recent years. In order to mitigate linguistic ambiguity, few-shot learning with visual examples has been applied for instructions that are underrepresented in the training set, or

External link: http://arxiv.org/abs/2412.01027

Report

Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction

Authors: Zhao, Shiyu, Wang, Zhenting, Juefei-Xu, Felix, Xia, Xide, Liu, Miao, Wang, Xiaofang, Liang, Mingfu, Zhang, Ning, Metaxas, Dimitris N., Yu, Licheng

Prevailing Multimodal Large Language Models (MLLMs) encode the input image(s) as vision tokens and feed them into the language backbone, similar to how Large Language Models (LLMs) process the text tokens. However, the number of vision tokens increas

External link: http://arxiv.org/abs/2412.00556

Report

First-in-human spinal cord tumor imaging with fast adaptive focus tracking robotic-OCT

Authors: He, Bin, Ying, Yuzhe, Shi, Yejiong, Meng, Zhe, Yin, Zichen, Chen, Zhengyu, Hu, Zhangwei, Xue, Ruizhi, Jing, Linkai, Lu, Yang, Sun, Zhenxing, Man, Weitao, Wu, Youtu, Lei, Dan, Zhang, Ning, Wang, Guihuai, Xue, Ping

Current surgical procedures for spinal cord tumors lack in vivo high-resolution, high-speed multifunctional imaging systems, posing challenges for precise tumor resection and intraoperative decision-making. This study introduces the Fast Adaptive Foc

External link: http://arxiv.org/abs/2410.21809

Report

IBGP: Imperfect Byzantine Generals Problem for Zero-Shot Robustness in Communicative Multi-Agent Systems

Authors: Mao, Yihuan, Kang, Yipeng, Li, Peilun, Zhang, Ning, Xu, Wei, Zhang, Chongjie

As large language model (LLM) agents increasingly integrate into our infrastructure, their robust coordination and message synchronization become vital. The Byzantine Generals Problem (BGP) is a critical model for constructing resilient multi-agent s

External link: http://arxiv.org/abs/2410.16237

Report

Sequential LLM Framework for Fashion Recommendation

Authors: Liu, Han, Tang, Xianfeng, Chen, Tianlang, Liu, Jiapeng, Indu, Indu, Zou, Henry Peng, Dai, Peng, Galan, Roberto Fernandez, Porter, Michael D, Jia, Dongmei, Zhang, Ning, Xiong, Lian

The fashion industry is one of the leading domains in the global e-commerce sector, prompting major online retailers to employ recommendation systems for product suggestions and customer convenience. While recommendation systems have been widely stud

External link: http://arxiv.org/abs/2410.11327

Report

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

Authors: Guo, Sha, Chen, Zhuo, Zhao, Yang, Zhang, Ning, Li, Xiaotong, Duan, Lingyu

Published in: Proceedings of the 31st ACM International Conference on Multimedia, pp. 1431-1442, 2023

Traditional image codecs emphasize signal fidelity and human perception, often at the expense of machine vision tasks. Deep learning methods have demonstrated promising coding performance by utilizing rich semantic embeddings optimized for both human

External link: http://arxiv.org/abs/2410.06149

Report

ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue

Authors: Li, Zhangpu, Zou, Changhong, Ma, Suxue, Yang, Zhicheng, Du, Chen, Tang, Youbao, Cao, Zhenjie, Zhang, Ning, Lai, Jui-Hsin, Lin, Ruei-Sung, Ni, Yuan, Sun, Xingzhi, Xiao, Jing, Hou, Jieke, Zhang, Kai, Han, Mei

The rocketing prosperity of large language models (LLMs) in recent years has boosted the prevalence of vision-language models (VLMs) in the medical sector. In our online medical consultation scenario, a doctor responds to the texts and images provide

External link: http://arxiv.org/abs/2409.17610

Report

Imagine yourself: Tuning-Free Personalized Image Generation

Authors: He, Zecheng, Sun, Bo, Juefei-Xu, Felix, Ma, Haoyu, Ramchandani, Ankit, Cheung, Vincent, Shah, Siddharth, Kalia, Anmol, Subramanyam, Harihar, Zareian, Alireza, Chen, Li, Jain, Ankit, Zhang, Ning, Zhang, Peizhao, Sumbaly, Roshan, Vajda, Peter, Sinha, Animesh

Diffusion models have demonstrated remarkable efficacy across various image-to-image tasks. In this research, we introduce Imagine yourself, a state-of-the-art model designed for personalized image generation. Unlike conventional tuning-based persona

External link: http://arxiv.org/abs/2409.13346