Showing 1 - 10 of 434 for search: '"Zhang Peiyuan"'
Author:
Zhang, Peiyuan, Karbasi, Amin
Classical optimization theory requires a small step-size for gradient-based methods to converge. Nevertheless, recent findings challenge the traditional idea by empirically demonstrating Gradient Descent (GD) converges even when the step-size $\eta$
External link:
http://arxiv.org/abs/2412.08025
Author:
Li, Lei, Liu, Yuanxin, Yao, Linli, Zhang, Peiyuan, An, Chenxin, Wang, Lean, Sun, Xu, Kong, Lingpeng, Liu, Qi
Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships. While previous research attributed this limitation to the
External link:
http://arxiv.org/abs/2410.06166
Author:
Li, Bo, Zhang, Yuanhan, Guo, Dong, Zhang, Renrui, Li, Feng, Zhang, Hao, Zhang, Kaichen, Zhang, Peiyuan, Li, Yanwei, Liu, Ziwei, Li, Chunyuan
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision
External link:
http://arxiv.org/abs/2408.03326
Author:
Zhang, Kaichen, Li, Bo, Zhang, Peiyuan, Pu, Fanyi, Cahyono, Joshua Adrian, Hu, Kairui, Liu, Shuai, Zhang, Yuanhan, Yang, Jingkang, Li, Chunyuan, Liu, Ziwei
The advances of large foundation models necessitate wide-coverage, low-cost, and zero-contamination benchmarks. Despite continuous exploration of language model evaluations, comprehensive studies on the evaluation of Large Multi-modal Models (LMMs) r
External link:
http://arxiv.org/abs/2407.12772
Author:
Zhang, Peiyuan, Zhang, Kaichen, Li, Bo, Zeng, Guangtao, Yang, Jingkang, Zhang, Yuanhan, Wang, Ziyue, Tan, Haoran, Li, Chunyuan, Liu, Ziwei
Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively
External link:
http://arxiv.org/abs/2406.16852
We study the computational limits of the following general hypothesis testing problem. Let H=H_n be an \emph{arbitrary} undirected graph on n vertices. We study the detection task between a ``null'' Erd\H{o}s-R\'{e}nyi random graph G(n,p) and a ``pla
External link:
http://arxiv.org/abs/2403.17766
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source communit
External link:
http://arxiv.org/abs/2401.02385
In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models that are constrained by fixed-size visio
External link:
http://arxiv.org/abs/2311.04219
Author:
Kunisky, Dmitriy, Zhang, Peiyuan
We study the operator norm discrepancy of i.i.d. random matrices, initiating the matrix-valued analog of a long line of work on the $\ell^{\infty}$ norm discrepancy of i.i.d. random vectors. First, using repurposed results on vector discrepancy and n
External link:
http://arxiv.org/abs/2307.10055
Fine-tuning pre-trained language models for multiple tasks tends to be expensive in terms of storage. To mitigate this, parameter-efficient transfer learning (PETL) methods have been proposed to address this issue, but they still require a significan
External link:
http://arxiv.org/abs/2305.17682