Search results

Report

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Authors: Ge, Zhiqi; Li, Juncheng; Pang, Xinglei; Gao, Minghe; Pan, Kaihang; Lin, Wang; Fei, Hao; Zhang, Wenqiao; Tang, Siliang; Zhuang, Yueting

Digital agents are increasingly employed to automate tasks in interactive digital environments such as web pages, software applications, and operating systems. While text-based agents built on Large Language Models (LLMs) often require frequent updat

External link: http://arxiv.org/abs/2412.10342

Report

Bridging the Gap for Test-Time Multimodal Sentiment Analysis

Authors: Guo, Zirun; Jin, Tao; Xu, Wenlong; Lin, Wang; Wu, Yangyang

Multimodal sentiment analysis (MSA) is an emerging research topic that aims to understand and recognize human sentiment or emotions through multiple modalities. However, in real-world dynamic scenarios, the distribution of target data is always chang

External link: http://arxiv.org/abs/2412.07121

Report

ALKPU: an active learning method for the DeePMD model with Kalman filter

Authors: Li, Haibo; Wu, Xingxing; Liu, Liping; Wang, Lin-Wang; Wang, Long; Tan, Guangming; Jia, Weile

Neural network force field models such as DeePMD have enabled highly efficient large-scale molecular dynamics simulations with ab initio accuracy. However, building such models heavily depends on the training data obtained by costly electronic struct

External link: http://arxiv.org/abs/2411.13850

Report

AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding

Authors: Huang, Zihan; Wu, Tao; Lin, Wang; Zhang, Shengyu; Chen, Jingyuan; Wu, Fei

With the rapid advancement of large language models, there has been a growing interest in their capabilities in mathematical reasoning. However, existing research has primarily focused on text-based algebra problems, neglecting the study of geometry

External link: http://arxiv.org/abs/2409.09039

Report

Semantic Alignment for Multimodal Large Language Models

Authors: Wu, Tao; Li, Mengze; Chen, Jingyuan; Ji, Wei; Lin, Wang; Gao, Jinyang; Kuang, Kun; Zhao, Zhou; Wu, Fei

Research on Multi-modal Large Language Models (MLLMs) towards the multi-image cross-modal instruction has received increasing attention and made significant progress, particularly in scenarios involving closely resembling images (e.g., change caption

External link: http://arxiv.org/abs/2408.12867

Report

Instruction Tuning-free Visual Token Complement for Multimodal LLMs

Authors: Wang, Dongsheng; Cui, Jiequan; Li, Miaoge; Lin, Wang; Chen, Bo; Zhang, Hanwang

As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality in

External link: http://arxiv.org/abs/2408.05019

Report

FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow

Authors: Li, Hangyu; Chu, Xiangxiang; Shi, Dingyuan; Lin, Wang

Recent advances in text-to-3D generation have made significant progress. In particular, with the pretrained diffusion models, existing methods predominantly use Score Distillation Sampling (SDS) to train 3D models such as Neural Ra

External link: http://arxiv.org/abs/2408.05008

Report

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

Authors: Wang, Ye; Xun, Jiahao; Hong, Minjie; Zhu, Jieming; Jin, Tao; Lin, Wang; Li, Haoyuan; Li, Linjun; Xia, Yan; Zhao, Zhou; Dong, Zhenhua

Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either b

External link: http://arxiv.org/abs/2406.14017

Report

Realtime observation of a tungsten-promoted size regulation mechanism in a rhodium catalyst at atomic resolution

Authors: Specht, Petra; Kang, Joo H.; Tarafder, Kartick; Cieslinski, Robert; Barton, David; Barton, Bastian; Carlsson, Anna; Wang, Lin-Wang; Kisielowski, Christian

The static and genuine structure of small rhodium and rhodium/tungsten nanoparticles on an alumina support can be imaged with atomic resolution even if single digit atom clusters are investigated. Low dose rate electron microscopy is key to the achie

External link: http://arxiv.org/abs/2406.05689

Report

Non-confusing Generation of Customized Concepts in Diffusion Models

Authors: Lin, Wang; Chen, Jingyuan; Shi, Jiaxin; Zhu, Yichen; Liang, Chen; Miao, Junzhong; Jin, Tao; Zhao, Zhou; Wu, Fei; Yan, Shuicheng; Zhang, Hanwang

We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs). It becomes even more pronounced in the generation of customized concepts, due to the scarcity of user-pro

External link: http://arxiv.org/abs/2405.06914
