Showing 1 - 10 of 45 for search: '"Ji, Zhilong"'
Learning a skill generally relies on both practical experience by the doer and insightful high-level guidance by an instructor. Will this strategy also work well for solving complex non-convex optimization problems? Here, a common gradient-based optimizer a
External link:
http://arxiv.org/abs/2405.19732
The tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods chose another track: augmenting math reasoning da
External link:
http://arxiv.org/abs/2405.07551
Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by the reference images. Despite promising identity fidelity has been ac
External link:
http://arxiv.org/abs/2405.05806
Predicting the trajectories of road agents is essential for autonomous driving systems. The recent mainstream methods follow a static paradigm, which predicts the future trajectory by using a fixed duration of historical frames. These methods make th
External link:
http://arxiv.org/abs/2404.06351
Parameter-efficient fine-tuning (PEFT) methods have provided an effective way for adapting large vision-language models to specific tasks or scenarios. Typically, they learn a very small number of parameters for pre-trained models in a white-box formu
External link:
http://arxiv.org/abs/2312.15901
Customized text-to-image generation, which aims to learn user-specified concepts with a few images, has drawn significant attention recently. However, existing methods usually suffer from overfitting issues and entangle the subject-unrelated informat
External link:
http://arxiv.org/abs/2312.11826
Handwritten mathematical expression recognition (HMER) has attracted extensive attention recently. However, current methods cannot explicitly study the interactions between different symbols, which may fail when faced with similar symbols. To alleviate th
External link:
http://arxiv.org/abs/2308.10493
Vision Transformers have achieved great success in computer vision, delivering exceptional performance across various tasks. However, their inherent reliance on sequential input enforces the manual partitioning of images into patch sequences, which
External link:
http://arxiv.org/abs/2308.10729
Author:
Su, Yuchen, Chen, Zhineng, Shao, Zhiwen, Du, Yuning, Ji, Zhilong, Bai, Jinfeng, Zhou, Yong, Jiang, Yu-Gang
Recently, regression-based methods, which predict parameterized text shapes for text localization, have gained popularity in scene text detection. However, the existing parameterized text shape methods still have limitations in modeling arbitrary-sha
External link:
http://arxiv.org/abs/2306.15142
Despite the progress in semantic image synthesis, it remains a challenging problem to generate photo-realistic parts from an input semantic map. Integrating a part segmentation map can undoubtedly benefit image synthesis, but is bothersome and inconvenien
External link:
http://arxiv.org/abs/2305.19547