Showing 1 - 10 of 705 for search: '"Dai Guang"'
Author:
Lin, Haonan, Wang, Mengmeng, Wang, Jiahao, An, Wenbin, Chen, Yan, Liu, Yong, Tian, Feng, Dai, Guang, Wang, Jingdong, Wang, Qianying
Text-guided diffusion models have significantly advanced image editing, enabling high-quality and diverse modifications driven by text prompts. However, effective editing requires inverting the source image into a latent space, a process often hinder
External link:
http://arxiv.org/abs/2410.18756
Evidence-enhanced detectors present remarkable abilities in identifying malicious social text with related evidence. However, the rise of large language models (LLMs) brings potential risks of evidence pollution to confuse detectors. This paper explo
External link:
http://arxiv.org/abs/2410.12600
Author:
Lin, Haonan, An, Wenbin, Wang, Jiahao, Chen, Yan, Tian, Feng, Wang, Mengmeng, Dai, Guang, Wang, Qianying, Wang, Jingdong
Recent advancements have shown promise in applying traditional Semi-Supervised Learning strategies to the task of Generalized Category Discovery (GCD). Typically, this involves a teacher-student framework in which the teacher imparts knowledge to the
External link:
http://arxiv.org/abs/2409.19659
Author:
Wang, Jiahao, Yan, Caixia, Zhang, Weizhan, Lin, Haonan, Wang, Mengmeng, Dai, Guang, Gong, Tieliang, Sun, Hao, Wang, Jingdong
Text-to-image diffusion models significantly enhance the efficiency of artistic creation with high-fidelity image generation. However, in typical application scenarios like comic book production, they can neither place each subject into its expected
External link:
http://arxiv.org/abs/2409.04801
Information spreads faster through social media platforms than traditional media, thus becoming an ideal medium to spread misinformation. Meanwhile, automated accounts, known as social bots, contribute more to the misinformation dissemination. In thi
External link:
http://arxiv.org/abs/2408.09613
Author:
Dang, Zhuohang, Luo, Minnan, Wang, Jihong, Jia, Chengyou, Han, Haochen, Wan, Herun, Dai, Guang, Chang, Xiaojun, Wang, Jingdong
Cross-modal retrieval is crucial in understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical as real-world data inevitably involves imperfect alignments, i.e.
External link:
http://arxiv.org/abs/2408.05503
Author:
An, Wenbin, Tian, Feng, Nie, Jiahao, Shi, Wenkai, Lin, Haonan, Chen, Yan, Wang, QianYing, Wu, Yaqiang, Dai, Guang, Chen, Ping
Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answer
External link:
http://arxiv.org/abs/2407.15346
Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on resource-
External link:
http://arxiv.org/abs/2407.03917
Author:
An, Wenbin, Tian, Feng, Leng, Sicong, Nie, Jiahao, Lin, Haonan, Wang, QianYing, Dai, Guang, Chen, Ping, Lu, Shijian
Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given
External link:
http://arxiv.org/abs/2406.12718
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an
External link:
http://arxiv.org/abs/2405.17761