Showing 1 - 10 of 828 for search: '"Li Yuheng"'
Author:
Zhang Wenjuan, Li Yuheng, Ran Mengyu, Wang Youliang, Ding Yezhi, Zhang Bobo, Feng Qiancheng, Chu Qianqian, Shen Yongqian, Sheng Wang
Published in:
Reviews on Advanced Materials Science, Vol 63, Iss 1, Pp id. 116748-41438 (2024)
Fe nanoparticle-functionalized ordered mesoporous carbon (Fe0/OMC) was synthesized using triblock copolymers as templates and through solvent evaporation self-assembly, followed by a carbothermal reduction. Fe0/OMC had three microstructures of two-di…
External link:
https://doaj.org/article/591e4968d76940a3965e9e18126c35ee
Author:
Li, Yuheng, Liu, Haotian, Cai, Mu, Li, Yijun, Shechtman, Eli, Lin, Zhe, Lee, Yong Jae, Singh, Krishna Kumar
In this paper, we introduce a model designed to improve the prediction of image-text alignment, targeting the challenge of compositional understanding in current visual-language models. Our approach focuses on generating high-quality training dataset…
External link:
http://arxiv.org/abs/2410.00905
Author:
Chen, Weitong, Li, Yuheng
In recent years, digital watermarking techniques based on deep learning have been widely studied. To achieve both imperceptibility and robustness of image watermarks, most current methods employ convolutional neural networks to build robust watermark…
External link:
http://arxiv.org/abs/2409.14829
Author:
Shang, Yuzhang, Xu, Bingxin, Kang, Weitai, Cai, Mu, Li, Yuheng, Wen, Zehao, Dong, Zhen, Keutzer, Kurt, Lee, Yong Jae, Yan, Yan
Advancements in Large Language Models (LLMs) inspire various strategies for integrating video modalities. A key approach is Video-LLMs, which incorporate an optimizable interface linking sophisticated video encoders to LLMs. However, due to computati…
External link:
http://arxiv.org/abs/2409.12963
Due to the scarcity of labeled data, self-supervised learning (SSL) has gained much attention in 3D medical image segmentation by extracting semantic representations from unlabeled data. Among SSL strategies, masked image modeling (MIM) has shown ef…
External link:
http://arxiv.org/abs/2407.06468
Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle per…
External link:
http://arxiv.org/abs/2406.09400
Author:
Chen, Xuxin, Li, Yuheng, Hu, Mingzhe, Salari, Ella, Chen, Xiaoqian, Qiu, Richard L. J., Zheng, Bin, Yang, Xiaofeng
Although fusing information from multiple views of mammograms plays an important role in increasing the accuracy of breast cancer detection, developing multi-view mammogram-based computer-aided diagnosis (CAD) schemes still faces challenges, and no such…
External link:
http://arxiv.org/abs/2404.15946
Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With sim…
External link:
http://arxiv.org/abs/2310.03744
Text-conditioned image editing has emerged as a powerful tool for editing images. However, in many situations, language can be ambiguous and ineffective in describing specific image edits. When faced with such challenges, visual prompts can be a more…
External link:
http://arxiv.org/abs/2307.14331
Recent large language models (LLMs), e.g., ChatGPT, can generate human-like, fluent responses when provided with specific instructions. While acknowledging the convenience brought by this technological advancement, educators also have con…
External link:
http://arxiv.org/abs/2307.12267