Showing 1 - 10 of 656 for search: '"YOSHIE, OSAMU"'
Author:
Mao, Weixin, Zhong, Weiheng, Jiang, Zhou, Fang, Dong, Zhang, Zhongyue, Lan, Zihan, Jia, Fan, Wang, Tiancai, Fan, Haoqiang, Yoshie, Osamu
Existing policy learning methods predominantly adopt the task-centric paradigm, necessitating the collection of task data in an end-to-end manner. Consequently, the learned policy tends to fail to tackle novel tasks. Moreover, it is hard to localize
External link:
http://arxiv.org/abs/2412.00171
What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
Author:
Liu, Yilun, He, Minggui, Yao, Feiyu, Ji, Yuhe, Tao, Shimin, Du, Jingzhou, Li, Duan, Gao, Jian, Zhang, Li, Yang, Hao, Chen, Boxing, Yoshie, Osamu
The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, po
External link:
http://arxiv.org/abs/2408.12910
Author:
Liu, Jihao, Huang, Xin, Zheng, Jinliang, Liu, Boxiao, Wang, Jia, Yoshie, Osamu, Liu, Yu, Li, Hongsheng
This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs). While existing visual instruction datasets ofte
External link:
http://arxiv.org/abs/2406.19736
Trading range breakout (TRB) is a key method in the technical analysis of financial trading, widely employed by traders in financial markets such as stocks, futures, and foreign exchange. However, distinguishing between true and false breakout and pr
External link:
http://arxiv.org/abs/2402.07536
Author:
Mao, Weixin, Yang, Jinrong, Ge, Zheng, Song, Lin, Zhou, Hongyu, Mao, Tiezheng, Li, Zeming, Yoshie, Osamu
Depth perception is a crucial component of monocular 3D detection tasks that typically involve ill-posed problems. In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for impr
External link:
http://arxiv.org/abs/2306.17450
Many self-supervised learning methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, given the excellent scalability of web data, we consider self-supervised pre-training on noisy web sourced image-text paired data. First, we
External link:
http://arxiv.org/abs/2301.07088
Author:
Cui, Quan, Zhao, Bingchen, Chen, Zhao-Min, Zhao, Borui, Song, Renjie, Liang, Jiajun, Zhou, Boyan, Yoshie, Osamu
This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification. By a comprehensive temporal analysis, we observe a trade-off between t
External link:
http://arxiv.org/abs/2203.03871
Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) have revealed the potential of aligning multi-modal representations with contrastive learning. However, these works require a tremendous amount of data and computational resources (e.g
External link:
http://arxiv.org/abs/2112.09331
Author:
Zhang, Jian, Yoshie, Osamu
Published in:
In Neurocomputing, 28 August 2024, 595
Author:
Huang, Xin, Wang, Xinxin, Lv, Wenyu, Bai, Xiaying, Long, Xiang, Deng, Kaipeng, Dang, Qingqing, Han, Shumin, Liu, Qiwen, Hu, Xiaoguang, Yu, Dianhai, Ma, Yanjun, Yoshie, Osamu
Being effective and efficient is essential to an object detector for practical use. To meet these two concerns, we comprehensively evaluate a collection of existing refinements to improve the performance of PP-YOLO while almost keep the infer time un
External link:
http://arxiv.org/abs/2104.10419