Showing 1 - 10
of 2,665
for search: '"CHEN Dongdong"'
Representing the cutting-edge technique of text-to-image models, the latest Multimodal Diffusion Transformer (MMDiT) largely mitigates many generation issues existing in previous models. However, we discover that it still suffers from subject neglect …
External link:
http://arxiv.org/abs/2411.18301
Authors:
Huang, Weiquan, Wu, Aoqi, Yang, Yifan, Luo, Xufang, Yang, Yuqing, Hu, Liang, Dai, Qi, Dai, Xiyang, Chen, Dongdong, Luo, Chong, Qiu, Lili
CLIP is a foundational multimodal model that aligns image and text features into a shared space using contrastive learning on large-scale image-text pairs. Its strength lies in leveraging natural language as a rich supervisory signal. With the rapid …
External link:
http://arxiv.org/abs/2411.04997
Scene graphs offer a structured, hierarchical representation of images, with nodes and edges symbolizing objects and the relationships among them. They can serve as a natural interface for image editing, dramatically improving precision and flexibility …
External link:
http://arxiv.org/abs/2410.11815
Through the integration of external tools, large language models (LLMs) such as GPT-4o and Llama 3.1 significantly expand their functional capabilities, evolving from elementary conversational agents to general-purpose assistants. We argue that the …
External link:
http://arxiv.org/abs/2410.10872
With the release of GPT-4V(O), its use in generating pseudo labels for multi-modality tasks has gained significant popularity. However, it remains a secret how to build such advanced models from their base large language models (LLMs). This work …
External link:
http://arxiv.org/abs/2409.16517
Authors:
Feng, Xuelu, Li, Yunsheng, Chen, Dongdong, Qiao, Chunming, Yuan, Junsong, Yuan, Lu, Hua, Gang
We introduce pluralistic salient object detection (PSOD), a novel task aimed at generating multiple plausible salient segmentation results for a given input image. Unlike conventional SOD methods that produce a single segmentation mask for salient objects …
External link:
http://arxiv.org/abs/2409.02368
Automatic furniture layout has long been desired for convenient interior design. Leveraging the remarkable visual reasoning capabilities of multimodal large language models (MLLMs), recent methods address layout generation in a static manner, lacking …
External link:
http://arxiv.org/abs/2407.21333
Authors:
Lin, Yuanze, Li, Yunsheng, Chen, Dongdong, Xu, Weijian, Clark, Ronald, Torr, Philip, Yuan, Lu
In recent years, multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets, enabling them to generally understand images well. However, the inherent difficulty in explicitly conveying …
External link:
http://arxiv.org/abs/2407.04681
Authors:
Yue, Wenzhen, Ying, Xianghua, Guo, Ruohao, Chen, Dongdong, Shi, Ji, Xing, Bowei, Zhu, Yuqing, Chen, Taiyan
In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our …
External link:
http://arxiv.org/abs/2404.18948
Authors:
Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, Zhou, Xiren
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 …
External link:
http://arxiv.org/abs/2404.14219