Search results - "Text dataset"

Report

Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms

Authors: Meyer, Jordan, Padgett, Nick, Miller, Cullen, Exline, Laura

We present Public Domain 12M (PD12M), a dataset of 12.4 million high-quality public domain and CC0-licensed images with synthetic captions, designed for training text-to-image models. PD12M is the largest public domain image-text dataset to date, wit

External link: http://arxiv.org/abs/2410.23144

Report

Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models

Authors: Yi, Hao, Li, Qingyang, Hu, Yulan, Zhang, Fuzheng, Zhang, Di, Liu, Yong

High-quality video-text preference data is crucial for Multimodal Large Language Models (MLLMs) alignment. However, existing preference data is very scarce. Obtaining VQA preference data for preference training is costly, and manually annotating resp

External link: http://arxiv.org/abs/2411.16201

Report

15M Multimodal Facial Image-Text Dataset

Authors: Dai, Dawei, Li, YuTang, Liu, YingGe, Jia, Mingming, YuanHui, Zhang, Wang, Guoyin

Currently, image-text-driven multi-modal deep learning models have demonstrated their outstanding potential in many fields. In practice, tasks centered around facial images have broad application prospects. This paper presents FaceCaption-15M

External link: http://arxiv.org/abs/2407.08515

Report

GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations

Authors: Wilming, Rick, Dox, Artur, Schulz, Hjalmar, Oliveira, Marta, Clark, Benedict, Haufe, Stefan

Large pre-trained language models have become popular for many applications and form an important backbone of many downstream tasks in natural language processing (NLP). Applying 'explainable artificial intelligence' (XAI) techniques to enrich such m

External link: http://arxiv.org/abs/2406.11547

Report

TextAge: A Curated and Diverse Text Dataset for Age Classification

Authors: Cheekati, Shravan, Gupta, Mridul, Raghu, Vibha, Raj, Pranav

Age-related language patterns play a crucial role in understanding linguistic differences and developing age-appropriate communication strategies. However, the lack of comprehensive and diverse datasets has hindered the progress of research in this a

External link: http://arxiv.org/abs/2406.16890

Report

Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset

Authors: Yang, Yuchen, Duan, Yingxuan

A more robust and holistic language-video representation is the key to pushing video understanding forward. Despite the improvement in training strategies, the quality of the language-video dataset has received less attention. The current plain and simple t

External link: http://arxiv.org/abs/2406.13809

Report

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

Authors: Chen, Wei, Li, Lin, Yang, Yongqi, Wen, Bin, Yang, Fan, Gao, Tingting, Wu, Yu, Chen, Long

Interleaved image-text generation has emerged as a crucial multimodal task, aiming at creating sequences of interleaved visual and textual content given a query. Despite notable advancements in recent multimodal large language models (MLLMs), generat

External link: http://arxiv.org/abs/2406.10462

Report

ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models

Authors: Yuan, Zhenghang, Xiong, Zhitong, Mou, Lichao, Zhu, Xiao Xiang

An in-depth comprehension of global land cover is essential in Earth observation, forming the foundation for a multitude of applications. Although remote sensing technology has advanced rapidly, leading to a proliferation of satellite imagery, the in

External link: http://arxiv.org/abs/2402.11325

Report

Automatic Histograms: Leveraging Language Models for Text Dataset Exploration

Authors: Reif, Emily, Qian, Crystal, Wexler, James, Kahng, Minsuk

Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data workers often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicit

External link: http://arxiv.org/abs/2402.14880
