Search results

Report

TriG-NER: Triplet-Grid Framework for Discontinuous Named Entity Recognition

Authors: Cabral, Rina Carines; Han, Soyeon Caren; Alhassan, Areej; Batista-Navarro, Riza; Nenadic, Goran; Poon, Josiah

Discontinuous Named Entity Recognition (DNER) presents a challenging problem where entities may be scattered across multiple non-adjacent tokens, making traditional sequence labelling approaches inadequate. Existing methods predominantly rely on cust

External link: http://arxiv.org/abs/2411.01839

Report

Multimodal Commonsense Knowledge Distillation for Visual Question Answering

Authors: Yang, Shuo; Luo, Siwen; Han, Soyeon Caren

Existing Multimodal Large Language Models (MLLMs) and Visual Language Pretrained Models (VLPMs) have shown remarkable performance in general Visual Question Answering (VQA). However, these models struggle with VQA questions that require external

External link: http://arxiv.org/abs/2411.02722

Report

ChuLo: Chunk-Level Key Information Representation for Long Document Processing

Authors: Li, Yan; Han, Soyeon Caren; Dai, Yue; Cao, Feiqi

Transformer-based models have achieved remarkable success in various Natural Language Processing (NLP) tasks, yet their ability to handle long documents is constrained by computational limitations. Traditional approaches, such as truncating inputs, s

External link: http://arxiv.org/abs/2410.11119

Report

Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond

Authors: Han, Soyeon Caren; Cao, Feiqi; Poon, Josiah; Navigli, Roberto

This tutorial explores recent advancements in multimodal pretrained and large models, capable of integrating and processing diverse data forms such as text, images, audio, and video. Participants will gain an understanding of the foundational concept

External link: http://arxiv.org/abs/2410.05608

Report

PFGuard: A Generative Framework with Privacy and Fairness Safeguards

Authors: Kim, Soyeon; Roh, Yuji; Heo, Geon; Whang, Steven Euijong

Generative models must ensure both privacy and fairness for Trustworthy AI. While these goals have been pursued separately, recent studies propose to combine existing privacy and fairness techniques to achieve both goals. However, naively combining t

External link: http://arxiv.org/abs/2410.02246

Report

DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights

Authors: Ding, Yihao; Han, Soyeon Caren; Li, Zechuan; Chung, Hyunsuk

Visually-Rich Documents (VRDs), encompassing elements like charts, tables, and references, convey complex information across various fields. However, extracting information from these rich documents is labor-intensive, especially given their inconsis

External link: http://arxiv.org/abs/2410.01609

Report

MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU

Authors: Li, Yan; Kim, So-Eon; Park, Seong-Bae; Han, Soyeon Caren

Although Large Language Models (LLMs) can generate coherent and contextually relevant text, they often struggle to recognise the intent behind the human user's query. Natural Language Understanding (NLU) models, however, interpret the purpose and key

External link: http://arxiv.org/abs/2408.08144

Report

MSG-Chart: Multimodal Scene Graph for ChartQA

Authors: Dai, Yue; Han, Soyeon Caren; Liu, Wei

Automatic Chart Question Answering (ChartQA) is challenging due to the complex distribution of chart elements with patterns of the underlying data not explicitly displayed in charts. To address this challenge, we design a joint multimodal scene graph

External link: http://arxiv.org/abs/2408.04852

Report

EXAONE 3.0 7.8B Instruction Tuned Language Model

We introduce the EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote ope

External link: http://arxiv.org/abs/2408.03541

Report

Infusing Environmental Captions for Long-Form Video Language Grounding

Authors: Lee, Hyogun; Hong, Soyeon; Sung, Mujeen; Choi, Jinwoo

In this work, we tackle the problem of long-form video-language grounding (VLG). Given a long-form video and a natural language query, a model should temporally localize the precise moment that answers the query. Humans can easily solve VLG tasks, ev

External link: http://arxiv.org/abs/2408.02336
