Showing 1 - 10 of 191 results for search: '"Codella Noel"'
The limited availability of 3D medical image datasets, due to privacy concerns and high collection or annotation costs, poses significant challenges in the field of medical imaging. While a promising alternative is the use of synthesized medical data…
External link:
http://arxiv.org/abs/2403.12852
Author:
Pérez-García, Fernando, Sharma, Harshita, Bond-Taylor, Sam, Bouzid, Kenza, Salvatelli, Valentina, Ilse, Maximilian, Bannur, Shruthi, Castro, Daniel C., Schwaighofer, Anton, Lungren, Matthew P., Wetscherek, Maria, Codella, Noel, Hyland, Stephanie L., Alvarez-Valle, Javier, Oktay, Ozan
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images, serving as a foundational element in multimodal systems within the computer vision and medical imaging domains. However, …
External link:
http://arxiv.org/abs/2401.10815
Visual Question Answering (VQA) entails answering questions about images. We introduce the first VQA dataset in which all contents originate from an authentic use case. Sourced from online question answering community forums, we call it VQAonline. We…
External link:
http://arxiv.org/abs/2311.15562
Author:
Abacha, Asma Ben, Santamaria-Pang, Alberto, Lee, Ho Hin, Merkow, Jameson, Cai, Qin, Devarakonda, Surya Teja, Islam, Abdullah, Gong, Julia, Lungren, Matthew P., Lin, Thomas, Codella, Noel C, Tarapov, Ivan
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the growing workload for radiologists, yet it also offers an opportunity to enhance healthcare outcomes if effectively leveraged. 3D image retrieval…
External link:
http://arxiv.org/abs/2311.13752
Author:
Hyland, Stephanie L., Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Ranjit, Mercy, Schwaighofer, Anton, Pérez-García, Fernando, Salvatelli, Valentina, Srivastav, Shaury, Thieme, Anja, Codella, Noel, Lungren, Matthew P., Wetscherek, Maria Teodora, Oktay, Ozan, Alvarez-Valle, Javier
We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with p…
External link:
http://arxiv.org/abs/2311.13668
Author:
Wang, Zhecan, Chen, Long, You, Haoxuan, Xu, Keyang, He, Yicheng, Li, Wenhao, Codella, Noel, Chang, Kai-Wei, Chang, Shih-Fu
Published in:
EMNLP 2023
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions. However, we have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly…
External link:
http://arxiv.org/abs/2310.14670
Vision-language tasks, such as VQA, SNLI-VE, and VCR, are challenging because they require the model's reasoning ability to understand the semantics of the visual world and natural language. Supervised methods working for vision-language tasks have be…
External link:
http://arxiv.org/abs/2307.00862
Author:
Yang, Ziyi, Khademi, Mahmoud, Xu, Yichong, Pryzant, Reid, Fang, Yuwei, Zhu, Chenguang, Chen, Dongdong, Qian, Yao, Gao, Mei, Chen, Yi-Ling, Gmyr, Robert, Kanda, Naoyuki, Codella, Noel, Xiao, Bin, Shi, Yu, Yuan, Lu, Yoshioka, Takuya, Zeng, Michael, Huang, Xuedong
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models that lack generative abilities. We propose closing this…
External link:
http://arxiv.org/abs/2305.12311
Video understanding tasks have traditionally been modeled by two separate architectures, each specially tailored for a distinct task. Sequence-based video tasks, such as action recognition, use a video backbone to directly extract spatiotemporal features…
External link:
http://arxiv.org/abs/2303.17228
Author:
You, Haoxuan, Zhou, Luowei, Xiao, Bin, Codella, Noel, Cheng, Yu, Xu, Ruochen, Chang, Shih-Fu, Yuan, Lu
Large-scale multi-modal contrastive pre-training has demonstrated great utility for learning transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space. Typically, this has employed separate encoder…
External link:
http://arxiv.org/abs/2207.12661