Showing 1 - 10 of 191 results for search: '"Codella Noel"'
The limited availability of 3D medical image datasets, due to privacy concerns and high collection or annotation costs, poses significant challenges in the field of medical imaging. While a promising alternative is the use of synthesized medical data…
External link:
http://arxiv.org/abs/2403.12852
Author:
Pérez-García, Fernando, Sharma, Harshita, Bond-Taylor, Sam, Bouzid, Kenza, Salvatelli, Valentina, Ilse, Maximilian, Bannur, Shruthi, Castro, Daniel C., Schwaighofer, Anton, Lungren, Matthew P., Wetscherek, Maria, Codella, Noel, Hyland, Stephanie L., Alvarez-Valle, Javier, Oktay, Ozan
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images, serving as a foundational element in multimodal systems within the computer vision and medical imaging domains. However, …
External link:
http://arxiv.org/abs/2401.10815
Visual Question Answering (VQA) entails answering questions about images. We introduce the first VQA dataset in which all contents originate from an authentic use case. Sourced from online question answering community forums, we call it VQAonline. We…
External link:
http://arxiv.org/abs/2311.15562
Author:
Abacha, Asma Ben, Santamaria-Pang, Alberto, Lee, Ho Hin, Merkow, Jameson, Cai, Qin, Devarakonda, Surya Teja, Islam, Abdullah, Gong, Julia, Lungren, Matthew P., Lin, Thomas, Codella, Noel C, Tarapov, Ivan
The increasing use of medical imaging in healthcare settings presents a significant challenge due to the growing workload for radiologists, yet it also offers an opportunity to enhance healthcare outcomes if effectively leveraged. 3D image retrieval…
External link:
http://arxiv.org/abs/2311.13752
Author:
Hyland, Stephanie L., Bannur, Shruthi, Bouzid, Kenza, Castro, Daniel C., Ranjit, Mercy, Schwaighofer, Anton, Pérez-García, Fernando, Salvatelli, Valentina, Srivastav, Shaury, Thieme, Anja, Codella, Noel, Lungren, Matthew P., Wetscherek, Maria Teodora, Oktay, Ozan, Alvarez-Valle, Javier
We present a radiology-specific multimodal model for the task of generating radiological reports from chest X-rays (CXRs). Our work builds on the idea that large language models can be equipped with multimodal capabilities through alignment with p…
External link:
http://arxiv.org/abs/2311.13668
Author:
Wang, Zhecan, Chen, Long, You, Haoxuan, Xu, Keyang, He, Yicheng, Li, Wenhao, Codella, Noel, Chang, Kai-Wei, Chang, Shih-Fu
Published in:
EMNLP 2023
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions. However, we have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly…
External link:
http://arxiv.org/abs/2310.14670
Vision-language tasks, such as VQA, SNLI-VE, and VCR, are challenging because they require the model's reasoning ability to understand the semantics of the visual world and natural language. Supervised methods working for vision-language tasks have be…
External link:
http://arxiv.org/abs/2307.00862
Author:
Yang, Ziyi, Khademi, Mahmoud, Xu, Yichong, Pryzant, Reid, Fang, Yuwei, Zhu, Chenguang, Chen, Dongdong, Qian, Yao, Gao, Mei, Chen, Yi-Ling, Gmyr, Robert, Kanda, Naoyuki, Codella, Noel, Xiao, Bin, Shi, Yu, Yuan, Lu, Yoshioka, Takuya, Zeng, Michael, Huang, Xuedong
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models that lack generative abilities. We propose closing this…
External link:
http://arxiv.org/abs/2305.12311
Video understanding tasks have traditionally been modeled by two separate architectures, each specially tailored for a distinct task. Sequence-based video tasks, such as action recognition, use a video backbone to directly extract spatiotemporal features…
External link:
http://arxiv.org/abs/2303.17228
Author:
You, Haoxuan, Zhou, Luowei, Xiao, Bin, Codella, Noel, Cheng, Yu, Xu, Ruochen, Chang, Shih-Fu, Yuan, Lu
Large-scale multi-modal contrastive pre-training has demonstrated great utility for learning transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space. Typically, this has employed separate encoder…
External link:
http://arxiv.org/abs/2207.12661