Showing 1 - 10 of 2,598 results for search: '"P., Kanchana"'
Multiview systems have become a key technology in modern computer vision, offering advanced capabilities in scene understanding and analysis. However, these systems face critical challenges in bandwidth limitations and computational constraints, part
External link:
http://arxiv.org/abs/2410.04817
Author:
Tavallaie, Omid, Thilakarathna, Kanchana, Seneviratne, Suranga, Seneviratne, Aruna, Zomaya, Albert Y.
Federated Learning (FL) is a distributed machine learning paradigm designed for privacy-sensitive applications that run on resource-constrained devices with non-Identically and Independently Distributed (IID) data. Traditional FL frameworks adopt the
External link:
http://arxiv.org/abs/2409.15067
Author:
Gupta, Pranav, Sharma, Raunak, Kumari, Rashmi, Aditya, Sri Krishna, Choudhary, Shwetank, Kumar, Sumit, M, Kanchana, R, Thilagavathy
Environment Sound Classification has been a well-studied research problem in the field of signal processing and up till now more focus has been laid on fully supervised approaches. Over the last few years, focus has moved towards semi-supervised meth
External link:
http://arxiv.org/abs/2409.14043
Author:
Nazemi, Niousha, Tavallaie, Omid, Chen, Shuaijun, Mandalari, Anna Maria, Thilakarathna, Kanchana, Holz, Ralph, Haddadi, Hamed, Zomaya, Albert Y.
Federated Learning (FL) is a promising distributed learning framework designed for privacy-aware applications. FL trains models on client devices without sharing the client's data and generates a global model on a server by aggregating model updates.
External link:
http://arxiv.org/abs/2409.01722
Published in:
SCID '24, ACM, 1-12 (2024)
There has been rampant propagation of fake news and misinformation videos on many platforms lately, and moderation of such content faces many challenges that must be overcome. Recent research has shown the feasibility of identifying video titles from
External link:
http://arxiv.org/abs/2407.10644
Author:
Li, Xiang, Mata, Cristina, Park, Jongwoo, Kahatapitiya, Kumara, Jang, Yoo Sung, Shang, Jinghuan, Ranasinghe, Kanchana, Burgert, Ryan, Cai, Mu, Lee, Yong Jae, Ryoo, Michael S.
LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process state information as visual-textual prompts and respond with policy decisions in text. We propose LLaRA: Large Language and Robotics Assistant, a framework tha
External link:
http://arxiv.org/abs/2406.20095
Author:
Park, Jongwoo, Ranasinghe, Kanchana, Kahatapitiya, Kumara, Ryoo, Wonjeong, Kim, Donghyun, Ryoo, Michael S.
Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely related. Therefore, when performing long-form video question answering (LVQA), all info
External link:
http://arxiv.org/abs/2406.09396
In many federated learning (FL) models, a common strategy employed to ensure progress in the training process is to wait for at least $M$ clients out of the total $N$ clients to send back their local gradients based on a reporting deadline $T$,
External link:
http://arxiv.org/abs/2405.15744
Author:
Ranasinghe, Kanchana, Shukla, Satya Narayan, Poursaeed, Omid, Ryoo, Michael S., Lin, Tsung-Yu
Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA). However, existing V-LLMs (e.g. BL
External link:
http://arxiv.org/abs/2404.07449
Large Language Models (LLMs) have allowed recent LLM-based approaches to achieve excellent performance on long-video understanding benchmarks. We investigate how extensive world knowledge and strong reasoning skills of underlying LLMs influence this
External link:
http://arxiv.org/abs/2403.16998