Search results

Report

Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders

Authors: Dakic, Kosta; Thilakarathna, Kanchana; Calheiros, Rodrigo N.; Lim, Teng Joon

Multiview systems have become a key technology in modern computer vision, offering advanced capabilities in scene understanding and analysis. However, these systems face critical challenges in bandwidth limitations and computational constraints, part

External link: http://arxiv.org/abs/2410.04817

Report

SHFL: Secure Hierarchical Federated Learning Framework for Edge Networks

Authors: Tavallaie, Omid; Thilakarathna, Kanchana; Seneviratne, Suranga; Seneviratne, Aruna; Zomaya, Albert Y.

Federated Learning (FL) is a distributed machine learning paradigm designed for privacy-sensitive applications that run on resource-constrained devices with non-Independent and Identically Distributed (non-IID) data. Traditional FL frameworks adopt the

External link: http://arxiv.org/abs/2409.15067

Report

ECHO: Environmental Sound Classification with Hierarchical Ontology-guided Semi-Supervised Learning

Authors: Gupta, Pranav; Sharma, Raunak; Kumari, Rashmi; Aditya, Sri Krishna; Choudhary, Shwetank; Kumar, Sumit; M, Kanchana; R, Thilagavathy

Environment Sound Classification has been a well-studied research problem in the field of signal processing and up till now more focus has been laid on fully supervised approaches. Over the last few years, focus has moved towards semi-supervised meth

External link: http://arxiv.org/abs/2409.14043

Report

ACCESS-FL: Agile Communication and Computation for Efficient Secure Aggregation in Stable Federated Learning Networks

Authors: Nazemi, Niousha; Tavallaie, Omid; Chen, Shuaijun; Mandalari, Anna Maria; Thilakarathna, Kanchana; Holz, Ralph; Haddadi, Hamed; Zomaya, Albert Y.

Federated Learning (FL) is a promising distributed learning framework designed for privacy-aware applications. FL trains models on client devices without sharing the client's data and generates a global model on a server by aggregating model updates.

External link: http://arxiv.org/abs/2409.01722

Report

TripletViNet: Mitigating Misinformation Video Spread Across Platforms

Authors: Smolovic, Petar; Dahanayaka, Thilini; Thilakarathna, Kanchana

Published in: SCID '24, ACM, 1-12 (2024)

There has been rampant propagation of fake news and misinformation videos on many platforms lately, and moderation of such content faces many challenges that must be overcome. Recent research has shown the feasibility of identifying video titles from

External link: http://arxiv.org/abs/2407.10644

Report

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Authors: Li, Xiang; Mata, Cristina; Park, Jongwoo; Kahatapitiya, Kumara; Jang, Yoo Sung; Shang, Jinghuan; Ranasinghe, Kanchana; Burgert, Ryan; Cai, Mu; Lee, Yong Jae; Ryoo, Michael S.

LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process state information as visual-textual prompts and respond with policy decisions in text. We propose LLaRA: Large Language and Robotics Assistant, a framework tha

External link: http://arxiv.org/abs/2406.20095

Report

Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA

Authors: Park, Jongwoo; Ranasinghe, Kanchana; Kahatapitiya, Kumara; Ryoo, Wonjeong; Kim, Donghyun; Ryoo, Michael S.

Long-form videos that span across wide temporal intervals are highly information redundant and contain multiple distinct events or entities that are often loosely related. Therefore, when performing long-form video question answering (LVQA), all info

External link: http://arxiv.org/abs/2406.09396

Report

CAFe: Cost and Age aware Federated Learning

Authors: Liyanaarachchi, Sahan; Thilakarathna, Kanchana; Ulukus, Sennur

In many federated learning (FL) models, a common strategy employed to ensure progress in the training process is to wait for at least $M$ clients out of the total $N$ clients to send back their local gradients based on a reporting deadline $T$,

External link: http://arxiv.org/abs/2405.15744

Report

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

Authors: Ranasinghe, Kanchana; Shukla, Satya Narayan; Poursaeed, Omid; Ryoo, Michael S.; Lin, Tsung-Yu

Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA). However, existing V-LLMs (e.g. BL

External link: http://arxiv.org/abs/2404.07449

Report

Understanding Long Videos with Multimodal Language Models

Authors: Ranasinghe, Kanchana; Li, Xiang; Kahatapitiya, Kumara; Ryoo, Michael S.

Large Language Models (LLMs) have allowed recent LLM-based approaches to achieve excellent performance on long-video understanding benchmarks. We investigate how extensive world knowledge and strong reasoning skills of underlying LLMs influence this

External link: http://arxiv.org/abs/2403.16998
