Search results - "Silva, Bruno P."

Report

Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation

Authors: Chaudhari, Shreyas, Deshpande, Ameet, da Silva, Bruno Castro, Thomas, Philip S.

Evaluating policies using off-policy data is crucial for applying reinforcement learning to real-world problems such as healthcare and autonomous driving. Previous methods for off-policy evaluation (OPE) generally suffer from high variance or irreduc

External link: http://arxiv.org/abs/2410.02172

Report

Position: Benchmarking is Limited in Reinforcement Learning Research

Authors: Jordan, Scott M., White, Adam, da Silva, Bruno Castro, White, Martha, Thomas, Philip S.

Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous cal

External link: http://arxiv.org/abs/2406.16241

Report

Segmentation of dense and multi-species bacterial colonies using models trained on synthetic microscopy images

Authors: Hickl, Vincent, Khan, Abid, Rossi, René M., Silva, Bruno F. B., Maniura-Weber, Katharina

The spread of microbial infections is governed by the self-organization of bacteria on surfaces. Limitations of live imaging techniques make collective behaviors in clinically relevant systems challenging to quantify. Here, novel experimental and ima

External link: http://arxiv.org/abs/2405.12407

Report

RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

Authors: Chaudhari, Shreyas, Aggarwal, Pranjal, Murahari, Vishvak, Rajpurohit, Tanmay, Kalyan, Ashwin, Narasimhan, Karthik, Deshpande, Ameet, da Silva, Bruno Castro

State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from hu

External link: http://arxiv.org/abs/2404.08555

Report

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning

Authors: Mecklenburg, Nick, Lin, Yiyou, Li, Xiaoxiao, Holstein, Daniel, Nunes, Leonardo, Malvar, Sara, Silva, Bruno, Chandra, Ranveer, Aski, Vijay, Yannam, Pavan Kumar Reddy, Aktas, Tolga, Hendry, Todd

In recent years, Large Language Models (LLMs) have shown remarkable performance in generating human-like text, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge rema

External link: http://arxiv.org/abs/2404.00213

Report

Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation

Authors: Fernández-Rodríguez, Marcos, Silva, Bruno, Queirós, Sandro, Torres, Helena R., Oliveira, Bruno, Morais, Pedro, Buschle, Lukas R., Correia-Pinto, Jorge, Lima, Estevão, Vilaça, João L.

Published in: Proceedings Volume 12928, Medical Imaging 2024: Image-Guided Procedures, Robotic Interventions, and Modeling; 1292827 (2024)

Surgical instrument segmentation in laparoscopy is essential for computer-assisted surgical systems. Despite the progress of deep learning in recent years, the dynamic setting of laparoscopic surgery still presents challenges for precise segmentation. T

External link: http://arxiv.org/abs/2403.10216

Report

RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

Authors: Balaguer, Angels, Benara, Vinamra, Cunha, Renato Luiz de Freitas, Filho, Roberto de M. Estevão, Hendry, Todd, Holstein, Daniel, Marsman, Jennifer, Mecklenburg, Nick, Malvar, Sara, Nunes, Leonardo O., Padilha, Rafael, Sharp, Morris, Silva, Bruno, Sharma, Swati, Aski, Vijay, Chandra, Ranveer

There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and Fine-Tuning. RAG augments the prompt with the ex

External link: http://arxiv.org/abs/2401.08406

Report

From Past to Future: Rethinking Eligibility Traces

Authors: Gupta, Dhawal, Jordan, Scott M., Chaudhari, Shreyas, Liu, Bo, Thomas, Philip S., da Silva, Bruno Castro

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment

External link: http://arxiv.org/abs/2312.12972

Report

Behavior Alignment via Reward Function Optimization

Authors: Gupta, Dhawal, Chandak, Yash, Jordan, Scott M., Thomas, Philip S., da Silva, Bruno Castro

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadve

External link: http://arxiv.org/abs/2310.19007

Report

GPT-4 as an Agronomist Assistant? Answering Agriculture Exams Using Large Language Models

Authors: Silva, Bruno, Nunes, Leonardo, Estevão, Roberto, Aski, Vijay, Chandra, Ranveer

Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding across various domains, including healthcare and finance. For some tasks, LLMs achieve similar or better performance than trained human beings, t

External link: http://arxiv.org/abs/2310.06225
