Showing 1 - 10 of 1,219 results for search: '"Chang, P. D."'
Author:
Gao, Zhaolin, Zhan, Wenhao, Chang, Jonathan D., Swamy, Gokul, Brantley, Kianté, Lee, Jason D., Sun, Wen
Large Language Models (LLMs) have achieved remarkable success at tasks like summarization that involve a single turn of interaction. However, they can still struggle with multi-turn tasks like dialogue that require long-term planning. Previous works…
External link:
http://arxiv.org/abs/2410.04612
Author:
Yeung, Ryan, Black, David, Chen, Patrick B., Lessoway, Victoria, Reid, Janice, Rangel-Suarez, Sergio, Chang, Silvia D., Salcudean, Septimiu E.
Ultrasound is a hand-held, low-cost, non-invasive medical imaging modality which plays a vital role in diagnosing various diseases. Despite this, many rural and remote communities do not have access to ultrasound scans due to the lack of local expert…
External link:
http://arxiv.org/abs/2409.13058
Author:
Chang, Ray D., Shumiya, Nana, McLellan, Russell A., Zhang, Yifan, Bland, Matthew P., Bahrami, Faranak, Mun, Junsik, Zhou, Chenyu, Kisslinger, Kim, Cheng, Guangming, Pakpour-Tabrizi, Alexander C., Yao, Nan, Zhu, Yimei, Liu, Mingzhao, Cava, Robert J., Gopalakrishnan, Sarang, Houck, Andrew A., de Leon, Nathalie P.
The lifetime of superconducting qubits is limited by dielectric loss, and a major source of dielectric loss is the native oxide present at the surface of the superconducting metal. Specifically, tantalum-based superconducting qubits have been demonst…
External link:
http://arxiv.org/abs/2408.13051
Traditionally, reward models used for reinforcement learning from human feedback (RLHF) are trained to directly predict preference scores without leveraging the generation capabilities of the underlying large language model (LLM). This limits the cap…
External link:
http://arxiv.org/abs/2408.11791
Author:
Rudie, Jeffrey D., Lin, Hui-Ming, Ball, Robyn L., Jalal, Sabeena, Prevedello, Luciano M., Nicolaou, Savvas, Marinelli, Brett S., Flanders, Adam E., Magudia, Kirti, Shih, George, Davis, Melissa A., Mongan, John, Chang, Peter D., Berger, Ferco H., Hermans, Sebastiaan, Law, Meng, Richards, Tyler, Grunz, Jan-Peter, Kunz, Andreas Steven, Mathur, Shobhit, Galea-Soler, Sandro, Chung, Andrew D., Afat, Saif, Kuo, Chin-Chi, Aweidah, Layal, Campos, Ana Villanueva, Somasundaram, Arjuna, Tijmes, Felipe Antonio Sanchez, Jantarangkoon, Attaporn, Bittencourt, Leonardo Kayat, Brassil, Michael, Hajjami, Ayoub El, Dogan, Hakan, Becircic, Muris, Bharatkumar, Agrahara G., Farina, Eduardo Moreno Júdice de Mattos, Group, Dataset Curator, Group, Dataset Contributor, Group, Dataset Annotator, Colak, Errol
The RSNA Abdominal Traumatic Injury CT (RATIC) dataset is the largest publicly available collection of adult abdominal CT studies annotated for traumatic injuries. This dataset includes 4,274 studies from 23 institutions across 14 countries. The data…
External link:
http://arxiv.org/abs/2405.19595
Author:
Gao, Zhaolin, Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Swamy, Gokul, Brantley, Kianté, Joachims, Thorsten, Bagnell, J. Andrew, Lee, Jason D., Sun, Wen
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO…
External link:
http://arxiv.org/abs/2404.16767
Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning alg…
External link:
http://arxiv.org/abs/2404.08513
Author:
Chang, Jonathan D., Zhan, Wenhao, Oertell, Owen, Brantley, Kianté, Misra, Dipendra, Lee, Jason D., Sun, Wen
Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude 3 Opus. This framework often consists of two steps: learning a rewa…
External link:
http://arxiv.org/abs/2404.08495
Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction-following capabilities. However, the resulting generative policies inherit t…
External link:
http://arxiv.org/abs/2404.03673
Author:
Chang, Peter D.
This paper introduces the DeepATLAS foundational model for localization tasks in the domain of high-dimensional biomedical data. Upon convergence of the proposed self-supervised objective, a pretrained model maps an input to an anatomically-consisten…
External link:
http://arxiv.org/abs/2402.09587