Showing 1 - 10 of 48 for search: '"Guo, Daniel"'
Author:
Richemond, Pierre Harvey, Tang, Yunhao, Guo, Daniel, Calandriello, Daniele, Azar, Mohammad Gheshlaghi, Rafailov, Rafael, Pires, Bernardo Avila, Tarassov, Eugene, Spangher, Lucas, Ellsworth, Will, Severyn, Aliaksei, Mallinson, Jonathan, Shani, Lior, Shamir, Gil, Joshi, Rishabh, Liu, Tianqi, Munos, Remi, Piot, Bilal
The dominant framework for alignment of large language models (LLMs), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is …
External link:
http://arxiv.org/abs/2405.19107
Author:
Tang, Yunhao, Guo, Daniel Zhaohan, Zheng, Zeyu, Calandriello, Daniele, Cao, Yuan, Tarassov, Eugene, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal, Cheng, Yong, Dabney, Will
Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, the rising popularity of offline alignment algorithms challenges the need for on-policy sampling in RLHF. Within the context of reward …
External link:
http://arxiv.org/abs/2405.08448
Author:
Calandriello, Daniele, Guo, Daniel, Munos, Remi, Rowland, Mark, Tang, Yunhao, Pires, Bernardo Avila, Richemond, Pierre Harvey, Lan, Charline Le, Valko, Michal, Liu, Tianqi, Joshi, Rishabh, Zheng, Zeyu, Piot, Bilal
Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Human alignment has therefore been studied extensively in recent years, and several methods such as Reinforcement Learning …
External link:
http://arxiv.org/abs/2403.08635
Author:
Azar, Mohammad Gheshlaghi, Rowland, Mark, Piot, Bilal, Guo, Daniel, Calandriello, Daniele, Valko, Michal, Munos, Rémi
The prevalent deployment of learning from human preferences through reinforcement learning (RLHF) relies on two important approximations. The first assumes that pairwise preferences can be substituted with pointwise rewards. The second assumes that a …
External link:
http://arxiv.org/abs/2310.12036
Author:
Guo, Daniel, Pires, Bernardo Avila, Piot, Bilal, Grill, Jean-bastien, Altché, Florent, Munos, Rémi, Azar, Mohammad Gheshlaghi
Learning a good representation is an essential component of deep reinforcement learning (RL). Representation learning is especially important in multitask and partially observable settings, where building a representation of the unknown environment is …
External link:
http://arxiv.org/abs/2004.14646
Author:
Badia, Adrià Puigdomènech, Piot, Bilal, Kapturowski, Steven, Sprechmann, Pablo, Vitvitskyi, Alex, Guo, Daniel, Blundell, Charles
Atari games have been a long-standing benchmark in the reinforcement learning (RL) community for the past decade. This benchmark was proposed to test the general competency of RL algorithms. Previous work has achieved good average performance by doing …
External link:
http://arxiv.org/abs/2003.13350
Author:
Badia, Adrià Puigdomènech, Sprechmann, Pablo, Vitvitskyi, Alex, Guo, Daniel, Piot, Bilal, Kapturowski, Steven, Tieleman, Olivier, Arjovsky, Martín, Pritzel, Alexander, Bolt, Andew, Blundell, Charles
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train …
External link:
http://arxiv.org/abs/2002.06038
Academic article
This result cannot be displayed to users who are not signed in.
You must sign in to view this result.
Author:
Abouhala, Siwaar, Albert, Jessica, Almalvez, Miguel, Alvarez, Raquel, Amin, Mutaz, Anderson, Peter, Aradhya, Swaroop, Ashley, Euan, Assimes, Themistocles, Auriga, Light, Austin-Tse, Christina, Bamshad, Mike, Barseghyan, Hayk, Baxter, Samantha, Behera, Sairam, Beheshti, Shaghayegh, Bejerano, Gill, Berger, Seth, Bernstein, Jon, Best, Sabrina, Blankenmeister, Benjamin, Blue, Elizabeth, Boerwinkle, Eric, Bonkowski, Emily, Bonner, Devon, Boone, Philip, Bornhorst, Miriam, Bozkurt-Yozgatli, Tugce, Brand, Harrison, Buckingham, Kati, Calame, Daniel, Casadei, Silvia, Chadwick, Lisa, Chavez, Clarisa, Chen, Ziwei, Chinn, Ivan, Chong, Jessica, Coban-Akdemir, Zeynep, Cohen, Andrea J., Conner, Sarah, Conomos, Matthew, Coveler, Karen, Cui, Ya Allen, Currin, Sara, Daber, Robert, Dardas, Zain, Davis, Colleen, Dawood, Moez, de Dios, Ivan, de Esch, Celine, Delaney, Meghan, Délot, Emmanuèle, DiTroia, Stephanie, Doddapaneni, Harsha, Du, Haowei, Duan, Ruizhi, Dugan-Perez, Shannon, Duong, Nhat, Duyzend, Michael, Eichler, Evan, Emami, Sara, Fatih, Jawid, Fraser, Jamie, Fusaro, Vincent, Galey, Miranda, Ganesh, Vijay, Garimella, Kiran, Gibbs, Richard, Gifford, Casey, Ginsburg, Amy, Goddard, Pagé, Gogarten, Stephanie, Gogate, Nikhita, Gordon, William, Gorzynski, John E., Greenleaf, William, Grochowski, Christopher, Groopman, Emily, Guarischi Sousa, Rodrigo, Gudmundsson, Sanna, Gulati, Ashima, Guo, Daniel, Hale, Walker, Hall, Stacey, Harvey, William, Hawley, Megan, Heavner, Ben, Herman, Isabella, Horike-Pyne, Martha, Hu, Jianhong, Huang, Yongqing, Hwang, James, Jarvik, Gail, Jensen, Tanner, Jhangiani, Shalini, Jimenez-Morales, David, Jin, Christopher, Saad, Ahmed K., Kahn-Kirby, Amanda, Kain, Jessica, Kaur, Parneet, Keehan, Laura, Knoblach, Susan, Ko, Arthur, Kohler, Jennefer, Kundaje, Anshul, Kundu, Soumya, Lancaster, Samuel M., Larsson, Katie, Lemire, Gabrielle, Lewis, Richard, Li, Wei, Li, Yidan, Liu, Pengfei, LoTempio, Jonathan, Lupski, James, Ma, Jialan, MacArthur, Daniel, Mahmoud, 
Medhat, Malani, Nirav, Mangilog, Brian, Marafi, Dana, Marmolejos, Sofia, Marten, Daniel, Martinez, Eva, Marvin, Colby, Marwaha, Shruti, Kumara Mastrorosa, Francesco, Matalon, Dena, May, Susanne, McGee, Sean, Meador, Lauren, Mefford, Heather, Rodrigo Mendez, Hector, Miller, Alexander, Miller, Danny E., Mitani, Tadahiro, Montgomery, Stephen, Moussa, Hala Mohamed, Moyses, Mariana, Munderloh, Chloe, Muzny, Donna, Nelson, Sarah, Neu, Matthew B., Nguyen, Jonathan, Nguyen, Thuy-mi P., Nussbaum, Robert, Nykamp, Keith, O'Callaghan, William, O'Heir, Emily, O'Leary, Melanie, Olsen, Jeren, Osei-Owusu, Ikeoluwa, O'Donnell-Luria, Anne, Padhi, Evin, Pais, Lynn, Pan, Miao, Panchal, Piyush, Patterson, Karynne, Payne, Sheryl, Pehlivan, Davut, Petrowski, Paul, Pham, Alicia, Pitsava, Georgia, Podesta, Astaria, Ponce, Sarah, Posey, Jennifer, Prosser, Jaime, Quertermous, Thomas, Rai, Archana, Ramani, Arun, Rehm, Heidi, Reuter, Chloe, Reuter, Jason, Richardson, Matthew, Rivera-Munoz, Andres, Rubio, Oriane, Sabo, Aniko, Salani, Monica, Samocha, Kaitlin, Sanchis-Juan, Alba, Savage, Sarah, Scott, Stuart, Scott, Evette, Sedlazeck, Fritz, Shah, Gulalai, Shojaie, Ali, Singh, Mugdha, Smith, Josh, Smith, Kevin, Snow, Hana, Snyder, Michael, Socarras, Kayla, Starita, Lea, Stark, Brigitte, Stenton, Sarah, Stergachis, Andrew, Stilp, Adrienne, Sundaram, Laksshman, Sutton, V. 
Reid, Tai, Jui-Cheng, Talkowski, Michael, Tise, Christina, Tong, Catherine, Tsao, Philip, Ungar, Rachel, VanNoy, Grace, Vilain, Eric, Voutos, Isabella, Walker, Kim, Weisburd, Ben, Weiss, Jeff, Wellington, Chris, Weng, Ziming, Westheimer, Emily, Wheeler, Marsha, Wheeler, Matthew, Wiel, Laurens, Wilson, Michael, Wojcik, Monica, Wong, Quenna, Wong, Issac, Xiao, Changrui, Yadav, Rachita, Yi, Qian, Yuan, Bo, Zhao, Jianhua, Zhen, Jimmy, Zhou, Harry, Wojcik, Monica H., Reuter, Chloe M., Duyzend, Michael H., Boone, Philip M., Groopman, Emily E., Délot, Emmanuèle C., Jain, Deepti, Starita, Lea M., Montgomery, Stephen B., Bamshad, Michael J., Chong, Jessica X., Wheeler, Matthew T., Berger, Seth I., Sedlazeck, Fritz J.
Published in:
The American Journal of Human Genetics, 3 August 2023, 110(8):1229-1248
Author:
Lipscomb, Nikolai D., Guo, Daniel X.
Semi-Lagrangian methods are numerical methods designed to find approximate solutions to particular time-dependent partial differential equations (PDEs) that describe the advection process. We propose semi-Lagrangian one-step methods for numerically solving …
External link:
http://arxiv.org/abs/1703.01699