Výsledky vyhledávání - "Kwon, Woosuk"

Report

Gemma 2: Improving Open Language Models at a Practical Size

Autor: Gemma Team, Riviere, Morgane, Pathak, Shreya, Sessa, Pier Giuseppe, Hardin, Cassidy, Bhupatiraju, Surya, Hussenot, Léonard, Mesnard, Thomas, Shahriari, Bobak, Ramé, Alexandre, Ferret, Johan, Liu, Peter, Tafti, Pouya, Friesen, Abe, Casbon, Michelle, Ramos, Sabela, Kumar, Ravin, Lan, Charline Le, Jerome, Sammy, Tsitsulin, Anton, Vieillard, Nino, Stanczyk, Piotr, Girgin, Sertan, Momchev, Nikola, Hoffman, Matt, Thakoor, Shantanu, Grill, Jean-Bastien, Neyshabur, Behnam, Bachem, Olivier, Walton, Alanna, Severyn, Aliaksei, Parrish, Alicia, Ahmad, Aliya, Hutchison, Allen, Abdagic, Alvin, Carl, Amanda, Shen, Amy, Brock, Andy, Coenen, Andy, Laforge, Anthony, Paterson, Antonia, Bastian, Ben, Piot, Bilal, Wu, Bo, Royal, Brandon, Chen, Charlie, Kumar, Chintu, Perry, Chris, Welty, Chris, Choquette-Choo, Christopher A., Sinopalnikov, Danila, Weinberger, David, Vijaykumar, Dimple, Rogozińska, Dominika, Herbison, Dustin, Bandy, Elisa, Wang, Emma, Noland, Eric, Moreira, Erica, Senter, Evan, Eltyshev, Evgenii, Visin, Francesco, Rasskin, Gabriel, Wei, Gary, Cameron, Glenn, Martins, Gus, Hashemi, Hadi, Klimczak-Plucińska, Hanna, Batra, Harleen, Dhand, Harsh, Nardini, Ivan, Mein, Jacinda, Zhou, Jack, Svensson, James, Stanway, Jeff, Chan, Jetha, Zhou, Jin Peng, Carrasqueira, Joana, Iljazi, Joana, Becker, Jocelyn, Fernandez, Joe, van Amersfoort, Joost, Gordon, Josh, Lipschultz, Josh, Newlan, Josh, Ji, Ju-yeong, Mohamed, Kareem, Badola, Kartikeya, Black, Kat, Millican, Katie, McDonell, Keelin, Nguyen, Kelvin, Sodhia, Kiranbir, Greene, Kish, Sjoesund, Lars Lowe, Usui, Lauren, Sifre, Laurent, Heuermann, Lena, Lago, Leticia, McNealus, Lilly, Soares, Livio Baldini, Kilpatrick, Logan, Dixon, Lucas, Martins, Luciano, Reid, Machel, Singh, Manvinder, Iverson, Mark, Görner, Martin, Velloso, Mat, Wirth, Mateo, Davidow, Matt, Miller, Matt, Rahtz, Matthew, Watson, Matthew, Risdal, Meg, Kazemi, Mehran, Moynihan, Michael, Zhang, Ming, Kahng, Minsuk, Park, Minwoo, Rahman, Mofi, Khatwani, Mohit, Dao, Natalie, Bardoliwalla, Nenshad, Devanathan, Nesh, Dumai, Neta, Chauhan, Nilay, Wahltinez, Oscar, Botarda, Pankil, Barnes, Parker, Barham, Paul, Michel, Paul, Jin, Pengchong, Georgiev, Petko, Culliton, Phil, Kuppala, Pradeep, Comanescu, Ramona, Merhej, Ramona, Jana, Reena, Rokni, Reza Ardeshir, Agarwal, Rishabh, Mullins, Ryan, Saadat, Samaneh, Carthy, Sara Mc, Cogan, Sarah, Perrin, Sarah, Arnold, Sébastien M. R., Krause, Sebastian, Dai, Shengyang, Garg, Shruti, Sheth, Shruti, Ronstrom, Sue, Chan, Susan, Jordan, Timothy, Yu, Ting, Eccles, Tom, Hennigan, Tom, Kocisky, Tomas, Doshi, Tulsee, Jain, Vihan, Yadav, Vikas, Meshram, Vilobh, Dharmadhikari, Vishal, Barkley, Warren, Wei, Wei, Ye, Wenming, Han, Woohyun, Kwon, Woosuk, Xu, Xiang, Shen, Zhe, Gong, Zhitao, Wei, Zichuan, Cotruta, Victor, Kirk, Phoebe, Rao, Anand, Giang, Minh, Peran, Ludovic, Warkentin, Tris, Collins, Eli, Barral, Joelle, Ghahramani, Zoubin, Hadsell, Raia, Sculley, D., Banks, Jeanine, Dragan, Anca, Petrov, Slav, Vinyals, Oriol, Dean, Jeff, Hassabis, Demis, Kavukcuoglu, Koray, Farabet, Clement, Buchatskaya, Elena, Borgeaud, Sebastian, Fiedel, Noah, Joulin, Armand, Kenealy, Kathleen, Dadashi, Robert, Andreev, Alek

In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the

Externí odkaz: http://arxiv.org/abs/2408.00118

Zobrazit plný text záznamu

Report

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput

Autor: Liu, Xiaoxuan, Daniel, Cade, Hu, Langxiang, Kwon, Woosuk, Li, Zhuohan, Mo, Xiangxi, Cheung, Alvin, Deng, Zhijie, Stoica, Ion, Zhang, Hao

Reducing the inference latency of large language models (LLMs) is crucial, and speculative decoding (SD) stands out as one of the most effective techniques. Rather than letting the LLM generate all tokens directly, speculative decoding employs effect

Externí odkaz: http://arxiv.org/abs/2406.14066

Zobrazit plný text záznamu

Report

Efficient Memory Management for Large Language Model Serving with PagedAttention

Autor: Kwon, Woosuk, Li, Zhuohan, Zhuang, Siyuan, Sheng, Ying, Zheng, Lianmin, Yu, Cody Hao, Gonzalez, Joseph E., Zhang, Hao, Stoica, Ion

High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamicall

Externí odkaz: http://arxiv.org/abs/2309.06180

Zobrazit plný text záznamu

Report

A Fast Post-Training Pruning Framework for Transformers

Autor: Kwon, Woosuk, Kim, Sehoon, Mahoney, Michael W., Hassoun, Joseph, Keutzer, Kurt, Gholami, Amir

Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior work on pruning Transformers requires retraining the models. This can add high training cost and high complexity to model deployment, making it diffic

Externí odkaz: http://arxiv.org/abs/2204.09656

Zobrazit plný text záznamu

Report

Learned Token Pruning for Transformers

Autor: Kim, Sehoon, Shen, Sheng, Thorsley, David, Gholami, Amir, Kwon, Woosuk, Hassoun, Joseph, Keutzer, Kurt

Deploying transformer models in practice is challenging due to their inference cost, which scales quadratically with input sequence length. To address this, we present a novel Learned Token Pruning (LTP) method which adaptively removes unimportant to

Externí odkaz: http://arxiv.org/abs/2107.00910

Zobrazit plný text záznamu

Report

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Autor: Kwon, Woosuk, Yu, Gyeong-In, Jeong, Eunji, Chun, Byung-Gon

Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference and training. Ideally, DL frameworks should be able to fully utilize the computation power of GPUs such that the running time depends on the amount of computati

Externí odkaz: http://arxiv.org/abs/2012.02732

Zobrazit plný text záznamu

Learned Token Pruning for Transformers

Autor: Kim, Sehoon, Shen, Sheng, Thorsley, David, Gholami, Amir, Kwon, Woosuk, Hassoun, Joseph, Keutzer, Kurt

Publikováno v: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.

Externí odkaz: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1d44e84a2926d5b6eac550077f4c20ae
https://doi.org/10.1145/3534678.3539260

Zobrazit plný text záznamu

Akademický článek

Tento výsledek nelze pro nepřihlášené uživatele zobrazit.
K zobrazení výsledku je třeba se přihlásit.

Conference

A magnetic resonant loop antenna to enhance the operating distance of 13.56MHz RFID systems.

Autor: Kwon, Woosuk, Gim, Yeong-Gyo, Park, Hyunbin, Kim, Shiho

Publikováno v: 2013 International SoC Design Conference (ISOCC); 2013, p013-014, 2p

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání