Showing 1 - 5 of 5 for search: '"Oz, Gokmen"'
Teacher-student knowledge distillation is a popular technique for compressing today's prevailing large language models into manageable sizes that fit low-latency downstream applications. Both the teacher and the choice of transfer set used for distil...
External link:
http://arxiv.org/abs/2210.04834
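As a minimal sketch of the teacher-student objective this abstract refers to (the loss form, temperature, and mixing weight below are generic assumptions for illustration, not details taken from the paper):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft-label KL term against the teacher with the usual hard-label loss.

    Generic illustration only; the paper's actual objective, temperature, and
    weighting may differ.
    """
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays comparable
    # to the cross-entropy term as the temperature changes.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce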
Author:
FitzGerald, Jack, Ananthakrishnan, Shankar, Arkoudas, Konstantine, Bernardi, Davide, Bhagia, Abhishek, Bovi, Claudio Delli, Cao, Jin, Chada, Rakesh, Chauhan, Amit, Chen, Luoxin, Dwarakanath, Anurag, Dwivedi, Satyam, Gojayev, Turan, Gopalakrishnan, Karthik, Gueudre, Thomas, Hakkani-Tur, Dilek, Hamza, Wael, Hueser, Jonathan, Jose, Kevin Martin, Khan, Haidar, Liu, Beiye, Lu, Jianhua, Manzotti, Alessandro, Natarajan, Pradeep, Owczarzak, Karolina, Oz, Gokmen, Palumbo, Enrico, Peris, Charith, Prakash, Chandana Satya, Rawls, Stephen, Rosenbaum, Andy, Shenoy, Anjali, Soltan, Saleh, Sridhar, Mukund Harakere, Tan, Liz, Triefenbach, Fabian, Wei, Pan, Yu, Haiyang, Zheng, Shuai, Tur, Gokhan, Natarajan, Prem
Published in:
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14-18, 2022, Washington, DC, USA
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the N...
External link:
http://arxiv.org/abs/2206.07808
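The "non-embedding parameter count" cited in the abstract excludes the token-embedding table from the model size. A toy illustration of that bookkeeping (all sizes below are made up and unrelated to the paper's configurations):

import torch.nn as nn

# Toy encoder, only to show what counts as "non-embedding" parameters.
vocab_size, d_model = 32000, 512
embedding = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=6,
)

embedding_params = sum(p.numel() for p in embedding.parameters())
non_embedding_params = sum(p.numel() for p in encoder.parameters())
print(f"embedding: {embedding_params:,}  non-embedding: {non_embedding_params:,}")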
Author:
Peris, Charith, Oz, Gokmen, Abboud, Khadige, Varada, Venkata sai, Wanigasekara, Prashan, Khan, Haidar
Current voice assistants typically use the best hypothesis yielded by their Automatic Speech Recognition (ASR) module as input to their Natural Language Understanding (NLU) module, thereby losing helpful information that might be stored in lower-rank...
External link:
http://arxiv.org/abs/2012.04099
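One common way to expose lower-ranked ASR hypotheses to the NLU module is to concatenate the n-best list with a separator before encoding; a minimal sketch of that idea (the separator token, list size, and example utterances are assumptions, not necessarily the paper's method):

def pack_nbest(hypotheses, separator=" [SEP] ", n=3):
    """Join the top-n ASR hypotheses into a single NLU input string.

    Illustrative only; the paper's actual way of combining hypotheses may differ.
    """
    return separator.join(hypotheses[:n])

nbest = [
    "play jazz in the kitchen",   # top ASR hypothesis
    "play chess in the kitchen",  # lower-ranked alternatives that may still help NLU
    "play jas in the kitchen",
]
print(pack_nbest(nbest))  # -> "play jazz in the kitchen [SEP] play chess in the kitchen [SEP] ..."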
Published in:
Journal of Transport Economics and Policy, 2018 Jul 01. 52(3), 298-321.
External link:
https://www.jstor.org/stable/90020696
Author:
FitzGerald, Jack, Ananthakrishnan, Shankar, Arkoudas, Konstantine, Bernardi, Davide, Bhagia, Abhishek, Bovi, Claudio Delli, Cao, Jin, Chada, Rakesh, Chauhan, Amit, Chen, Luoxin, Dwarakanath, Anurag, Dwivedi, Satyam, Gojayev, Turan, Gopalakrishnan, Karthik, Gueudre, Thomas, Hakkani-Tur, Dilek, Hamza, Wael, Hueser, Jonathan, Jose, Kevin Martin, Khan, Haidar, Liu, Beiye, Lu, Jianhua, Manzotti, Alessandro, Natarajan, Pradeep, Owczarzak, Karolina, Oz, Gokmen, Palumbo, Enrico, Peris, Charith, Prakash, Chandana Satya, Rawls, Stephen, Rosenbaum, Andy, Shenoy, Anjali, Soltan, Saleh, Sridhar, Mukund Harakere, Tan, Liz, Triefenbach, Fabian, Wei, Pan, Yu, Haiyang, Zheng, Shuai, Tur, Gokhan, Natarajan, Prem
Published in:
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models ranging from 17M-170M parameters, and their application to the N...