Showing 1 - 10 of 222 for search: '"Li, Daliang"'
Author:
Aksitov, Renat, Miryoosefi, Sobhan, Li, Zonglin, Li, Daliang, Babayan, Sheila, Kopparapu, Kavya, Fisher, Zachary, Guo, Ruiqi, Prakash, Sushant, Srinivasan, Pranesh, Zaheer, Manzil, Yu, Felix, Kumar, Sanjiv
Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with a large language model (LLM) to answer such questions. These systems, however…
External link:
http://arxiv.org/abs/2312.10003
Author:
Li, Daliang, Rawat, Ankit Singh, Zaheer, Manzil, Wang, Xin, Lukasik, Michal, Veit, Andreas, Yu, Felix, Kumar, Sanjiv
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world…
External link:
http://arxiv.org/abs/2211.05110
Author:
Wang, Yihan, Si, Si, Li, Daliang, Lukasik, Michal, Yu, Felix, Hsieh, Cho-Jui, Dhillon, Inderjit S, Kumar, Sanjiv
Pretrained large language models (LLMs) are general-purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually…
External link:
http://arxiv.org/abs/2211.00635
Author:
Li, Zonglin, You, Chong, Bhojanapalli, Srinadh, Li, Daliang, Rawat, Ankit Singh, Reddi, Sashank J., Ye, Ke, Chern, Felix, Yu, Felix, Guo, Ruiqi, Kumar, Sanjiv
This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse. By activation map we refer to the intermediate output of the multi-layer perceptrons (MLPs) after a ReLU activation…
External link:
http://arxiv.org/abs/2210.06313
Published in:
In Bioorganic Chemistry, September 2024, 150
Published in:
In Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 15 December 2024, 323
Published in:
In Talanta, 1 February 2024, 268 Part 1
Author:
Bhojanapalli, Srinadh, Chakrabarti, Ayan, Glasner, Daniel, Li, Daliang, Unterthiner, Thomas, Veit, Andreas
Deep Convolutional Neural Networks (CNNs) have long been the architecture of choice for computer vision tasks. Recently, Transformer-based architectures like Vision Transformer (ViT) have matched or even surpassed ResNets for image classification. However…
External link:
http://arxiv.org/abs/2103.14586
Author:
Zhu, Chen, Rawat, Ankit Singh, Zaheer, Manzil, Bhojanapalli, Srinadh, Li, Daliang, Yu, Felix, Kumar, Sanjiv
Large Transformer models have achieved impressive performance in many natural language tasks. In particular, Transformer-based language models have been shown to have great capabilities in encoding factual knowledge in their vast amount of parameters…
External link:
http://arxiv.org/abs/2012.00363
Published in:
In LWT, 1 November 2023, 189