Showing 1 - 10 of 22,599 for search: '"Zhilin A."'
Author:
Xu, Haoran, Liu, Ziqian, Fu, Rong, Su, Zhongling, Wang, Zerui, Cai, Zheng, Pei, Zhilin, Zhang, Xingcheng
With the evolution of large language models, traditional Transformer models become computationally demanding for lengthy sequences due to the quadratic growth in computation with respect to the sequence length. Mamba, emerging as a groundbreaking arc…
External link:
http://arxiv.org/abs/2408.03865
We propose an experimental study of adaptive time-stepping methods for efficient modeling of aggregation-fragmentation kinetics. Precise modeling of this phenomenon usually requires the use of large systems of nonlinear ordinary differenti…
External link:
http://arxiv.org/abs/2407.16559
Domain adversarial training has proven effective at finding domain-invariant feature representations and has been successfully adopted for various domain adaptation tasks. However, recent advances in large models (e.g., vision transformers…
External link:
http://arxiv.org/abs/2407.12782
Author:
Li, Zhilin, Zhao, Yongheng, Hu, Yiqing, Li, Yang, Zhang, Keyao, Gao, Zhibing, Tan, Lirou, Liu, Hanli, Li, Xiaoli, Cao, Aihua, Cui, Zaixu, Zhao, Chenguang
Background: The use of near-infrared lasers for transcranial photobiomodulation (tPBM) offers a non-invasive method for influencing brain activity and is beneficial for various neurological conditions. Objective: To investigate the safety and neuropr…
External link:
http://arxiv.org/abs/2407.09922
Author:
Zhu, Zhilin, Hong, Xiaopeng, Ma, Zhiheng, Zhuang, Weijun, Ma, Yaohui, Dai, Yong, Wang, Yaowei
Continual Test-Time Adaptation (CTTA) involves adapting a pre-trained source model to continually changing unsupervised target domains. In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and…
External link:
http://arxiv.org/abs/2407.09367
Author:
Parmar, Jupinder, Prabhumoye, Shrimai, Jennings, Joseph, Liu, Bo, Jhunjhunwala, Aastha, Wang, Zhilin, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan
The impressive capabilities of recent language models can be largely attributed to the multi-trillion-token pretraining datasets that they are trained on. However, model developers fail to disclose their construction methodology, which has led to a l…
External link:
http://arxiv.org/abs/2407.06380
Author:
Liang, Quanmin, Huang, Zhilin, Zheng, Xiawu, Yang, Feidiao, Peng, Jun, Huang, Kai, Tian, Yonghong
Published in:
International Joint Conference on Artificial Intelligence 2024
Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detai…
External link:
http://arxiv.org/abs/2406.19640
Author:
Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distri…
External link:
http://arxiv.org/abs/2406.11704
Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract na…
External link:
http://arxiv.org/abs/2406.11551
Author:
Wang, Ze, Li, Kangkang, Wang, Yue, Zhou, Xin, Cheng, Yinke, Jing, Boxuan, Sun, Fengxiao, Li, Jincheng, Li, Zhilin, Gong, Qihuang, He, Qiongyi, Li, Bei-Bei, Yang, Qi-Fan
Increasing the number of entangled entities is crucial for achieving exponential computational speedups and secure quantum networks. Despite recent progress in generating large-scale entanglement through continuous-variable (CV) cluster states, trans…
External link:
http://arxiv.org/abs/2406.10715