Showing 1 - 10
of 141
for the search: '"Stock, Pierre"'
Author:
Agrawal, Pravesh, Antoniak, Szymon, Hanna, Emma Bou, Bout, Baptiste, Chaplot, Devendra, Chudnovsky, Jessica, Costa, Diogo, De Monicault, Baudouin, Garg, Saurabh, Gervet, Theophile, Ghosh, Soham, Héliou, Amélie, Jacob, Paul, Jiang, Albert Q., Khandelwal, Kartik, Lacroix, Timothée, Lample, Guillaume, Casas, Diego Las, Lavril, Thibaut, Scao, Teven Le, Lo, Andy, Marshall, William, Martin, Louis, Mensch, Arthur, Muddireddy, Pavankumar, Nemychnikova, Valera, Pellat, Marie, Von Platen, Patrick, Raghuraman, Nikhil, Rozière, Baptiste, Sablayrolles, Alexandre, Saulnier, Lucile, Sauvestre, Romain, Shang, Wendy, Soletskyi, Roman, Stewart, Lawrence, Stock, Pierre, Studnia, Joachim, Subramanian, Sandeep, Vaze, Sagar, Wang, Thomas, Yang, Sophia
We introduce Pixtral-12B, a 12-billion-parameter multimodal language model. Pixtral-12B is trained to understand both natural images and documents, achieving leading performance on various multimodal benchmarks and surpassing a number of larger models.
External link:
http://arxiv.org/abs/2410.07073
Author:
Jiang, Albert Q., Sablayrolles, Alexandre, Roux, Antoine, Mensch, Arthur, Savary, Blanche, Bamford, Chris, Chaplot, Devendra Singh, Casas, Diego de las, Hanna, Emma Bou, Bressand, Florian, Lengyel, Gianna, Bour, Guillaume, Lample, Guillaume, Lavaud, Lélio Renard, Saulnier, Lucile, Lachaux, Marie-Anne, Stock, Pierre, Subramanian, Sandeep, Yang, Sophia, Antoniak, Szymon, Scao, Teven Le, Gervet, Théophile, Lavril, Thibaut, Wang, Thomas, Lacroix, Timothée, Sayed, William El
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each layer, a router network selects two experts to process the current state and combine their outputs. …
External link:
http://arxiv.org/abs/2401.04088
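The routing scheme the Mixtral abstract describes (a per-token router that activates only a few of the 8 experts in each layer) can be sketched as a toy NumPy forward pass. This is an illustration of sparse top-k gating under assumed details, not the Mixtral implementation; the function name `moe_layer` and the softmax-over-selected-experts gating are made up for the example.

```python
import numpy as np

def moe_layer(x, w_router, experts, k=2):
    """Toy sparse MoE layer: for each token, a router scores all experts
    and only the top-k are evaluated; their outputs are combined with
    softmax-normalized gate weights.

    x        : (n_tokens, d) token states
    w_router : (d, n_experts) router weights
    experts  : list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ w_router                      # (n_tokens, n_experts)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-k:]       # indices of the top-k experts
        gates = np.exp(logits[t, top])
        gates /= gates.sum()                   # softmax over selected experts only
        for g, e in zip(gates, top):
            out[t] += g * experts[e](x[t])     # weighted combination of expert outputs
    return out

# Tiny usage example: 8 experts, each a fixed linear map.
rng = np.random.default_rng(0)
d, n_experts = 4, 8
mats = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda v, m=m: m @ v for m in mats]
y = moe_layer(rng.standard_normal((3, d)), rng.standard_normal((d, n_experts)), experts)
print(y.shape)  # (3, 4): same shape as the input tokens
```

Only k of the n_experts feedforward blocks run per token, which is what lets total parameter count grow without a proportional increase in per-token compute.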
Author:
Jiang, Albert Q., Sablayrolles, Alexandre, Mensch, Arthur, Bamford, Chris, Chaplot, Devendra Singh, Casas, Diego de las, Bressand, Florian, Lengyel, Gianna, Lample, Guillaume, Saulnier, Lucile, Lavaud, Lélio Renard, Lachaux, Marie-Anne, Stock, Pierre, Scao, Teven Le, Lavril, Thibaut, Wang, Thomas, Lacroix, Timothée, Sayed, William El
We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation.
External link:
http://arxiv.org/abs/2310.06825
Author:
Liu, Zechun, Oguz, Barlas, Zhao, Changsheng, Chang, Ernie, Stock, Pierre, Mehdad, Yashar, Shi, Yangyang, Krishnamoorthi, Raghuraman, Chandra, Vikas
Several post-training quantization methods have been applied to large language models (LLMs) and have been shown to perform well down to 8 bits. We find that these methods break down at lower bit precision, and investigate quantization-aware training …
External link:
http://arxiv.org/abs/2305.17888
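The contrast the abstract draws between post-training quantization and quantization-aware training rests on one core operation: quantize-then-dequantize ("fake quantization"), which in QAT is inserted into the forward pass so the model learns to tolerate the rounding error (gradients flow through it via the straight-through estimator). A minimal NumPy sketch, assuming symmetric per-tensor scaling; the function name `fake_quantize` is made up.

```python
import numpy as np

def fake_quantize(w, n_bits=8):
    """Simulate low-bit weights: quantize to signed integers, then
    dequantize back to floats. In QAT this op sits in the forward pass;
    the backward pass treats it as identity (straight-through estimator).
    """
    qmax = 2 ** (n_bits - 1) - 1               # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax           # symmetric per-tensor scale
    scale = scale if scale > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

w = np.array([0.51, -1.27, 0.003, 0.9])
print(np.abs(fake_quantize(w, 8) - w).max() < 0.01)  # True: 8-bit error is tiny
print(np.abs(fake_quantize(w, 2) - w).max() > 0.1)   # True: 2-bit error is large
```

The example makes the abstract's point concrete: at 8 bits the round-trip error is negligible, while at very low bit widths it is large, which is why training-time compensation becomes necessary.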
Privacy-preserving machine learning (PPML) can help us train and deploy models that utilize private information. In particular, on-device machine learning allows us to avoid sharing raw data with a third-party server during inference. On-device model …
External link:
http://arxiv.org/abs/2305.12997
Author:
Yousefpour, Ashkan, Guo, Shen, Shenoy, Ashish, Ghosh, Sayan, Stock, Pierre, Maeng, Kiwan, Krüger, Schalk-Willem, Rabbat, Michael, Wu, Carole-Jean, Mironov, Ilya
The rapid progress of AI is fueled by increasingly large and computationally intensive machine learning models and datasets. As a consequence, the amount of compute used in training state-of-the-art models is increasing exponentially (doubling every …)
External link:
http://arxiv.org/abs/2303.14604
In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model. The main challenge in this setting is balancing privacy with both classification accuracy …
External link:
http://arxiv.org/abs/2211.03942
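The server-side aggregation step this abstract refers to is commonly built from three pieces: clip each client update to a fixed L2 norm, average, and add Gaussian noise calibrated to the clip bound. The sketch below is a generic Gaussian-mechanism illustration, not the paper's exact method; `dp_aggregate` and its parameter names are invented for the example.

```python
import numpy as np

def dp_aggregate(updates, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Sketch of one DP federated-averaging round on the server:
    clip each client update to L2 norm <= clip_norm, average, then add
    Gaussian noise scaled to the mean's sensitivity (clip_norm / n)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(updates), size=mean.shape)
    return mean + noise

updates = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]   # first update has norm 5
agg = dp_aggregate(updates, clip_norm=1.0, noise_mult=0.0)
# Noise-free check: first update is scaled to [0.6, 0.8], second is unchanged,
# so the mean is [(0.6 + 0.1) / 2, (0.8 + 0.2) / 2] = [0.35, 0.5].
print(agg)
```

The trade-off named in the abstract is visible in the two knobs: a smaller `clip_norm` or larger `noise_mult` strengthens privacy but distorts the aggregate more, hurting accuracy.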
Differentially private methods for training deep neural networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations for a large number of training steps. These techniques require much more …
External link:
http://arxiv.org/abs/2210.03403
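The massive-batch DP training the abstract mentions builds on DP-SGD: compute each example's gradient, clip it, average, and add noise. Averaging over a larger batch amortizes the fixed noise, which is one reason DP training favors huge batches. Below is a toy NumPy sketch on linear regression under assumed names (`dp_sgd_step` is not from the paper).

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    """One DP-SGD step for squared-error linear regression (a sketch):
    per-sample gradients are clipped to L2 norm <= clip, averaged,
    noised, and applied. The noise std shrinks as 1/batch_size."""
    rng = rng or np.random.default_rng(0)
    grads = []
    for xi, yi in zip(X, y):
        g = 2 * (xi @ w - yi) * xi                          # per-sample gradient
        g = g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))  # clip
        grads.append(g)
    g_mean = np.mean(grads, axis=0)
    g_mean = g_mean + rng.normal(0.0, noise_mult * clip / len(X), size=w.shape)
    return w - lr * g_mean

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0])
w2 = dp_sgd_step(np.zeros(2), X, y, noise_mult=0.0)
# Noise-free check: each per-sample gradient has norm 2 and is clipped to
# norm 1, the mean is [-0.5, -0.5], so w moves to [0.05, 0.05].
print(w2)
```

The per-sample loop is the expensive part relative to ordinary SGD, which connects to the abstract's point that these techniques need far more compute than non-private training.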
Federated Learning (FL) is a setting for training machine learning models in distributed environments where the clients do not share their raw data but instead send model updates to a server. However, model updates can be subject to attacks and leak …
External link:
http://arxiv.org/abs/2210.02912
Author:
Prasad, Karthik, Ghosh, Sayan, Cormode, Graham, Mironov, Ilya, Yousefpour, Ashkan, Stock, Pierre
Cross-device Federated Learning is an increasingly popular machine learning setting in which a model is trained by leveraging a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck …
External link:
http://arxiv.org/abs/2207.12779
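The communication bottleneck named in this last abstract is typically attacked by compressing client updates before upload, e.g. quantizing each float to a few bits plus a shared scale. The sketch below shows stochastic uniform quantization (unbiased rounding); it is a generic illustration, not the paper's secure-aggregation-compatible scheme, and `compress_update`/`decompress` are made-up names.

```python
import numpy as np

def compress_update(u, n_bits=8, rng=None):
    """Toy client-side compressor: map an update onto 2**n_bits uniform
    levels with stochastic rounding (unbiased in expectation), so each
    coordinate costs n_bits instead of 32/64 bits on the wire."""
    rng = rng or np.random.default_rng(0)
    levels = 2 ** n_bits - 1
    lo, hi = u.min(), u.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    x = (u - lo) / scale                     # map to [0, levels]
    q = np.floor(x)
    q += rng.random(u.shape) < (x - q)       # round up with prob = fractional part
    return q.astype(np.uint8), lo, scale

def decompress(q, lo, scale):
    """Server-side inverse: integers back to floats."""
    return q.astype(np.float64) * scale + lo

u = np.linspace(-1.0, 1.0, 5)
q, lo, scale = compress_update(u)
err = np.abs(decompress(q, lo, scale) - u).max()
print(err <= scale)  # True: reconstruction error is under one quantization level
```

The per-coordinate error is bounded by one quantization level, so bit width trades bandwidth against update fidelity, which is exactly the tension between security, accuracy, and communication that the abstract sets up.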