Showing results 1-10 of 104 for search: "Pena, Antonio J."
Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they require rep…
External link:
http://arxiv.org/abs/2306.13002
Various kinds of applications take advantage of GPUs through automation tools that attempt to automatically exploit the available performance of the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one such method…
External link:
http://arxiv.org/abs/2301.11389
The rapid development in computing technology has paved the way for directive-based programming models towards a principal role in maintaining software portability of performance-critical applications. Efforts on such models involve a least engineeri…
External link:
http://arxiv.org/abs/2110.14340
Authors:
Guidotti, Nicolas, Ceyrat, Pedro, Barreto, João, Monteiro, José, Rodrigues, Rodrigo, Fonseca, Ricardo, Martorell, Xavier, Peña, Antonio J.
Published in:
Euro-Par 2021: Parallel Processing. Lecture Notes in Computer Science, vol 12820, pp. 482-498
Recently, task-based programming models have emerged as a prominent alternative among shared-memory parallel programming paradigms. Inherently asynchronous, these models provide native support for dynamic load balancing and incorporate data flow conc…
External link:
http://arxiv.org/abs/2106.12485
Published in:
Cluster Comput (2022)
Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in production for…
External link:
http://arxiv.org/abs/2103.16234
Authors:
Lloret-Talavera, Guillermo, Jorda, Marc, Servat, Harald, Boemer, Fabian, Chauhan, Chetan, Tomishima, Shigeki, Shah, Nilesh N., Peña, Antonio J.
The proliferation of machine learning services in the last few years has raised data privacy concerns. Homomorphic encryption (HE) enables inference using encrypted data but it incurs 100x-10,000x memory and runtime overheads. Secure deep neural netw…
External link:
http://arxiv.org/abs/2103.16139
Published in:
P. Valero-Lara, R. Sirvent, A. J. Peña, and J. Labarta, "MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain", Parallel Computing, Elsevier, vol. 84, pp. 50-61, May 2019
The simulation of the behavior of the human brain is one of the most ambitious challenges today, with an endless number of important applications. We can find many different initiatives in the USA, Europe and Japan which attempt to achieve such a challenging…
External link:
http://arxiv.org/abs/2005.06332
Published in:
H. Servat, J. Labarta, H. C. Hoppe, J. Giménez, and A. J. Peña, "Understanding memory access patterns using the BSC performance tools", Parallel Computing, Elsevier, vol. 78, pp. 1-14, Oct. 2018
The growing gap between processor and memory speeds results in complex memory hierarchies as processors evolve to mitigate such divergence by taking advantage of the locality of reference. In this direction, the BSC performance analysis tools have be…
External link:
http://arxiv.org/abs/2005.05872
Published in:
S. Iserte, R. Mayo, E. S. Quintana-Orti, V. Beltran, and A. J. Peña, "DMR API: Improving cluster productivity by turning applications into malleable", Parallel Computing, Elsevier, vol. 78, pp. 54-66, Oct. 2018
Adaptive workloads can change on-the-fly the configuration of their jobs, in terms of number of processes. In order to carry out these job reconfigurations, we have designed a methodology which enables a job to communicate with the resource manager…
External link:
http://arxiv.org/abs/2005.05910
Authors:
Sala, Kevin, Teruel, Xavier, Perez, Josep M., Peña, Antonio J., Beltran, Vicenç, Labarta, Jesus
Published in:
Parallel Computing, 85, 153-166 (2019)
In this paper we present the Task-Aware MPI library (TAMPI) that integrates both blocking and non-blocking MPI primitives with task-based programming models. The TAMPI library leverages two new runtime APIs to improve both programmability and perform…
External link:
http://arxiv.org/abs/1901.03271