Search results - "Pena, Antonio J."

Report

ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code

Authors: Matsumura, Kazuaki, De Gonzalo, Simon Garcia, Peña, Antonio J.

Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they require rep

External link: http://arxiv.org/abs/2306.13002

Report

A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code

Authors: Matsumura, Kazuaki, De Gonzalo, Simon Garcia, Peña, Antonio J.

Various kinds of applications take advantage of GPUs through automation tools that attempt to automatically exploit the available performance of the GPU's parallel architecture. Directive-based programming models, such as OpenACC, are one such method

External link: http://arxiv.org/abs/2301.11389

Report

JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization

Authors: Matsumura, Kazuaki, De Gonzalo, Simon Garcia, Peña, Antonio J.

The rapid development in computing technology has paved the way for directive-based programming models towards a principal role in maintaining software portability of performance-critical applications. Efforts on such models involve a least engineeri

External link: http://arxiv.org/abs/2110.14340

Report

Particle-In-Cell Simulation using Asynchronous Tasking

Authors: Guidotti, Nicolas, Ceyrat, Pedro, Barreto, João, Monteiro, José, Rodrigues, Rodrigo, Fonseca, Ricardo, Martorell, Xavier, Peña, Antonio J.

Published in: Euro-Par 2021: Parallel Processing. Lecture Notes in Computer Science, vol 12820, pp. 482-498

Recently, task-based programming models have emerged as a prominent alternative among shared-memory parallel programming paradigms. Inherently asynchronous, these models provide native support for dynamic load balancing and incorporate data flow conc

External link: http://arxiv.org/abs/2106.12485

Report

cuConv: A CUDA Implementation of Convolution for CNN Inference

Authors: Jordà, Marc, Valero-Lara, Pedro, Peña, Antonio J.

Published in: Cluster Comput (2022)

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in production for

External link: http://arxiv.org/abs/2103.16234

Report

Enabling Homomorphically Encrypted Inference for Large DNN Models

Authors: Lloret-Talavera, Guillermo, Jorda, Marc, Servat, Harald, Boemer, Fabian, Chauhan, Chetan, Tomishima, Shigeki, Shah, Nilesh N., Peña, Antonio J.

The proliferation of machine learning services in the last few years has raised data privacy concerns. Homomorphic encryption (HE) enables inference using encrypted data but it incurs 100x-10,000x memory and runtime overheads. Secure deep neural netw

External link: http://arxiv.org/abs/2103.16139

Report

MPI+OpenMP Tasking Scalability for Multi-Morphology Simulations of the Human Brain

Authors: Valero-Lara, Pedro, Sirvent, Raül, Peña, Antonio J., Labarta, Jesús

Published in: P. Valero-Lara, R. Sirvent, A. J. Peña, and J. Labarta, "MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain", Parallel Computing, Elsevier, vol. 84, pp. 50-61, May 2019

The simulation of the behavior of the human brain is one of the most ambitious challenges today, with no end of important applications. We can find many different initiatives in the USA, Europe and Japan which attempt to achieve such a challenging

External link: http://arxiv.org/abs/2005.06332

Report

Understanding Memory Access Patterns Using the BSC Performance Tools

Authors: Servat, Harald, Labarta, Jesús, Hoppe, Hans-Christian, Giménez, Judit, Peña, Antonio J.

Published in: H. Servat, J. Labarta, H. C. Hoppe, J. Giménez, and A. J. Peña, "Understanding memory access patterns using the BSC performance tools", Parallel Computing, Elsevier, vol. 78, pp. 1-14, Oct. 2018

The growing gap between processor and memory speeds results in complex memory hierarchies as processors evolve to mitigate such divergence by taking advantage of the locality of reference. In this direction, the BSC performance analysis tools have be

External link: http://arxiv.org/abs/2005.05872

Report

DMR API: Improving cluster productivity by turning applications into malleable

Authors: Iserte, Sergio, Mayo, Rafael, Quintana-Orti, Enrique S., Beltran, Vicenc, Peña, Antonio J.

Published in: S. Iserte, R. Mayo, E. S. Quintana-Orti, V. Beltran, and A. J. Peña, "DMR API: Improving cluster productivity by turning applications into malleable", Parallel Computing, Elsevier, vol. 78, pp. 54-66, Oct. 2018

Adaptive workloads can change on-the-fly the configuration of their jobs, in terms of number of processes. In order to carry out these job reconfigurations, we have designed a methodology which enables a job to communicate with the resource manager

External link: http://arxiv.org/abs/2005.05910

Report

Integrating Blocking and Non-Blocking MPI Primitives with Task-Based Programming Models

Authors: Sala, Kevin, Teruel, Xavier, Perez, Josep M., Peña, Antonio J., Beltran, Vicenç, Labarta, Jesus

Published in: Parallel Computing, 85, 153-166 (2019)

In this paper we present the Task-Aware MPI library (TAMPI) that integrates both blocking and non-blocking MPI primitives with task-based programming models. The TAMPI library leverages two new runtime APIs to improve both programmability and perform

External link: http://arxiv.org/abs/1901.03271
