Showing 1 - 10 of 117 for the search: '"F. Dolz"'
Author:
Manuel F. Dolz, Héctor Martínez, Adrián Castelló, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí
Published in:
The Journal of Supercomputing, 79:10589-10610
We take a step forward towards developing high-performance codes for the convolution operator, based on the Winograd algorithm, that are easy to customise for general-purpose processor architectures. In our approach, augmenting the portability of the …
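The Winograd minimal filtering technique this line of work builds on can be shown in a few lines. The following is a minimal Python/NumPy sketch of the 1-D case F(2,3), included purely for illustration (it is not the authors' code): two outputs of a 3-tap filter are obtained from a 4-element input tile using 4 multiplications instead of 6.

    import numpy as np

    def winograd_f23(d, g):
        # Winograd F(2,3): 2 outputs of the 3-tap filter g over input tile d,
        # using 4 multiplications where direct filtering needs 6.
        m1 = (d[0] - d[2]) * g[0]
        m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
        m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
        m4 = (d[1] - d[3]) * g[2]
        return np.array([m1 + m2 + m3, m2 - m3 - m4])

    d = np.array([1.0, 2.0, 3.0, 4.0])         # input tile
    g = np.array([0.5, 1.0, -1.0])             # filter taps
    print(winograd_f23(d, g))                  # [-0.5  0. ]
    print(np.array([d[0:3] @ g, d[1:4] @ g]))  # direct result, identical

In the 2-D convolutions used by deep neural networks, the same input/filter/output transforms are applied along both dimensions and the element-wise multiplication stage becomes a batch of small matrix products, which is where architecture-specific tuning pays off.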
Published in:
IEEE Access, Vol. 6, pp. 39944-39961 (2018)
Parallelizing and optimizing codes for recent multi-/many-core processors has been recognized as a complex task. For this reason, strategies to automatically transform sequential codes into parallel ones and discover optimization opportunities are crucial …
External link:
https://doaj.org/article/77d97f163c6d4837b43a582c58d075e9
Published in:
2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
Published in:
Repositori Universitat Jaume I
Universitat Jaume I
For many distributed applications, data communication poses a major bottleneck in terms of both performance and energy consumption. As more cores are integrated per node, the global performance of the system generally increases, yet …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e3bb0adbad38b27ee9a92a955a44f570
Published in:
Repositori Universitat Jaume I
Universitat Jaume I
Tuning and optimising the operations executed in deep learning frameworks is a fundamental task in accelerating the processing of deep neural networks (DNNs). However, this optimisation usually requires extensive manual effort to obtain the … (an illustrative autotuning sketch follows this record)
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b1de51e45b08ba63594e6733b224b0ba
http://hdl.handle.net/10234/198110
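As a concrete illustration of replacing manual effort with automation, the sketch below shows empirical autotuning in its simplest form: time each candidate configuration of a cache-sensitive kernel and keep the fastest. The blocked-transpose kernel and the candidate block sizes are assumptions chosen for illustration, not the method of this publication.

    import time
    import numpy as np

    def blocked_transpose(a, b, bs):
        # Cache-blocked transpose: throughput depends strongly on block size bs.
        n = a.shape[0]
        for i in range(0, n, bs):
            for j in range(0, n, bs):
                b[j:j+bs, i:i+bs] = a[i:i+bs, j:j+bs].T

    def autotune(n=2048, candidates=(16, 32, 64, 128, 256)):
        a = np.random.rand(n, n).astype(np.float32)
        b = np.empty_like(a)
        timings = {}
        for bs in candidates:
            start = time.perf_counter()
            blocked_transpose(a, b, bs)
            timings[bs] = time.perf_counter() - start
        return min(timings, key=timings.get)   # best-performing block size

    print("selected block size:", autotune())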
Paper presented at the 2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). We take a step forward in the direction of developing high-performance codes for the convolution, based on the Winograd …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5fdbf0d3b97afabd58f6760031cd3266
Published in:
Repositori Universitat Jaume I
Universitat Jaume I
Convolutional Neural Networks (CNNs) play a crucial role in many image recognition and classification tasks, recommender systems, brain-computer interfaces, etc. As a consequence, there is notable interest in developing high-performance realizations … (see the im2col sketch after this record)
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::51d88b4cd8ca4f5a9919b94eb89a2417
https://doi.org/10.1016/j.jpdc.2022.05.009
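A standard high-performance realization of the convolution lowers it to a single matrix multiplication via im2col, so that an optimized GEMM does the heavy lifting. The following NumPy sketch (stride 1, no padding, cross-correlation as is conventional in DNN frameworks) illustrates the general technique only; it is not the code from this publication.

    import numpy as np

    def im2col(x, kh, kw):
        # x: (C, H, W) input -> (C*kh*kw, OH*OW) matrix of unfolded patches.
        c, h, w = x.shape
        oh, ow = h - kh + 1, w - kw + 1
        cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
        row = 0
        for ci in range(c):
            for i in range(kh):
                for j in range(kw):
                    cols[row] = x[ci, i:i+oh, j:j+ow].reshape(-1)
                    row += 1
        return cols

    def conv2d_gemm(x, f):
        # f: (K, C, kh, kw) filters; the whole convolution becomes one GEMM.
        k, c, kh, kw = f.shape
        oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
        out = f.reshape(k, -1) @ im2col(x, kh, kw)   # (K, OH*OW)
        return out.reshape(k, oh, ow)

    x = np.random.rand(3, 8, 8)                              # C=3 input
    print(conv2d_gemm(x, np.random.rand(4, 3, 3, 3)).shape)  # (4, 6, 6)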
Published in:
e-Archivo: Repositorio Institucional de la Universidad Carlos III de Madrid
Universidad Carlos III de Madrid (UC3M)
Parallel Computing
In recent years, the large volumes of stream data and the near real-time requirements of data streaming applications have exacerbated the need for new scalable algorithms and programming interfaces for distributed and shared-memory platforms. …
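A recurring shape for such programming interfaces is the pipeline pattern: independent stages connected by bounded queues, so that different stream items are processed in different stages concurrently. The generic Python sketch below (threads plus queues, with a None sentinel closing the stream) only illustrates the concept; it does not reproduce the interface proposed in this work.

    import threading
    import queue

    def stage(fn, inq, outq):
        # Generic pipeline stage: apply fn to each item until the sentinel arrives.
        while True:
            item = inq.get()
            if item is None:                 # sentinel: propagate and stop
                if outq is not None:
                    outq.put(None)
                break
            result = fn(item)
            if outq is not None:
                outq.put(result)

    q1, q2 = queue.Queue(maxsize=64), queue.Queue(maxsize=64)
    results = []
    workers = [threading.Thread(target=stage, args=(lambda x: x * x, q1, q2)),
               threading.Thread(target=stage, args=(results.append, q2, None))]
    for w in workers:
        w.start()
    for i in range(10):                      # producer feeds the stream
        q1.put(i)
    q1.put(None)                             # close the stream
    for w in workers:
        w.join()
    print(results)                           # [0, 1, 4, 9, ..., 81]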
Published in:
IPDPS Workshops
We present PyDTNN, a framework for training deep neural networks (DNNs) on clusters of computers, designed as a research-oriented tool with a low learning curve. Our parallel training framework offers a set of functionalities that cover …
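The data-parallel scheme at the core of this kind of framework fits in a short sketch: every rank computes a gradient on its local data shard, the gradients are summed with an allreduce, and all ranks apply the identical update. The example below uses mpi4py and a toy linear model as assumptions for illustration; it is not PyDTNN's actual API.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Toy model: linear least squares on a rank-local data shard.
    rng = np.random.default_rng(rank)
    x = rng.standard_normal((256, 8))
    y = rng.standard_normal(256)
    w = np.zeros(8)                               # replicated parameters

    for step in range(100):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)   # local gradient
        total = np.empty_like(grad)
        comm.Allreduce(grad, total, op=MPI.SUM)   # sum gradients across ranks
        w -= 0.01 * (total / size)                # same update on every rank

Run with, e.g., mpirun -np 4 python train.py; every rank ends with identical weights because all ranks apply the same averaged gradient.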
Author:
Mar Catalan, Enrique S. Quintana-Ortí, Jose I. Mestre, Adrián Castelló, José Duato, Manuel F. Dolz
Published in:
PDP
Training deep neural networks is a costly procedure, often performed via sophisticated deep learning frameworks on clusters of computers. As faster processor technologies are integrated into these cluster facilities (e.g., NVIDIA’s graphics accelerators), …