Showing 1 - 10 of 15 for search: '"Cedric Nugteren"'
Author:
Grigori Fursin, Anton Lokhmotov, Bruno Carpentieri, Fabiana Zollo, Marco Cianfriglia, Damiano Perri, Osvaldo Gervasi, Paolo Sylos Labini, Cedric Nugteren, Flavio Vella
Published in:
ACM Transactions on Architecture and Code Optimization. 18:1-24
Efficient HPC libraries often expose multiple tunable parameters, algorithmic implementations, or a combination of them, to provide optimized routines. The optimal parameters and algorithmic choices may depend on input properties such as the shapes o
Published in:
Lecture Notes in Computer Science ISBN: 9783030110086
ECCV Workshops (1)
Convolutional neural networks have been successfully applied to semantic segmentation problems. However, there are many problems that are inherently not pixel-wise classification problems but are nevertheless frequently formulated as semantic segment
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::f9cd66531516756edb8ebb740277b4e2
https://doi.org/10.1007/978-3-030-11009-3_15
Author:
Valeriu Codreanu, Cedric Nugteren
This work presents CLTune, an auto-tuner for OpenCL kernels. It evaluates and tunes kernel performance of a generic, user-defined search space of possible parameter-value combinations. Example parameters include the OpenCL workgroup size, vector data
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fe79b10603275e0cef9ea2c3c3e0ae71
http://arxiv.org/abs/1703.06503
Author:
Cedric Nugteren
Published in:
IWOCL
This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-mu
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6903e9c701c28a306722862f72b17595
Published in:
ACM Transactions on Architecture and Code Optimization, 9(4):40, 1-25. Association for Computing Machinery, Inc
Code generation and programming have become ever more challenging over the last decade due to the shift towards parallel processing. Emerging processor architectures such as multi-cores and GPUs increasingly exploit parallelism, requiring programmers
Published in:
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015.
Author:
Cedric Nugteren, Henk Corporaal
Published in:
ACM Transactions on Architecture and Code Optimization, 11(4):35. Association for Computing Machinery, Inc
The shift toward parallel processor architectures has made programming and code generation increasingly challenging. To address this programmability challenge, this article presents a technique to fully automatically generate efficient and readable c
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::49430cf88ee6f7593a977ef265d66a48
https://research.tue.nl/nl/publications/814d3f4d-1ac2-408c-827a-6b1bf4d567e2
Published in:
20th IEEE Int. Symp. on High Performance Computer Architecture (HPCA-2014)
HPCA
Vrije Universiteit Amsterdam
Nugteren, C, van den Braak, G-J, Corporaal, H & Bal, H E 2014, A Detailed GPU Cache Model Based on Reuse Distance Theory. In 20th IEEE Int. Symp. on High Performance Computer Architecture (HPCA-2014). IEEE CS.
Proceedings of the IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 15-19 February 2014, Orlando, Florida, 37-48
As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, the efficient use of their caches has become important for performance and energy. However, optimising cache locality systematically requires insight
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a8eb6190a206b467b499de9d7780695b
https://research.tue.nl/en/publications/66653e1d-2ce7-44ab-8543-744cd8fa4d3b
Published in:
Proceedings of MuCoCoS-6: International Workshop on Multi-/Many-core Computing Systems, 7 September 2013, Edinburgh, Scotland, UK, 1-8
The shift towards parallel processor architectures has made programming, performance prediction and code generation increasingly challenging. Abstract representations of program code (i.e. classifications) have been introduced to address this challen
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a3276559efe099cb49be5619df99e64f
https://doi.org/10.1109/MuCoCoS.2013.6633604
Published in:
Advanced parallel processing technologies : 10th international symposium, APPT 2013, Stockholm, Sweden, August 27-28, 2013 : revised selected papers, 184-198
Lecture Notes in Computer Science ISBN: 9783642452925
APPT
This paper presents a technique to fully automatically generate efficient and readable code for parallel processors. We base our approach on skeleton-based compilation and 'algorithmic species', an algorithm classification of program code. We use a t
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6415a103b0c9372317e7db2bc725e5cd
https://research.tue.nl/nl/publications/2d7b1048-8f2b-4726-b169-4190e2d5b654