Showing 1 - 10 of 723 for search: '"CUDA Pinned memory"'
Published in:
ACM Transactions on Architecture and Code Optimization. 17:1-22
Remote Direct Memory Access (RDMA) fabrics such as InfiniBand and Converged Ethernet report latency shorter by a factor of 50 than TCP. As such, RDMA is a potential replacement for TCP in datacenters (DCs) running low-latency applications, such as We
Author:
Manuel Rodríguez-Cayetano, Javier Royuela-del-Val, Elena Martín-González, Federico Simmross-Wattenberg, Marcos Martín-Fernández, Elisa Moya-Sáez, Carlos Alberola-López
Published in:
IEEE Journal of Biomedical and Health Informatics. 23:1702-1709
Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in terms of ho
Author:
Yonggang Che, Zhuoqian Li
Published in:
Proceedings of the 2020 4th International Conference on High Performance Compilation, Computing and Communications.
TURF sim (Target Unsteady Reacting Flow simulation) is a CFD application that solves engine combustion problems on structured grids. In this paper, PGI CUDA Fortran is used to implement the CPU + GPU heterogeneous parallelization. To reduce the data
Published in:
Parallel Computing. 71:23-41
Graphics processing units (GPUs) are widely used in scientific computing because of their high performance and energy efficiency. Nonetheless, GPUs feature a hierarchical memory system, on which code optimization requires an in-depth understan
Author:
John D. Leidel, Yong Chen
Published in:
Parallel Computing. 68:77-88
The recent advent of stacked memory devices has led to a resurgence of research associated with the fundamental memory hierarchy and associated memory pipeline. The bandwidth advantages provided by stacked logic and DRAM devices have inspired researc
Published in:
Multiagent and Grid Systems. 13:237-252
Author:
Christian Hartmann, Dietmar Fey
Published in:
Journal of Real-Time Image Processing. 14:713-728
Through continued miniaturization of electronic devices, embedded smart cameras are steadily becoming more important. The reduction of the camera size increases the spectrum of applications. In industrial applications the range of smart camer
Published in:
IEEE Transactions on Microwave Theory and Techniques. 65:2661-2671
This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral mes
Published in:
ACM Transactions on Mathematical Software, 44(1):8. Association for Computing Machinery (ACM)
Szoke, M, Józsa, T, Koleszár, Á, Moulitsas, I & Könözsy, L 2017, 'Performance evaluation of a two-dimensional lattice Boltzmann solver using CUDA and PGAS UPC based parallelisation', ACM Transactions on Mathematical Software, vol. 44, no. 1, 8. https://doi.org/10.1145/3085590
The Unified Parallel C (UPC) language from the Partitioned Global Address Space (PGAS) family unifies the advantages of shared and local memory spaces and offers a relatively straightforward code parallelisation with the Central Processing Unit (CPU)
Author:
Dmitrij Šešok, Julija Semenenko
Published in:
Jaunųjų mokslininkų darbai. 47:87-93
The article presents the operating principles of the NVIDIA CUDA computing technology and the particulars of working with CUDA. Two numerical experiments, array addition and matrix multiplication, were carried out with "GeForce" and "Quadro" graphics cards and a CPU,