Search results - "CUDA Pinned memory"

Network Interface Architecture for Remote Indirect Memory Access (RIMA) in Datacenters

Authors: Mithuna Thottethodi, T. N. Vijaykumar, Jiachen Xue

Published in: ACM Transactions on Architecture and Code Optimization. 17:1-22

Remote Direct Memory Access (RDMA) fabrics such as InfiniBand and Converged Ethernet report latency shorter by a factor of 50 than TCP. As such, RDMA is a potential replacement for TCP in datacenters (DCs) running low-latency applications, such as We

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::39b4177306858219390417015fa07d6e
https://doi.org/10.1145/3374215


OpenCLIPER: An OpenCL-Based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices

Authors: Manuel Rodríguez-Cayetano, Javier Royuela-del-Val, Elena Martín-González, Federico Simmross-Wattenberg, Marcos Martín-Fernández, Elisa Moya-Sáez, Carlos Alberola-López

Published in: IEEE Journal of Biomedical and Health Informatics. 23:1702-1709

Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in terms of ho

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::55bbfe3b0b2567fc3b47abf3ba9f1882
https://doi.org/10.1109/jbhi.2018.2869421


Parallelization and Optimization of a Combustion Simulation Application on GPU Platform

Authors: Yonggang Che, Zhuoqian Li

Published in: Proceedings of the 2020 4th International Conference on High Performance Compilation, Computing and Communications.

TURF sim (Target Unsteady Reacting Flow simulation) is a CFD application that solves engine combustion problems on structured grids. In this paper, PGI CUDA Fortran is used to implement the CPU + GPU heterogeneous parallelization. To reduce the data

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::8f0983ebf321ea6795b4db6ddf229ce2
https://doi.org/10.1145/3407947.3407960


Benchmarking the GPU memory at the warp level

Authors: Haifang Zhou, Weimin Zhang, Jianxing Liao, Jianbin Fang, Minquan Fang, Yuangang Wang

Published in: Parallel Computing. 71:23-41

Graphics processing units (GPUs) are widely used in scientific computing, because of their high performance and energy efficiency. Nonetheless, GPUs are featured with a hierarchical memory system, on which code optimization requires an in-depth understan

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::eb56972c53fdaa35d8f6b4d726776444
https://doi.org/10.1016/j.parco.2017.11.003


HMC-Sim-2.0: A co-design infrastructure for exploring custom memory cube operations

Authors: John D. Leidel, Yong Chen

Published in: Parallel Computing. 68:77-88

The recent advent of stacked memory devices has led to a resurgence of research associated with the fundamental memory hierarchy and associated memory pipeline. The bandwidth advantages provided by stacked logic and DRAM devices have inspired researc

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::3524c80b3082de367d53ee7e011be268
https://doi.org/10.1016/j.parco.2017.07.008


A multi-agent model for general-purpose computing on graphics processing units

Authors: Hassan Ouajji, Omar Bouattane, Mohamed Youssfi, Hicham Fakhi

Published in: Multiagent and Grid Systems. 13:237-252

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::0df634277a93f4abc461c6de1f1823eb
https://doi.org/10.3233/mgs-170269


An extended analysis of memory hierarchies for efficient implementations of image processing applications

Authors: Christian Hartmann, Dietmar Fey

Published in: Journal of Real-Time Image Processing. 14:713-728

Through continued miniaturization of electronic devices embedded smart cameras are steadily becoming more and more important. The reduction of the camera size increases the spectrum of applications. In industrial applications the range of smart camer

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::2cac544c875496140085a3e63dc39274
https://doi.org/10.1007/s11554-017-0723-2


Communication and Load Balancing Optimization for Finite Element Electromagnetic Simulations Using Multi-GPU Workstation

Authors: Adam Dziekonski, Adam Lamecki, Michal Mrozowski, P. Sypek

Published in: IEEE Transactions on Microwave Theory and Techniques. 65:2661-2671

This paper considers a method for accelerating finite-element simulations of electromagnetic problems on a workstation using graphics processing units (GPUs). The focus is on finite-element formulations using higher order elements and tetrahedral mes

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::81984a4204c61511d49a2957acc6a88a
https://doi.org/10.1109/tmtt.2017.2714670


Performance Evaluation of a Two-Dimensional Lattice Boltzmann Solver Using CUDA and PGAS UPC Based Parallelisation

Authors: Irene Moulitsas, Tamás I. Józsa, Ádám Koleszár, Máté Szőke, László Könözsy

Published in: ACM Transactions on Mathematical Software. 44(1):8, 2017. Association for Computing Machinery (ACM)

The Unified Parallel C (UPC) language from the Partitioned Global Address Space (PGAS) family unifies the advantages of shared and local memory spaces and offers a relatively straightforward code parallelisation with the Central Processing Unit (CPU)

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2d75d380666429abb4272f82f2c1e47b
https://doi.org/10.1145/3085590


Parallel Computing with CUDA (Lygiagretūs skaičiavimai su CUDA)

Authors: Dmitrij Šešok, Julija Semenenko

Published in: Jaunųjų mokslininkų darbai. 47:87-93

The article presents the operating principles of the NVIDIA CUDA computing technology and the particulars of working with CUDA. Two numerical experiments, array addition and matrix multiplication, were carried out with GeForce and Quadro graphics cards and a CPU,

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::94d9c338a8d97f607b8918322f423899
https://doi.org/10.21277/jmd.v47i1.135

