Showing 1 - 10 of 17 for search: '"Christopher I. Rodrigues"'
Published in:
Languages and Compilers for Parallel Computing ISBN: 9783030727888
LCPC
In a convolutional neural network (CNN), the convolution layers typically dominate the execution time. Hardware accelerators have been designed to speed up convolution. One class of accelerators provides hardware support for matrix multiplication (mat…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::19c05ed8390f238b8335e5c74a6c31fa
https://doi.org/10.1007/978-3-030-72789-5_11
Published in:
WPMVP@PPoPP
Developers often rely on automatic vectorization to speed up fine-grained data-parallel code. However, for loop nests where the loops are shorter than the processor's SIMD width, automatic vectorization performs poorly. Vectorizers attempt to vectori…
Authors:
Wen-mei W. Hwu, Nasser Anssari, Geng (Daniel) Liu, John A. Stratton, Nady Obeid, Christopher I. Rodrigues, Li-Wen Chang, I-Jui Sung
Published in:
Computer. 45:26-32
A study of the implementation patterns among massively threaded applications for many-core GPUs reveals that each of the seven most commonly used algorithm and data optimization techniques can enhance the performance of applicable kernels by 2 to 10…
Published in:
The Journal of Supercomputing. 64:1008-1020
Dynamic memory allocation is an important feature of modern programming systems. However, the cost of memory allocation in massively parallel execution environments such as CUDA has been too high for many types of kernels. This paper presents XMalloc…
Authors:
Xiaohuang Huang, Dennis Lin, Sanjay J. Patel, J. Blackburn, Minh N. Do, Quang Nguyen, Christopher I. Rodrigues, Wen-mei W. Hwu, Thomas S. Huang
Published in:
IEEE Signal Processing Magazine. 26:103-112
In this article, we focus on the applicability of parallel computing architectures to video processing applications. We demonstrate different optimization strategies in detail using the 3-D convolution problem as an example, and show how they affect…
Published in:
Computing in Science & Engineering. 11:16-26
Graphics processing units (GPUs) can provide excellent speedups on some, but not all, general-purpose workloads. Using a set of computational GPU kernels as examples, the authors show how to adapt kernels to utilize the architectural features of a Ge…
Authors:
Shane Ryoo, Sain-Zee Ueng, Wen-mei W. Hwu, Sara S. Baghsorkhi, Christopher I. Rodrigues, John A. Stratton, Sam S. Stone
Published in:
Journal of Parallel and Distributed Computing. 68:1389-1401
Contemporary many-core processors such as the GeForce 8800 GTX enable application developers to utilize various levels of parallelism to enhance the performance of their applications. However, iterative optimization for such a system may lead to a lo…
Published in:
MICRO
With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many applications that have regular memory access patterns. To support applications with irregular memory access patterns, cache hierarchies have been intro…
Published in:
PPOPP
Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton…
Authors:
Nady Obeid, Nasser Anssari, Li-Wen Chang, John A. Stratton, I-Jui Sung, Christopher I. Rodrigues, Geng Daniel Liu, Wen-mei W. Hwu
Published in:
2012 Innovative Parallel Computing (InPar).
It is unquestionable that successive hardware generations have significantly improved GPU computing workload performance over the last several years. Moore's law and DRAM scaling have respectively increased single-chip peak instruction throughput by…