Showing 1-10 of 142 results for the search: "Loh, Gabriel H."
When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores. Unfortuna
External link:
http://arxiv.org/abs/1804.11043
Authors:
Ausavarungnirun, Rachata, Ghose, Saugata, Kayıran, Onur, Loh, Gabriel H., Das, Chita R., Kandemir, Mahmut T., Mutlu, Onur
In a modern GPU architecture, all threads within a warp execute the same instruction in lockstep. For a memory instruction, this can lead to memory divergence: the memory requests for some threads are serviced early, while the remaining requests incu
External link:
http://arxiv.org/abs/1804.11038
Authors:
Kim, Hyojong, Hadidi, Ramyad, Nai, Lifeng, Kim, Hyesoon, Jayasena, Nuwan, Eckert, Yasuko, Kayiran, Onur, Loh, Gabriel H.
Published in:
ACM Transactions on Architecture and Code Optimization (TACO) Volume 15 Issue 3, October 2018 Article No. 32
Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory modules introduc
External link:
http://arxiv.org/abs/1710.09517
Authors:
Ausavarungnirun, Rachata, Fallin, Chris, Yu, Xiangyao, Chang, Kevin Kai-Wei, Nazario, Greg, Das, Reetuparna, Loh, Gabriel H., Mutlu, Onur
Hierarchical ring networks, which hierarchically connect multiple levels of rings, have been proposed in the past to improve the scalability of ring interconnects, but past hierarchical ring designs sacrifice some of the key benefits of rings by intr
External link:
http://arxiv.org/abs/1602.06005
Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism
Authors:
Chang, Kevin K., Loh, Gabriel H., Thottethodi, Mithuna, Eckert, Yasuko, O'Connor, Mike, Manne, Srilatha, Hsu, Lisa, Subramanian, Lavanya, Mutlu, Onur
Die-stacked DRAM has been proposed for use as a large, high-bandwidth, last-level cache with hundreds or thousands of megabytes of capacity. Not all workloads (or phases) can productively utilize this much cache space, however. Unfortunately, the unu
External link:
http://arxiv.org/abs/1602.00722
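The title above refers to a hardware consistent hashing mechanism. The listing does not describe the paper's hardware design, so the Python sketch below is only a rough software analogy. It shows the property consistent hashing provides when a cache shrinks: removing one bank remaps only the blocks that bank owned, and every other block keeps its location. The class `ConsistentHashRing`, its methods, the bank names, the replica count, and the use of MD5 as the ring hash are all illustrative assumptions.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on a 2**32 ring (MD5 used only for determinism)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (1 << 32)


class ConsistentHashRing:
    """Hypothetical mapping of cache-block addresses to cache banks."""

    def __init__(self, banks, replicas=64):
        self.replicas = replicas   # virtual points per bank, to even out load
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> bank name
        for bank in banks:
            self.add_bank(bank)

    def add_bank(self, bank):
        for i in range(self.replicas):
            p = ring_hash(f"{bank}#{i}")
            if p not in self._owner:   # ignore the (unlikely) hash collision
                self._owner[p] = bank
                bisect.insort(self._points, p)

    def remove_bank(self, bank):
        """Drop a bank, e.g. when part of the cache is powered down."""
        for i in range(self.replicas):
            p = ring_hash(f"{bank}#{i}")
            if self._owner.get(p) == bank:
                del self._owner[p]
                self._points.remove(p)

    def bank_of(self, address: int):
        """Owner is the first ring point clockwise from the address's hash."""
        h = ring_hash(hex(address))
        idx = bisect.bisect_right(self._points, h) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = ConsistentHashRing([f"bank{i}" for i in range(4)])
    blocks = range(0, 64 * 10_000, 64)   # hypothetical 64-byte block addresses
    before = {a: ring.bank_of(a) for a in blocks}
    ring.remove_bank("bank3")
    moved = [a for a in blocks if ring.bank_of(a) != before[a]]
    # Every moved block was previously owned by the removed bank.
    assert all(before[a] == "bank3" for a in moved)
    print(f"{len(moved)} of {len(blocks)} blocks remapped")
```

In this sketch, removing one of four banks leaves roughly three quarters of the blocks where they were. A simple modulo mapping of block index to bank would behave differently: going from 4 to 3 banks, it would relocate most blocks.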
Authors:
Ausavarungnirun, Rachata, Fallin, Chris, Yu, Xiangyao, Chang, Kevin Kai-Wei, Nazario, Greg, Das, Reetuparna, Loh, Gabriel H., Mutlu, Onur
Published in:
Parallel Computing, May 2016, 54:29-45
Published in:
DAC: Annual ACM/IEEE Design Automation Conference; 2019, Issue 56, p1243-1246, 4p
As systems provide increasing memory capacities to support memory-intensive workloads, Translation Lookaside Buffers (TLBs) are becoming a critical performance bottleneck. TLB performance is exacerbated with virtualization, which is typically impleme
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::a7e70f98bcadba0365aa198bdcff1a19
Academic article
Published in:
2016 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS); 2016, p161-171, 11p