Showing 1 - 10 of 127 for search: '"Edward S. Davidson"'
Published in:
IEEE Transactions on Computers. 53:126-140
The growing difference between processor and main memory cycle time demands the use of aggressive prefetch algorithms to reduce the effective memory access latency. However, prefetching can significantly increase memory traffic and unsuccessful prefe
Published in:
HPCA
With the continuing technological trend of ever cheaper and larger memory, most data sets in database servers will soon be able to reside in main memory. In this configuration, the performance bottleneck is likely to be the gap between the processing
Published in:
International Conference on Supercomputing
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a combined approach that schedules the loop operations fo
Published in:
ICS 25th Anniversary
Published in:
ISCA
Data cache misses reduce the performance of wide-issue processors by stalling the data supply to the processor. Prefetching data by predicting the miss address is one way to tolerate the cache miss latencies. But current applications with irregular a
Published in:
IEEE Transactions on Computers. 50:769-783
In this paper, we examine the effectiveness of a new hardware mechanism, called register queues (RQs), which effectively decouples the architected register space from the physical registers. Using RQs, the compiler can allocate physical registers to
Published in:
IEEE Transactions on Computers. 48:1244-1259
As microprocessor speeds continue to outpace memory subsystems in speed, minimizing average data access time grows in importance. Multilateral caches afford an opportunity to reduce the average data access time by active management of block allocatio
Authors:
Gheith A. Abandah, Edward S. Davidson
Published in:
IEEE Transactions on Parallel and Distributed Systems. 9:206-216
In a distributed shared memory (DSM) multiprocessor, the processors cooperate in solving a parallel application by accessing the shared memory. The latency of a memory access depends on several factors, including the distance to the nearest valid dat
Published in:
PLDI
Modulo scheduling algorithms based on optimal solvers have been proposed to investigate and tune the performance of modulo scheduling heuristics. While recent advances have broadened the scope for which the optimal approach is applicable, this approa
Authors:
Jude A. Rivers, Edward S. Davidson
Published in:
Performance Evaluation. :189-207
This work evaluates the performance effectiveness of combining two techniques for improving cache hit rate and reducing memory traffic in small on-chip direct-mapped caches. Temporality-based caching is an efficient technique for reducing unnecessary