Showing 1 - 10 of 87 results for search: '"Milind Chabbi"'
Published in:
Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures.
The lock is a building-block synchronization primitive that enables mutually exclusive access to shared data in shared-memory parallel programs. Mutual exclusion is typically achieved by guarding the code that accesses the shared data with a pair of
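The general idea of guarding shared data with a lock can be sketched as follows. This minimal example uses Python's standard `threading.Lock` and illustrates ordinary mutual exclusion only. It is not the lock design studied in the paper, and the helper names `increment_many` and `run` are hypothetical.

```python
import threading

counter = 0                      # shared data
counter_lock = threading.Lock()  # lock that guards `counter`

def increment_many(n: int) -> None:
    """Increment the shared counter n times, one critical section per increment."""
    global counter
    for _ in range(n):
        counter_lock.acquire()       # enter the critical section
        try:
            counter += 1             # only one thread at a time executes this
        finally:
            counter_lock.release()   # leave the critical section

def run(num_threads: int = 4, per_thread: int = 10_000) -> int:
    """Start several threads that update the counter and return its final value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment_many, args=(per_thread,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(run())  # every increment is preserved: 4 * 10_000
```

Because every update happens while the lock is held, no increment is lost, and the result equals `num_threads * per_thread`.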
Published in:
Concurrency and Computation: Practice and Experience.
We propose COMDETECTIVE+, an inter-thread communication analyzer, and REUSETRACKER+, a reuse distance analyzer, that leverage the hardware features in AMD processors to support low-overhead profiling. Both tools employ the instruction-based sampling
Precise event sampling is a profiling feature in commodity processors that can sample hardware events and accurately locate the instructions that trigger the events. This feature has been used in a large number of tools to detect application performa
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b362637f06bcf0fbfeaac4874eff8ac4
http://hdl.handle.net/10044/1/103627
Published in:
ACM Transactions on Architecture and Code Optimization. 19:1-25
One widely used metric that measures data locality is reuse distance: the number of unique memory locations that are accessed between two consecutive accesses to a particular memory location. State-of-the-art techniques that measure reuse distance
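The definition of reuse distance above can be made concrete with a naive sketch that scans a recorded address trace. This is a simple quadratic-time illustration of the metric only, not the measurement technique of the paper. The function name `reuse_distances` is hypothetical, and first-time accesses are reported as `None` (they have no previous access and are conventionally treated as infinite).

```python
from typing import Dict, Hashable, List, Optional, Sequence

def reuse_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """Return the reuse distance of every access in `trace`.

    For an access to address `a`, the reuse distance is the number of
    unique addresses touched strictly between it and the previous access
    to `a`. First accesses have no previous use and yield None.
    """
    last_seen: Dict[Hashable, int] = {}  # address -> index of its latest access
    result: List[Optional[int]] = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Addresses accessed between the two consecutive uses of `addr`.
            window = trace[last_seen[addr] + 1 : i]
            result.append(len(set(window)))
        else:
            result.append(None)
        last_seen[addr] = i
    return result

if __name__ == "__main__":
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
```

For the trace `a b c a b b`, the second `a` is separated from the first by `b` and `c` (distance 2), the second `b` by `c` and `a` (distance 2), and the final `b` immediately follows the previous one (distance 0). Production tools avoid rescanning the window on every access, since this sketch costs time proportional to the trace length per access.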
The concurrent programming literature is rich with tools and techniques for data race detection. Less, however, has been known about real-world, industry-scale deployment, experience, and insights about data races. Golang (Go for short) is a modern p
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3edeac1a8096ee11c34210d915141738
http://arxiv.org/abs/2204.00764
Published in:
ACM Transactions on Parallel Computing. 7:1-32
The popularity of Non-Uniform Memory Access (NUMA) architectures has led to numerous locality-preserving hierarchical lock designs, such as HCLH, HMCS, and cohort locks. Locality-preserving locks trade fairness for higher throughput. Hence, some inst
Published in:
CGO
Modern mobile application binaries are bulky for many reasons: software and its dependencies, fast-paced addition of new features, high-level language constructs, and statically linked platform libraries. Reduced application size is critical not only
Published in:
SC
ARM is an attractive CPU architecture for exascale systems because of its energy efficiency. As a recent entry into the HPC paradigm, ARM lags in its software stack, especially in the performance tooling aspect. Notably, there is a lack of fine-grain
Published in:
ICS
Compilers are an indispensable component in the software stack. Besides generating machine code, compilers perform multiple optimizations to improve code performance. Typically, scientific programmers treat compilers as a black box and expect them to
Authors:
Yanjie Wei, Jeff R. Hammond, Milind Chabbi, Huiwei Lu, Satoshi Matsuoka, Pavan Balaji, Abdelhalim Amer
Published in:
ACM Transactions on Parallel Computing. 5:1-21
In this article, we investigate contention management in lock-based thread-safe MPI libraries. Specifically, we make two assumptions: (1) locks are the only form of synchronization when protecting communication paths; and (2) contention occurs, and t