Search results - "Wittmann, Markus"

Report

Lattice Boltzmann Benchmark Kernels as a Testbed for Performance Analysis

Authors: Wittmann, Markus; Haag, Viktor; Zeiser, Thomas; Köstler, Harald; Wellein, Gerhard

Published in: Computers & Fluids, 2018

Lattice Boltzmann methods (LBM) are an important part of current computational fluid dynamics (CFD). They allow easy implementations and boundary handling. However, competitive time to solution not only depends on the choice of a reasonable method, b

External link: http://arxiv.org/abs/1711.11468

Report

Extreme Scale-out SuperMUC Phase 2 - lessons learned

Published in: Advances in Parallel Computing, vol. 27: Parallel Computing: On the Road to Exascale, eds. G.R. Joubert et al., p. 827, 2016

In spring 2015, the Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, LRZ), installed their new Peta-Scale System SuperMUC Phase2. Selected users were invited for a 28 day extreme scale-out block operation during which they were allowed to use th

External link: http://arxiv.org/abs/1609.01507

Report

A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes

Authors: Bauer, Simon; Mohr, Marcus; Rüde, Ulrich; Weismüller, Jens; Wittmann, Markus; Wohlmuth, Barbara

Matrix-free finite element implementations of massively parallel geometric multigrid save memory and are often significantly faster than implementations using classical sparse matrix techniques. They are especially well suited for hierarchical hybrid

External link: http://arxiv.org/abs/1608.06473

Report

Short Note on Costs of Floating Point Operations on current x86-64 Architectures: Denormals, Overflow, Underflow, and Division by Zero

Authors: Wittmann, Markus; Zeiser, Thomas; Hager, Georg; Wellein, Gerhard

Simple floating point operations like addition or multiplication on normalized floating point values can be computed by current AMD and Intel processors in three to five cycles. This is different for denormalized numbers, which appear when an underfl

External link: http://arxiv.org/abs/1506.03997

Report

Chip-level and multi-node analysis of energy-optimized lattice-Boltzmann CFD simulations

Authors: Wittmann, Markus; Hager, Georg; Zeiser, Thomas; Treibig, Jan; Wellein, Gerhard

Memory-bound algorithms show complex performance and energy consumption behavior on multicore processors. We choose the lattice-Boltzmann method (LBM) on an Intel Sandy Bridge cluster as a prototype scenario to investigate if and how single-chip perf

External link: http://arxiv.org/abs/1304.7664

Report

Asynchronous MPI for the Masses

Authors: Wittmann, Markus; Hager, Georg; Zeiser, Thomas; Wellein, Gerhard

We present a simple library which equips MPI implementations with truly asynchronous non-blocking point-to-point operations, and which is independent of the underlying communication infrastructure. It utilizes the MPI profiling interface (PMPI) and t

External link: http://arxiv.org/abs/1302.4280

Report

Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations

Authors: Wittmann, Markus; Zeiser, Thomas; Hager, Georg; Wellein, Gerhard

Published in: Computers & Fluids, Volume 80, Pages 283-289 (2013)

We present a simple, parallel and distributed algorithm for setting up and partitioning a sparse representation of a regular discretized simulation domain. This method is scalable for a large number of processes even for complex geometries and ensure

External link: http://arxiv.org/abs/1111.1129

Report

Comparison of different Propagation Steps for the Lattice Boltzmann Method

Authors: Wittmann, Markus; Zeiser, Thomas; Hager, Georg; Wellein, Gerhard

Published in: Computers & Mathematics with Applications, Volume 65, Issue 6, Pages 924-935 (2013)

Several possibilities exist to implement the propagation step of the lattice Boltzmann method. This paper describes common implementations which are compared according to the number of memory transfer operations they require per lattice node update.

External link: http://arxiv.org/abs/1111.0922

Report

Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems

Authors: Wittmann, Markus; Hager, Georg

Task parallelism as employed by the OpenMP task construct or some Intel Threading Building Blocks (TBB) components, although ideal for tackling irregular problems or typical producer/consumer schemes, bears some potential for performance bottlenecks

External link: http://arxiv.org/abs/1101.0093

Report

Leveraging shared caches for parallel temporal blocking of stencil codes on multicore processors and clusters

Authors: Wittmann, Markus; Hager, Georg; Treibig, Jan; Wellein, Gerhard

Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach that makes ex

External link: http://arxiv.org/abs/1006.3148
