Popis: |
We describe a parallel implementation of the tensor-product solver derived in Kvarving et al. (2010) [19] and Bjontegaard et al. (2009) [6]. We choose a combined distributed/shared memory model because its flexibility lets us map the algorithm more closely to the available resources. Since the approach requires special attention to load balancing, we also propose a scheme that addresses the challenges involved. Speedup results from test problems, as well as from real simulations, are presented and discussed. While the speedups are not perfect, we show that the new algorithms are more than competitive with a standard 3D approach parallelized using domain decomposition. |