Description: |
Simulations of the electrical activity of networks of morphologically detailed neuron models allow for a better understanding of the brain. State-of-the-art simulations describe the dynamics of ionic currents and biochemical processes within branching topological representations of the neurons. Acceleration of such simulations is possible in the weak scaling limit by modeling neurons as indivisible computation units and increasing the computing power. Strong scaling, and with it simulation close to biological time, is difficult yet required for the study of synaptic plasticity and other use cases that demand simulating neurons over long periods. Current methods rely on parallel Gaussian elimination, computing the triangulation and substitution of many branches simultaneously. Two limitations remain: (a) the high heterogeneity of compute time per neuron leads to severe computational load imbalance; and (b) it is difficult to provide a computation model that fully utilizes the computing resources of distributed multi-core architectures with Single Instruction Multiple Data (SIMD) capabilities. To address these issues, we present a strategy that extracts the flow dependencies between the parameters of the ODEs and the algebraic solver of individual neurons. Based on the resulting dependency map, we provide three techniques for memory, communication, and computation reorganization that yield a load-balanced, distributed, asynchronous execution. The new computation model distributes datasets and balances computational workload across a distributed memory space, exposing tree-based parallelism over the neurons' topological structure, an embarrassingly parallel execution model for neuron subtrees, and SIMD acceleration of subtree state updates. The capabilities of our methods are demonstrated on a prototype implementation developed on the core compute kernel of the NEURON scientific application, built on the HPX runtime system for the ParalleX execution model. Our implementation yields an asynchronous, distributed, and parallel simulation that accelerates workloads ranging from a single neuron to medium-sized neural networks. Benchmark results show better strong scaling, finer-grained parallelism, and lower time to solution than the state of the art on a wide range of distributed multi-core compute architectures.
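
  To make the solver step concrete, the following is a minimal C++ sketch of Gaussian elimination (triangulation followed by substitution) on a neuron's tree topology, in the style of NEURON's Hines solver. The array names (d, a, b, rhs, parent) and the compartment ordering with parent[i] < i are illustrative assumptions, not the actual kernel:

      #include <vector>

      // Sketch of Gaussian elimination on a tree-structured linear system,
      // as arises for branched neuron topologies. Names are assumptions
      // echoing common NEURON conventions: d is the diagonal, a/b are the
      // couplings between a compartment and its parent, rhs is the
      // right-hand side (holding the solution on exit), and compartments
      // are numbered so that parent[i] < i, with the root at index 0.
      void solve_tree(std::vector<double>& d, const std::vector<double>& a,
                      const std::vector<double>& b, std::vector<double>& rhs,
                      const std::vector<int>& parent) {
          const int n = static_cast<int>(d.size());
          // Triangulation: eliminate couplings from the leaves toward the root.
          for (int i = n - 1; i >= 1; --i) {
              const double p = a[i] / d[i];
              d[parent[i]] -= p * b[i];
              rhs[parent[i]] -= p * rhs[i];
          }
          // Substitution: solve the root, then propagate toward the leaves.
          rhs[0] /= d[0];
          for (int i = 1; i < n; ++i) {
              rhs[i] = (rhs[i] - b[i] * rhs[parent[i]]) / d[i];
          }
      }

  Dependencies run only along parent links, which is what allows independent branches to be triangulated simultaneously and, after cutting the tree at branch points, distributed across compute units.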
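
  The subtree-level computation model can be pictured with the sketch below; it uses OpenMP pragmas as a stand-in for the HPX-based asynchronous tasking described above, and the Subtree layout with contiguous compartment ranges is an assumption for illustration:

      #include <cstddef>
      #include <vector>

      // Illustrative sketch, not the authors' implementation: once a
      // neuron's tree is cut at branch points, each subtree owns a
      // contiguous slice of the state arrays and can be advanced
      // independently. The outer loop is embarrassingly parallel across
      // subtrees; the inner loop over a subtree's compartments is a
      // straight-line array update amenable to SIMD.
      struct Subtree {
          std::size_t first, last;  // contiguous compartment range [first, last)
      };

      void update_states(std::vector<double>& v, const std::vector<double>& dv,
                         const std::vector<Subtree>& subtrees, double dt) {
          #pragma omp parallel for schedule(dynamic)  // one task per subtree
          for (long s = 0; s < static_cast<long>(subtrees.size()); ++s) {
              const Subtree& t = subtrees[s];
              #pragma omp simd                        // vectorized state update
              for (std::size_t i = t.first; i < t.last; ++i) {
                  v[i] += dt * dv[i];                 // e.g., explicit voltage step
              }
          }
      }

  Keeping each subtree's compartments contiguous in memory is what lets the inner loop vectorize: the state update touches a dense array slice with no branch-dependent control flow.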