EFFICIENT FRAGMENTED IMPLEMENTATION OF A TWO-PHASE FLUID FILTRATION BOUNDARY VALUE PROBLEM

Year of publication: 2023
Subject:
DOI: 10.24412/2073-0667-2023-2-45-73
Description: Automating the construction of parallel programs for numerical simulation is a topical problem in parallel systems programming. In its general formulation, the problem of automatically constructing an efficient parallel program (in terms of execution time, memory consumption, network load, etc.) from its high-level specification is algorithmically intractable. Languages and systems for automatic parallel program construction evolve by accumulating particular solutions and heuristics that give acceptable efficiency of the constructed programs for certain classes of applications. It is therefore important to study efficient parallel implementations of specific numerical simulation problems and to determine whether that experience can yield new methods and algorithms for constructing efficient parallel programs in similar cases. Fragmented programming technology is an approach that makes it possible to automate the construction of efficient parallel programs for numerical simulation. The LuNA system, developed at the Institute of Computational Mathematics and Mathematical Geophysics of the Siberian Branch of the Russian Academy of Sciences (ICMMG SB RAS), provides tool support for this approach. The paper considers an efficient fragmented implementation, on multicomputers, of a solver for the boundary value problem of two-phase fluid filtration in a three-dimensional domain in the presence of wells. Two versions of the program were developed and optimized: one based on conventional parallel programming tools (MPI+OpenMP), the other obtained with the LuNA system. Both implementations rest on an analysis of the numerical algorithm from the standpoint of its efficient parallel implementation. Experiments showed that the manually developed program has satisfactory efficiency, while the program constructed automatically with LuNA is about three times slower than the manual implementation, which is a good result for systems of this type.
Program construction automation is an approach that can potentially reduce the complexity and labor of developing, debugging and modifying numerical parallel programs for multicomputers. In high performance computing it is important not just to construct a valid program but also to make it efficient, which is a challenging problem with no satisfactory general solution. Programming systems can therefore provide high efficiency of the constructed programs only for a limited range of applications. To achieve this, the systems employ various heuristics and particular effective solutions. Tools for parallel program construction automation evolve by accumulating such heuristics and particular solutions, both to improve the efficiency of constructed programs and to widen the range of applications the system can handle effectively. It is thus important to investigate particular manual implementations of numerical programs with a view to automating such construction further. Fragmented programming technology is an approach to the development of numerical parallel programs and to the automation of their construction. The approach is based on the theory of parallel program synthesis on the basis of computational models. It is partially supported by the LuNA system, a system for automating the construction of numerical parallel programs for distributed memory systems (multicomputers). The paper studies a particular application: a solver for the two-phase fluid boundary value problem in the 3D case in the presence of wells. The application is implemented as a fragmented program in two versions: the first is based on conventional means (MPI and OpenMP), and the second uses the LuNA system. The basic idea behind fragmented programming is to consider a parallel program as an aggregate of sequential parts called computational fragments (CFs).
Each CF is implemented by a conventional sequential subroutine with no side effects. The input and output arguments of CFs are immutable pieces of data called data fragments (DFs). Execution is treated as running a set of CFs in a data-flow manner, where each CF becomes ready for execution once all its input DFs have been computed. Executing a CF produces a number of output DFs. If the program is represented as a set of CFs and DFs, a system can perform the execution and provide its dynamic properties, such as dynamic load balancing. The LuNA system offers a domain-specific language, LuNA, for describing the set of CFs and DFs as a LuNA program. The system then translates the program into an intermediate representation executable by the runtime subsystem. The runtime subsystem is essentially a distributed virtual machine that executes CFs in a data-flow manner. This approach significantly simplifies parallel program construction, since the programmer does not do parallel programming as such. The programmer only describes the set of CFs and DFs and provides conventional sequential C++ subroutines that implement the CFs. No programming of communications, synchronization, memory management or other low-level details is required. However, the execution efficiency of LuNA programs may be significantly lower than that of a program developed manually with conventional parallel programming means. The reason is that constructing an efficient parallel program from its high-level specification is algorithmically hard in the general case. To help the LuNA system construct more efficient programs, the programmer is given means to tune the construction process, called recommendations and directives. Using them can significantly increase the efficiency of the constructed program by supplying the system with the programmer's insight into how the fragments should be executed.
Such information includes hints on the distribution and redistribution of CFs and DFs to nodes, the order of CF execution, garbage collection directives, etc. The paper provides an in-depth analysis of the considered application in order to work out an efficient parallel implementation of the numerical algorithm in a multi-core distributed environment. An efficient conventional distributed program, written with MPI and OpenMP, is then developed and described. Next, a LuNA program is developed and optimized. The process of developing and optimizing the LuNA program is presented so that the experience can be reused in future development of similar fragmented programs. Finally, an experimental study of the efficiency of the constructed programs is presented. The implementations were examined on a representative set of parameters in three different hardware environments, namely, Novosibirsk State University Computing Center and Joint Supercomputer Center of RAS with Ethernet and InfiniBand interconnects. The conventional distributed program has shown the speedup of 2.3.к on 6 nodes, which is a satisfactory result for the application class. The LuNA program showed about a 3x slowdown on up to 16 computing nodes compared to the MPI implementation, which is a good result for an automatic parallel program construction system. To conclude, the research has resulted in an efficient MPI implementation of the application, based on an in-depth analysis of the numerical algorithm. The current version of the LuNA system was tested for its ability to construct an efficient parallel program for real-life computations, and the tests showed that LuNA is capable of it. All the implementations are described in detail in the paper so that other programmers can reuse the experience when implementing and optimizing other fragmented programs.
The research can also serve as a basis for developing system algorithms capable of automatically optimizing the efficiency of similar LuNA programs.
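The data-flow execution model described in the abstract (side-effect-free computational fragments that fire once all their input data fragments exist) can be illustrated with a minimal single-process sketch. This is not LuNA code and does not reflect the LuNA language or runtime API; the function `run_dataflow`, the tuple layout of a fragment and the fragment names are all hypothetical and chosen only for illustration.

```python
from collections import deque


def run_dataflow(fragments, initial):
    """Run computational fragments (CFs) in data-flow order.

    fragments: list of (name, func, input_names, output_names); func must be
               side-effect free and return a tuple with one value per output.
    initial:   dict of data fragments (DFs) available before execution.
    Returns the final DF store and the order in which CFs fired.
    (Hypothetical interface, assumed for this sketch only.)
    """
    store = dict(initial)      # DFs are single-assignment: written once
    pending = deque(fragments)
    order = []
    stalled = 0                # CFs skipped in a row without any progress
    while pending:
        name, func, ins, outs = pending.popleft()
        if all(i in store for i in ins):
            # All input DFs are computed, so the CF is ready to execute.
            results = func(*(store[i] for i in ins))
            for out_name, value in zip(outs, results):
                if out_name in store:
                    raise ValueError(f"DF {out_name!r} assigned twice")
                store[out_name] = value
            order.append(name)
            stalled = 0
        else:
            # Not ready yet: put the CF back and try the others first.
            pending.append((name, func, ins, outs))
            stalled += 1
            if stalled > len(pending):
                raise RuntimeError("no CF is ready: missing input DFs")
    return store, order


# Two CFs listed out of order; the data dependencies decide execution order.
fragments = [
    ("sum", lambda a, b: (a + b,), ["x2", "y"], ["z"]),
    ("double", lambda x: (2 * x,), ["x"], ["x2"]),
]
store, order = run_dataflow(fragments, {"x": 3, "y": 4})
print(order, store["z"])
```

In this sketch the scheduler is a simple sequential loop. A distributed runtime of the kind the abstract describes would additionally decide where each fragment runs and when data fragments can be discarded, which is the role the abstract assigns to recommendations and directives.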
Database: OpenAIRE