Abstract: |
The paper proposes an approach to the optimal control of a technical system or process (the control object): determining, over a given time horizon (the control horizon), the set of control parameter values that optimizes the object's target function on that horizon. To solve the problem, a method for identifying a model of the control object with an artificial neural network has been developed. For each time slice of the control horizon, a tuple of candidate control parameter values is generated, and the model computes the predicted value of the object's state parameter for each candidate. From the generated control values and the corresponding model forecasts, a tuple of target function values is then computed for each time slice. The optimal control problem is solved by selecting, in the course of optimization, one control parameter value from the generated tuple for each time slice so that the sum of the target function values over all time slices of the control horizon is optimal (maximal or minimal). The target function is optimized with a modified Thompson sampling algorithm, known from the multi-armed bandit problem. As an example, the approach is applied to constructing a set of optimal prices for goods sold on the free market that maximizes the seller's profit over a given control horizon. |
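The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses standard Gaussian Thompson sampling rather than the paper's modified variant, and a hypothetical linear demand function (forecast_demand) stands in for the neural-network model. All names, candidate prices, and constants (unit_cost, noise, reward_std, prior_std) are assumptions made for illustration only. Each (time slice, candidate price) pair is treated as one bandit arm, and because the target is a sum over time slices, picking the best arm per slice also optimizes the sum.

```python
import math
import random


def forecast_demand(price, t):
    """Hypothetical stand-in for the neural-network model: linear demand, mild time trend."""
    return max(0.0, 100.0 - 8.0 * price + 1.0 * t)


def profit(price, t, unit_cost=2.0, noise=0.0, rng=None):
    """Target function for one time slice; optional Gaussian noise on the forecast demand."""
    demand = forecast_demand(price, t)
    if rng is not None and noise > 0.0:
        demand = max(0.0, demand + rng.gauss(0.0, noise))
    return (price - unit_cost) * demand


def thompson_price_plan(candidates, horizon, rounds=1000, noise=1.0,
                        reward_std=10.0, prior_std=1000.0, seed=0):
    """Choose one candidate price per time slice with Gaussian Thompson sampling."""
    rng = random.Random(seed)
    n_arms = len(candidates)
    counts = [[0] * n_arms for _ in range(horizon)]   # pulls per (slice, arm)
    means = [[0.0] * n_arms for _ in range(horizon)]  # running mean profit per (slice, arm)

    for _ in range(rounds):
        for t in range(horizon):
            # Draw one plausible mean profit per arm: a wide prior for untried arms,
            # otherwise a Gaussian around the running mean that narrows with more pulls.
            samples = [
                rng.gauss(0.0, prior_std) if counts[t][k] == 0
                else rng.gauss(means[t][k], reward_std / math.sqrt(counts[t][k]))
                for k in range(n_arms)
            ]
            k = max(range(n_arms), key=samples.__getitem__)
            reward = profit(candidates[k], t, noise=noise, rng=rng)
            counts[t][k] += 1
            means[t][k] += (reward - means[t][k]) / counts[t][k]

    # Final plan: per slice, the tried arm with the highest observed mean profit.
    def best_arm(t):
        return max(range(n_arms),
                   key=lambda k: means[t][k] if counts[t][k] > 0 else float("-inf"))

    return [candidates[best_arm(t)] for t in range(horizon)]


plan = thompson_price_plan([3.0, 5.0, 7.0, 9.0, 11.0], horizon=3)
print(plan)
```

In this toy setting the plan can be checked against a brute-force search over the noise-free profit function, which is feasible only because the candidate set is tiny. The sampling approach matters when each evaluation is noisy or costly.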