Popis: |
Rationale: COVID-19 is one of the most severe pandemics of recent times. In the absence of a vaccine, classical epidemiological measures such as testing to isolate infected people, quarantine, and social distancing can slow the growth of new infections as much and as early as possible, but at the cost of economic and social disruption. Implementing timely and appropriate public health interventions is therefore a challenge. Objective: This study investigates a reinforcement learning-based approach that incrementally learns how intensely each public health intervention should be applied in each period in a given region. Methods: First, we define the basic components of a reinforcement learning (RL) setup (i.e., states, reward, actions, and transition function), which together form the learning environment for the agent (i.e., an AI model). We then train the agent online using the RL algorithm known as REINFORCE. Finally, a flow network that we developed to serve as an epidemiological model is used to visualize the results of the agent's decisions under different epidemic and demographic scenarios. Main Results: After a relatively short period of training, the agent starts taking reasonable actions that balance public health and economic considerations. To test the developed tool, we ran the RL agent on different regions (demographic scales) and recorded the output policy, which remained consistent with the training performance. The flow network used to visualize the simulation results proved useful, since the simulated results correlate highly with real-case scenarios. Conclusion: This work shows that the reinforcement learning paradigm can be used to learn public health policies in complex epidemiological models.
Moreover, this experiment demonstrates that the developed model can be very useful when fed with real data. Future Work: In trade-off problems such as this one (balancing two goals), engineering a good reward that encapsulates all goals can be difficult; future work might therefore address this problem by investigating other techniques such as inverse reinforcement learning and human-in-the-loop approaches. In addition, regarding the developed epidemiological model, we aim to gather real data that can make the training environment more realistic, and to apply the model to a network of regions instead of a single region. |