Popis: |
In smart girds, supervisory control and data acquisition (SCADA) systems have to protect data from advanced persistent threats (APTs), which exploit vulnerabilities of the power infrastructures to launch stealthy and targeted attacks. In this paper, we propose a reinforcement learning-based APT defense scheme for the control center to choose the detection interval and the number of Central Processing Units (CPUs) allocated to the data concentrators based on the data priority, the size of the collected meter data, the history detection delay, the previous number of allocated CPUs, and the size of the labeled compromised meter data without the knowledge of the attack interval and attack CPU allocation model. The proposed scheme combines deep learning and policy-gradient based actor-critic algorithm to accelerate the optimization speed at the control center, where an actor network uses the softmax distribution to choose the APT defense policy and the critic network updates the actor network weights to improve the computational performance. The advantage function is applied to reduce the variance of the policy gradient. Simulation results show that our proposed scheme has a performance gain over the benchmarks in terms of the detection delay, data protection level, and utility. |