Description: |
Robbery is an open social problem. To help address it, we propose in this paper multi-stream deep networks for both the classification and the temporal localization of robbery events in CCTV videos. In our multi-stream architecture, each stream consists of a pre-trained 3D ConvNet combined with an LSTM, followed by a softmax layer. In particular, we investigate three streams based on three different types of input: (a) RGB data, (b) optical flow, and (c) foreground masks. Each stream is trained independently, and the final scores are averaged to produce predictions. To test the approach, we compile a robbery dataset from YouTube containing 124 untrimmed CCTV videos. Empirical comparison with several state-of-the-art methods demonstrates the promise of our multi-stream model in both the classification and temporal localization tasks. |
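
The score-averaging step described above can be illustrated with a minimal sketch. It assumes each stream emits a two-class logit vector per clip (non-robbery, robbery); the stream names, logit values, and the helper `fuse_streams` are hypothetical and only show late fusion by averaging softmax scores, not the authors' actual implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits):
    """Late fusion: average the per-stream softmax scores class by class."""
    probs = [softmax(logits) for logits in stream_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


# Hypothetical logits for one clip, ordered as [non-robbery, robbery].
# The three keys mirror the three input types named in the abstract.
clip_logits = {
    "rgb": [0.2, 1.5],
    "optical_flow": [1.0, 0.8],
    "foreground_mask": [-0.3, 0.9],
}

fused = fuse_streams(list(clip_logits.values()))
# Index of the class with the highest averaged score.
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because the streams are trained independently, fusion of this kind needs no joint training: a stream can be added or dropped by changing only the list passed to `fuse_streams`.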