Popis: |
Many camera apps and online video conferencing solutions support instant selfie segmentation or a virtual background function for entertainment, aesthetic, privacy, and security reasons. A good number of studies show that a deep-learning-based segmentation model (DSM) is a reasonable choice for selfie segmentation, and that an ensemble of multiple DSMs can improve the precision of the segmentation result. However, these approaches are not well suited to video when they are applied directly to each frame. This paper proposes an N-Frames (NF) ensemble approach for selfie segmentation in video that uses an ensemble of multiple DSMs to achieve high-performance automatic segmentation. Unlike the N-Models (NM) ensemble, which executes multiple DSMs at once for every single video frame, the proposed NF ensemble executes only one DSM on the current video frame and combines it with the segmentation results of previous frames to produce the final result. For the experiment, we use four state-of-the-art image segmentation models to build the ensembles. We evaluated the proposed approach on a dataset of 81 single-person videos collected from publicly available websites. To measure the performance of the segmentation models, we considered Intersection over Union (IoU), IoU standard deviation, false prediction rate, Memory Efficiency Rate, and Computing power Efficiency Rate. The average IoU values of the Two-Models NM ensemble, Two-Frames NF ensemble, Three-Models NM ensemble, and Three-Frames NF ensemble were 95.1868%, 95.1253%, 95.3667%, and 95.1734%, respectively, whereas the average IoU value of the single models was 92.9653%. The results show that the proposed NF ensemble approach improves the accuracy of selfie segmentation by more than 2 percentage points on average. The cost-efficiency measurements show that the proposed method requires less computing power, comparable to that of a single model. |
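The difference between the NF ensemble and a per-frame NM ensemble can be illustrated with a minimal sketch. The abstract does not specify how the models are scheduled across frames or how the per-frame results are combined, so the round-robin schedule, the pixel-wise averaging, the 0.5 threshold, and all names below (`nf_ensemble`, `average_masks`, `iou`) are assumptions made for illustration, not the authors' implementation. Each "model" is any callable that maps a frame to a 2-D list of foreground probabilities.

```python
from collections import deque
from itertools import cycle


def average_masks(masks):
    """Pixel-wise mean of equally sized 2-D probability maps (assumed fusion rule)."""
    n = len(masks)
    return [[sum(px) / n for px in zip(*rows)] for rows in zip(*masks)]


def nf_ensemble(frames, models, n_frames=2, threshold=0.5):
    """Yield one binary mask per frame while running a single model per frame."""
    history = deque(maxlen=n_frames)  # soft masks of the most recent frames
    model_cycle = cycle(models)       # assumed round-robin model schedule
    for frame in frames:
        history.append(next(model_cycle)(frame))  # the only model call for this frame
        fused = average_masks(list(history))      # reuse results of previous frames
        yield [[1 if p >= threshold else 0 for p in row] for row in fused]


def iou(pred, truth):
    """Intersection over Union of two binary masks (1.0 when both are empty)."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    inter = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    return inter / union if union else 1.0
```

In this sketch the per-frame cost is one model inference plus an average over at most `n_frames` stored masks, whereas an NM ensemble would call every model on every frame. The first frames are fused from fewer than `n_frames` masks because the history is still filling up.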